crawler.py — langchain Source File
Architecture documentation for crawler.py, a Python file in the langchain codebase. 5 imports, 0 dependents.
Dependency Diagram
graph LR
    3917f38f_3078_cc58_74a8_71a235ab29ed["crawler.py"]
    2a7f66a7_8738_3d47_375b_70fcaa6ac169["logging"]
    3917f38f_3078_cc58_74a8_71a235ab29ed --> 2a7f66a7_8738_3d47_375b_70fcaa6ac169
    0c1d9a1b_c553_0388_dbc1_58af49567aa2["time"]
    3917f38f_3078_cc58_74a8_71a235ab29ed --> 0c1d9a1b_c553_0388_dbc1_58af49567aa2
    d76a28c2_c3ab_00a8_5208_77807a49449d["sys"]
    3917f38f_3078_cc58_74a8_71a235ab29ed --> d76a28c2_c3ab_00a8_5208_77807a49449d
    8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
    3917f38f_3078_cc58_74a8_71a235ab29ed --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
    958c58cc_a30e_96f7_1ed2_ce3683f10d86["playwright.sync_api"]
    3917f38f_3078_cc58_74a8_71a235ab29ed --> 958c58cc_a30e_96f7_1ed2_ce3683f10d86
    style 3917f38f_3078_cc58_74a8_71a235ab29ed fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
import logging
import time
from sys import platform
from typing import (
    TYPE_CHECKING,
    Any,
    TypedDict,
)

if TYPE_CHECKING:
    from playwright.sync_api import Browser, CDPSession, Page

logger = logging.getLogger(__name__)

black_listed_elements: set[str] = {
    "html",
    "head",
    "title",
    "meta",
    "iframe",
    "body",
    "script",
    "style",
    "path",
    "svg",
    "br",
    "::marker",
}


class ElementInViewPort(TypedDict):
    """A typed dictionary containing information about elements in the viewport."""

    node_index: str
    backend_node_id: int
    node_name: str | None
    node_value: str | None
    node_meta: list[str]
    is_clickable: bool
    origin_x: int
    origin_y: int
    center_x: int
    center_y: int


class Crawler:
    """A crawler for web pages.

    **Security Note**: This is an implementation of a crawler that uses a browser via
    Playwright.

    This crawler can be used to load arbitrary webpages INCLUDING content
    from the local file system.

    Control access to who can submit crawling requests and what network access
    the crawler has.

    Make sure to scope permissions to the minimal permissions necessary for
    the application.
// ... (420 more lines)
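To illustrate the two data definitions in the excerpt above, here is a minimal sketch that builds one ElementInViewPort record and checks tag names against black_listed_elements. The definitions are repeated (with a shortened deny list) so the block runs on its own. The helper is_ignored, the lowercasing step, and all sample values are hypothetical and are not taken from crawler.py.

```python
from typing import Optional, TypedDict

# Shortened copy of the deny list from the excerpt, for illustration only.
black_listed_elements: set[str] = {"html", "head", "script", "style", "svg", "br"}


class ElementInViewPort(TypedDict):
    """Same fields as the TypedDict shown in the excerpt."""

    node_index: str
    backend_node_id: int
    node_name: Optional[str]
    node_value: Optional[str]
    node_meta: list[str]
    is_clickable: bool
    origin_x: int
    origin_y: int
    center_x: int
    center_y: int


def is_ignored(tag_name: str) -> bool:
    """Hypothetical helper: True if the tag name is in the deny list."""
    return tag_name.lower() in black_listed_elements


# Invented sample record; a real crawl would fill these from the page.
button: ElementInViewPort = {
    "node_index": "0",
    "backend_node_id": 42,
    "node_name": "button",
    "node_value": None,
    "node_meta": ['type="submit"'],
    "is_clickable": True,
    "origin_x": 100,
    "origin_y": 200,
    "center_x": 150,
    "center_y": 220,
}

# Keep only tags that are not in the deny list.
visible_tags = [t for t in ["HTML", "button", "script", "a"] if not is_ignored(t)]
```

A TypedDict is an ordinary dict at runtime, so the field names and types are enforced only by a static type checker, not when the dictionary is built.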
Domain
- CoreAbstractions
Subdomains
- RunnableInterface
Functions
- playwright
Classes
- ElementInViewPort
- Crawler
Dependencies
- logging
- playwright.sync_api
- sys
- time
- typing
Frequently Asked Questions
What does crawler.py do?
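crawler.py is a Python source file in the langchain codebase. It defines the Crawler class, a Playwright-based crawler for web pages, and the ElementInViewPort typed dictionary, which describes elements in the viewport. It belongs to the CoreAbstractions domain, RunnableInterface subdomain.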
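The Security Note in the Crawler docstring warns that the crawler can load arbitrary webpages, including content from the local file system, and advises controlling who can submit crawling requests. One way an application might act on that advice is to validate each URL before passing it to the crawler. The sketch below is an assumption about how such a check could look; is_allowed_url and ALLOWED_HOSTS are hypothetical names and are not part of crawler.py.

```python
from urllib.parse import urlsplit

# Hypothetical allow list; an application would supply its own hosts.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed_url(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Accept only http(s) URLs whose host is explicitly allowed."""
    parts = urlsplit(url)
    # Rejects file://, javascript:, and scheme-less input.
    if parts.scheme not in {"http", "https"}:
        return False
    host = parts.hostname  # lowercased by urlsplit; None if missing
    return host is not None and host in allowed_hosts


candidates = [
    "https://example.com/page",
    "file:///etc/passwd",
    "http://internal.test/admin",
]
accepted = [u for u in candidates if is_allowed_url(u)]
```

A check like this covers only the URL that is submitted. Redirects and requests made by the loaded page would still need network-level restrictions, as the docstring's advice on network access suggests.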
What functions are defined in crawler.py?
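The analysis reports one function for crawler.py: playwright. In the excerpt above, the name playwright appears only in the playwright.sync_api import under TYPE_CHECKING.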
What does crawler.py depend on?
crawler.py imports five modules: logging, playwright.sync_api, sys, time, and typing.
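Of these imports, playwright.sync_api sits under an `if TYPE_CHECKING:` guard in the excerpt, so type checkers see it but it is not executed when the module is imported. A minimal sketch of that pattern follows. The names page_title and FakePage are hypothetical and exist only to show the mechanism without requiring Playwright to be installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static type checkers only; skipped at runtime.
    from playwright.sync_api import Page


def page_title(page: "Page") -> str:
    """Hypothetical helper; the quoted annotation avoids a runtime name lookup."""
    return page.title()


class FakePage:
    """Stand-in exposing the one method used, so the sketch runs without Playwright."""

    def title(self) -> str:
        return "Example Domain"


# A type checker would flag FakePage here; at runtime only the method matters.
result = page_title(FakePage())  # type: ignore[arg-type]
```

Because TYPE_CHECKING is False at runtime, the guarded import costs nothing when the module loads, and any import error for the optional dependency is deferred to the point where the library is actually used.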
Where is crawler.py in the architecture?
crawler.py is located at libs/langchain/langchain_classic/chains/natbot/crawler.py (domain: CoreAbstractions, subdomain: RunnableInterface, directory: libs/langchain/langchain_classic/chains/natbot).