crawl() — langchain Function Reference
Architecture documentation for the crawl() function in crawler.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
  c0e7550e_b556_2c60_2efa_05b08fe640eb["crawl()"]
  73034b47_6ada_6cee_4b85_e74b6a3e14f1["Crawler"]
  c0e7550e_b556_2c60_2efa_05b08fe640eb -->|defined in| 73034b47_6ada_6cee_4b85_e74b6a3e14f1
  style c0e7550e_b556_2c60_2efa_05b08fe640eb fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/langchain/langchain_classic/chains/natbot/crawler.py lines 150–479
def crawl(self) -> list[str]:
    """Crawl the current page.

    Returns:
        A list of the elements in the viewport.
    """
    page = self.page
    page_element_buffer = self.page_element_buffer
    start = time.time()

    page_state_as_text = []

    device_pixel_ratio: float = page.evaluate("window.devicePixelRatio")
    if platform == "darwin" and device_pixel_ratio == 1:  # lies
        device_pixel_ratio = 2

    win_upper_bound: float = page.evaluate("window.pageYOffset")
    win_left_bound: float = page.evaluate("window.pageXOffset")
    win_width: float = page.evaluate("window.screen.width")
    win_height: float = page.evaluate("window.screen.height")
    win_right_bound: float = win_left_bound + win_width
    win_lower_bound: float = win_upper_bound + win_height

    # percentage_progress_start = (win_upper_bound / document_scroll_height) * 100
    # percentage_progress_end = (
    #     (win_height + win_upper_bound) / document_scroll_height
    # ) * 100
    percentage_progress_start = 1
    percentage_progress_end = 2

    page_state_as_text.append(
        {
            "x": 0,
            "y": 0,
            "text": f"[scrollbar {percentage_progress_start:0.2f}-"
            f"{percentage_progress_end:0.2f}%]",
        }
    )

    tree = self.client.send(
        "DOMSnapshot.captureSnapshot",
        {"computedStyles": [], "includeDOMRects": True, "includePaintOrder": True},
    )
    strings: dict[int, str] = tree["strings"]
    document: dict[str, Any] = tree["documents"][0]
    nodes: dict[str, Any] = document["nodes"]
    backend_node_id: dict[int, int] = nodes["backendNodeId"]
    attributes: dict[int, dict[int, Any]] = nodes["attributes"]
    node_value: dict[int, int] = nodes["nodeValue"]
    parent: dict[int, int] = nodes["parentIndex"]
    node_names: dict[int, int] = nodes["nodeName"]
    is_clickable: set[int] = set(nodes["isClickable"]["index"])

    input_value: dict[str, Any] = nodes["inputValue"]
    input_value_index: list[int] = input_value["index"]
    input_value_values: list[int] = input_value["value"]

    layout: dict[str, Any] = document["layout"]
    layout_node_index: list[int] = layout["nodeIndex"]
    bounds: dict[int, list[float]] = layout["bounds"]

    cursor: int = 0

    child_nodes: dict[str, list[dict[str, Any]]] = {}
    elements_in_view_port: list[ElementInViewPort] = []

    anchor_ancestry: dict[str, tuple[bool, int | None]] = {"-1": (False, None)}
    button_ancestry: dict[str, tuple[bool, int | None]] = {"-1": (False, None)}

    def convert_name(
        node_name: str | None,
        has_click_handler: bool | None,  # noqa: FBT001
    ) -> str:
        if node_name == "a":
            return "link"
        if node_name == "input":
            return "input"
        if node_name == "img":
            return "img"
        if (
            node_name == "button" or has_click_handler
        ):
            return "button"
        return "text"

    # … excerpt ends here; crawl() continues through line 479.
Frequently Asked Questions
What does crawl() do?
crawl() captures a snapshot of the current page via the Chrome DevTools Protocol (DOMSnapshot.captureSnapshot) and returns a list of text descriptions of the elements visible in the viewport. It is a method of the Crawler class in libs/langchain/langchain_classic/chains/natbot/crawler.py.
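One detail worth making concrete: a DOMSnapshot payload stores node fields as integer indices into a shared `strings` table, which is why crawl() begins by pulling out `tree["strings"]` and the per-node arrays under `document["nodes"]`. A minimal sketch of that dereferencing step, using a hand-made stand-in payload (not real CDP output):

```python
# Hand-made stand-in for a DOMSnapshot.captureSnapshot response (assumption:
# real payloads carry many more fields, but the indexing scheme is the same).
snapshot = {
    "strings": ["BODY", "A", "IMG"],
    "documents": [
        {
            "nodes": {
                # Each entry is an index into snapshot["strings"].
                "nodeName": [0, 1, 2],
                "backendNodeId": [10, 11, 12],
            }
        }
    ],
}

strings = snapshot["strings"]
nodes = snapshot["documents"][0]["nodes"]

# Resolve the string-table indices back into lowercase tag names.
tag_names = [strings[i].lower() for i in nodes["nodeName"]]
print(tag_names)  # ['body', 'a', 'img']
```

This indirection keeps the snapshot compact: a tag name that appears thousands of times is stored once and referenced by index.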
Where is crawl() defined?
crawl() is defined in libs/langchain/langchain_classic/chains/natbot/crawler.py at line 150.
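In the quoted excerpt, crawl() derives `win_right_bound` and `win_lower_bound` from the scroll offsets and screen size so that only on-screen elements are kept. The exact containment predicate falls outside the excerpt, so the rectangle-overlap test below is an assumption: a sketch of how those four bounds could be used, with a hypothetical `in_viewport` helper.

```python
def in_viewport(
    elem_left: float,
    elem_top: float,
    width: float,
    height: float,
    win_left_bound: float,
    win_upper_bound: float,
    win_right_bound: float,
    win_lower_bound: float,
) -> bool:
    """Hypothetical overlap test: does the element's box intersect the window?

    The window rectangle mirrors the bounds crawl() computes:
    right = left + screen width, lower = upper + screen height.
    """
    return (
        elem_left < win_right_bound
        and elem_left + width > win_left_bound
        and elem_top < win_lower_bound
        and elem_top + height > win_upper_bound
    )


# A 1280x800 window scrolled to the top of the page:
print(in_viewport(10, 10, 50, 20, 0, 0, 1280, 800))   # True
print(in_viewport(10, 900, 50, 20, 0, 0, 1280, 800))  # False: below the fold
```

An overlap test (rather than full containment) keeps elements that are only partially scrolled into view, which matches the docstring's promise of "the elements in the viewport".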