split_text_from_url() — langchain Function Reference

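Architecture documentation for split_text_from_url(), a method of HTMLHeaderTextSplitter defined in html.py in the langchain codebase. The method fetches a page with requests.get, raises on an HTTP error status, and passes the response text to split_text().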

Function · Python · DocumentProcessing / TextSplitters · 1 call

Dependency Diagram

graph TD
  982f8e7f_63e2_a8f4_7f7f_3def7fb3d84b["split_text_from_url()"]
  86dc20d4_404a_b608_01da_8dea923ef2c9["HTMLHeaderTextSplitter"]
  982f8e7f_63e2_a8f4_7f7f_3def7fb3d84b -->|defined in| 86dc20d4_404a_b608_01da_8dea923ef2c9
  3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d["split_text()"]
  982f8e7f_63e2_a8f4_7f7f_3def7fb3d84b -->|calls| 3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d
  style 982f8e7f_63e2_a8f4_7f7f_3def7fb3d84b fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 189–210

    def split_text_from_url(
        self, url: str, timeout: int = 10, **kwargs: Any
    ) -> list[Document]:
        """Fetch text content from a URL and split it into documents.

        Args:
            url: The URL to fetch content from.
            timeout: Timeout for the request.
            **kwargs: Additional keyword arguments for the request.

        Returns:
            A list of split `Document` objects.

                Each `Document` contains `page_content` holding the extracted text and
                `metadata` that maps the header hierarchy to their corresponding titles.

        Raises:
            requests.RequestException: If the HTTP request fails.
        """
        response = requests.get(url, timeout=timeout, **kwargs)
        response.raise_for_status()
        return self.split_text(response.text)
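
Because the method calls `response.raise_for_status()` and documents `requests.RequestException` in its `Raises` section, callers will usually want to handle network and HTTP failures themselves. The following is a minimal sketch of one way to do that, not part of langchain: `split_url_or_empty` and `demo` are hypothetical names, the header labels and the `https://example.com` URL are placeholders, and returning an empty list on failure is just one possible policy.

```python
from typing import Any

import requests


def split_url_or_empty(
    splitter: Any, url: str, timeout: int = 10, **kwargs: Any
) -> list[Any]:
    """Call splitter.split_text_from_url() and return [] if the request fails.

    Hypothetical helper for illustration; it is not part of langchain.
    Extra keyword arguments are passed through unchanged, and the method
    shown above forwards them to requests.get().
    """
    try:
        return splitter.split_text_from_url(url, timeout=timeout, **kwargs)
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, and the HTTPError raised by
        # response.raise_for_status() on 4xx/5xx responses.
        print(f"Could not fetch {url}: {exc}")
        return []


def demo() -> None:
    """Example call; needs network access and langchain-text-splitters."""
    # Imported here so the helper above depends only on `requests`.
    from langchain_text_splitters import HTMLHeaderTextSplitter

    splitter = HTMLHeaderTextSplitter(
        headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")]
    )
    docs = split_url_or_empty(
        splitter,
        "https://example.com",  # placeholder URL
        timeout=5,
        headers={"User-Agent": "docs-example"},  # forwarded to requests.get
    )
    for doc in docs:
        print(doc.metadata, doc.page_content[:80])
```

Calling `demo()` performs a real HTTP request, so it is left uncalled here. Whether swallowing the exception is appropriate depends on the caller; in a pipeline where a missing page should stop processing, letting `requests.RequestException` propagate may be the better choice.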