split_text() — langchain Function Reference

Architecture documentation for the split_text() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  127c75d0_d814_d16e_a93c_928f021add9c["split_text()"]
  5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
  e9c69e37_40ed_2949_d6dc_f6a7770ff7b8["transform_documents()"]
  e9c69e37_40ed_2949_d6dc_f6a7770ff7b8 -->|calls| 127c75d0_d814_d16e_a93c_928f021add9c
  2030eaef_a33b_19d9_d540_9d9919faafba["_process_media()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 2030eaef_a33b_19d9_d540_9d9919faafba
  ff63d8f1_7353_0b16_2f96_7dadb57a8348["_process_links()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| ff63d8f1_7353_0b16_2f96_7dadb57a8348
  33b3aaec_4039_d612_320f_cc74c2e5758c["_filter_tags()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 33b3aaec_4039_d612_320f_cc74c2e5758c
  252723d0_ba69_6fd1_f520_2ee9bc89cc3e["_process_html()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 252723d0_ba69_6fd1_f520_2ee9bc89cc3e
  3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d["split_text()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d
  style 127c75d0_d814_d16e_a93c_928f021add9c fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 715–734

    def split_text(self, text: str) -> list[Document]:
        """Splits the provided HTML text into smaller chunks based on the configuration.

        Args:
            text: The HTML content to be split.

        Returns:
            A list of `Document` objects containing the split content.
        """
        soup = BeautifulSoup(text, "html.parser")

        self._process_media(soup)

        if self._preserve_links:
            self._process_links(soup)

        if self._allowlist_tags or self._denylist_tags:
            self._filter_tags(soup)

        return self._process_html(soup)
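
The control flow of the excerpt can be illustrated with a small, dependency-free sketch. It does not use BeautifulSoup or the real class: `SplitTextFlowSketch` and its stage names are hypothetical stand-ins that only record which steps would run for a given configuration, assuming the attributes `_preserve_links`, `_allowlist_tags`, and `_denylist_tags` behave as plain truthy/falsy flags as they do in the excerpt.

```python
class SplitTextFlowSketch:
    """Hypothetical stand-in that records which pipeline stages would run."""

    def __init__(self, preserve_links=False, allowlist_tags=None, denylist_tags=None):
        self._preserve_links = preserve_links
        self._allowlist_tags = allowlist_tags
        self._denylist_tags = denylist_tags

    def split_text(self, text: str) -> list[str]:
        # Parsing and media processing are unconditional in the excerpt.
        stages = ["parse", "_process_media"]
        # Links are only processed when link preservation is enabled.
        if self._preserve_links:
            stages.append("_process_links")
        # Tag filtering runs if either list is non-empty.
        if self._allowlist_tags or self._denylist_tags:
            stages.append("_filter_tags")
        # The final stage produces the returned chunks.
        stages.append("_process_html")
        return stages


default_stages = SplitTextFlowSketch().split_text("<p>hi</p>")
full_stages = SplitTextFlowSketch(
    preserve_links=True, denylist_tags=["script"]
).split_text("<p>hi</p>")
print(default_stages)
print(full_stages)
```

With default settings only three stages run. Enabling link preservation and supplying a denylist adds the two optional stages, in the same order as in the excerpt.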

Frequently Asked Questions

What does split_text() do?
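split_text() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It parses the HTML string with BeautifulSoup, processes media elements, processes links when self._preserve_links is set, filters tags when an allowlist or denylist is configured, and returns the list of Document objects produced by _process_html().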
Where is split_text() defined?
split_text() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 715.
What does split_text() call?
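split_text() calls five functions: _filter_tags(), _process_html(), _process_links(), _process_media(), and split_text(). The source excerpt shows the first four as direct calls; the call to another split_text() is the one reported by the dependency diagram.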
What calls split_text()?
split_text() is called by one function: transform_documents().
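
The caller relationship can be sketched as follows. This is not the library's implementation of transform_documents(): `Doc` and `LineSplitter` are hypothetical stand-ins, and the metadata-merging rule is an assumption made for illustration. The sketch only shows the general shape of a document transformer that delegates to split_text() for each input document.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical minimal stand-in for a Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class LineSplitter:
    """Hypothetical splitter that emits one chunk per non-empty line."""

    def split_text(self, text: str) -> list[Doc]:
        return [Doc(line.strip()) for line in text.splitlines() if line.strip()]

    def transform_documents(self, documents: list[Doc]) -> list[Doc]:
        out = []
        for doc in documents:
            # Delegate the actual splitting to split_text().
            for chunk in self.split_text(doc.page_content):
                # Assumption: chunk metadata wins over source metadata on key clashes.
                merged = {**doc.metadata, **chunk.metadata}
                out.append(Doc(chunk.page_content, merged))
        return out


docs = [Doc("first\n\nsecond", {"source": "a.html"})]
result = LineSplitter().transform_documents(docs)
print([d.page_content for d in result])
```

Each input document yields one output document per chunk, and every chunk carries the source document's metadata.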
