split_text() — langchain Function Reference
Architecture documentation for the split_text() method of HTMLSemanticPreservingSplitter, defined in html.py in the langchain codebase.
Dependency Diagram
graph TD
    127c75d0_d814_d16e_a93c_928f021add9c["split_text()"]
    5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
    e9c69e37_40ed_2949_d6dc_f6a7770ff7b8["transform_documents()"]
    e9c69e37_40ed_2949_d6dc_f6a7770ff7b8 -->|calls| 127c75d0_d814_d16e_a93c_928f021add9c
    2030eaef_a33b_19d9_d540_9d9919faafba["_process_media()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 2030eaef_a33b_19d9_d540_9d9919faafba
    ff63d8f1_7353_0b16_2f96_7dadb57a8348["_process_links()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| ff63d8f1_7353_0b16_2f96_7dadb57a8348
    33b3aaec_4039_d612_320f_cc74c2e5758c["_filter_tags()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 33b3aaec_4039_d612_320f_cc74c2e5758c
    252723d0_ba69_6fd1_f520_2ee9bc89cc3e["_process_html()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 252723d0_ba69_6fd1_f520_2ee9bc89cc3e
    3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d["split_text()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d
    style 127c75d0_d814_d16e_a93c_928f021add9c fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 715–734
    def split_text(self, text: str) -> list[Document]:
        """Splits the provided HTML text into smaller chunks based on the configuration.

        Args:
            text: The HTML content to be split.

        Returns:
            A list of `Document` objects containing the split content.
        """
        soup = BeautifulSoup(text, "html.parser")

        self._process_media(soup)

        if self._preserve_links:
            self._process_links(soup)

        if self._allowlist_tags or self._denylist_tags:
            self._filter_tags(soup)

        return self._process_html(soup)
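The order of the steps in the method body above can be illustrated with a small stand-in. This is a sketch only, not the langchain implementation: the class name PipelineSketch and its calls list are hypothetical, the helpers merely record that they ran, and the HTML string is passed through unparsed instead of being handed to BeautifulSoup. It shows which helpers run for a given configuration.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineSketch:
    """Hypothetical stand-in that mirrors only the control flow of split_text()."""

    preserve_links: bool = False
    allowlist_tags: list[str] = field(default_factory=list)
    denylist_tags: list[str] = field(default_factory=list)
    calls: list[str] = field(default_factory=list)  # records helper call order

    # Each helper only logs its name; the real helpers modify a parsed HTML tree.
    def _process_media(self, soup: str) -> None:
        self.calls.append("_process_media")

    def _process_links(self, soup: str) -> None:
        self.calls.append("_process_links")

    def _filter_tags(self, soup: str) -> None:
        self.calls.append("_filter_tags")

    def _process_html(self, soup: str) -> list[str]:
        self.calls.append("_process_html")
        return [soup]  # assumption: one chunk, returned unchanged

    def split_text(self, text: str) -> list[str]:
        soup = text  # assumption: the real method parses the text here
        self._process_media(soup)  # always runs
        if self.preserve_links:  # optional step
            self._process_links(soup)
        if self.allowlist_tags or self.denylist_tags:  # optional step
            self._filter_tags(soup)
        return self._process_html(soup)  # always runs last


default = PipelineSketch()
default.split_text("<p>hi</p>")
print(default.calls)  # ['_process_media', '_process_html']

configured = PipelineSketch(preserve_links=True, denylist_tags=["script"])
configured.split_text("<a href='x'>y</a>")
print(configured.calls)
```

With the default settings only the media step and the final processing step run; link handling and tag filtering are skipped unless the corresponding options are set.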
Frequently Asked Questions
What does split_text() do?
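split_text() is a method of HTMLSemanticPreservingSplitter in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/html.py. It parses the HTML string with BeautifulSoup, runs _process_media(), runs _process_links() if _preserve_links is set, runs _filter_tags() if an allowlist or denylist of tags is configured, and returns the list of Document objects produced by _process_html().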
Where is split_text() defined?
split_text() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 715.
What does split_text() call?
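split_text() calls 5 functions: _filter_tags(), _process_html(), _process_links(), _process_media(), and another split_text(). The first four are called directly in the source lines shown above; the call to another split_text() appears in the dependency diagram but not in those lines.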
What calls split_text()?
split_text() is called by one function: transform_documents().