split_text_from_file() — langchain Function Reference
Architecture documentation for the split_text_from_file() function in html.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97["split_text_from_file()"]
  86dc20d4_404a_b608_01da_8dea923ef2c9["HTMLHeaderTextSplitter"]
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97 -->|defined in| 86dc20d4_404a_b608_01da_8dea923ef2c9
  3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d["split_text()"]
  3a8f906a_02bf_a0ff_6dbb_2ffbc48f937d -->|calls| cf1e77cb_9fca_ca93_1428_c967d5cb0c97
  170f66e6_a026_8fd5_9128_33eeefb7dd62["split_text_from_file()"]
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| cf1e77cb_9fca_ca93_1428_c967d5cb0c97
  cdce0dab_74f2_fff9_b284_195643913ed5["split_text()"]
  cdce0dab_74f2_fff9_b284_195643913ed5 -->|calls| cf1e77cb_9fca_ca93_1428_c967d5cb0c97
  f984e4e0_af18_0fc8_0a3e_18771966d43d["_generate_documents()"]
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97 -->|calls| f984e4e0_af18_0fc8_0a3e_18771966d43d
  170f66e6_a026_8fd5_9128_33eeefb7dd62["split_text_from_file()"]
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97 -->|calls| 170f66e6_a026_8fd5_9128_33eeefb7dd62
  style cf1e77cb_9fca_ca93_1428_c967d5cb0c97 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 212–228
def split_text_from_file(self, file: str | IO[str]) -> list[Document]:
    """Split HTML content from a file into a list of `Document` objects.

    Args:
        file: A file path or a file-like object containing HTML content.

    Returns:
        A list of split `Document` objects.

        Each `Document` contains `page_content` holding the extracted text and
        `metadata` that maps the header hierarchy to their corresponding titles.
    """
    if isinstance(file, str):
        html_content = pathlib.Path(file).read_text(encoding="utf-8")
    else:
        html_content = file.read()
    return list(self._generate_documents(html_content))
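The input handling above can be illustrated on its own, without the rest of the splitter. The following is a minimal standalone sketch, not the library's code: `read_html` is a hypothetical helper that copies only the path-versus-file-like branch from the listing, and the sample HTML and temporary file name are made up for illustration.

```python
from __future__ import annotations

import io
import pathlib
import tempfile
from typing import IO


def read_html(file: str | IO[str]) -> str:
    """Hypothetical helper mirroring the input branch shown in the listing."""
    if isinstance(file, str):
        # A plain string is interpreted as a file path, not as HTML markup.
        return pathlib.Path(file).read_text(encoding="utf-8")
    # Anything else is assumed to expose a text-mode read() method.
    return file.read()


html = "<h1>Title</h1><p>Body text.</p>"

# Case 1: a file-like object, here an in-memory text buffer.
from_buffer = read_html(io.StringIO(html))

# Case 2: a path passed as a string, using a throwaway temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "page.html"
    path.write_text(html, encoding="utf-8")
    from_path = read_html(str(path))

# Both routes should yield the same HTML text.
print(from_buffer == from_path == html)
```

Two consequences follow from the branch as written. A string argument is always read as a path, so HTML held in memory would need to be wrapped in something like `io.StringIO` first. A `pathlib.Path` object is not a `str`, so it would take the `file.read()` branch and fail, since `Path` has no `read()` method; converting it with `str(path)` avoids that.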
Frequently Asked Questions
What does split_text_from_file() do?
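split_text_from_file() is a method of HTMLHeaderTextSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It accepts either a file path (a str, read as UTF-8) or a file-like object, loads the HTML content, and returns a list of Document objects produced by _generate_documents(). Each Document holds the extracted text in page_content and the header hierarchy in metadata.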
Where is split_text_from_file() defined?
split_text_from_file() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 212.
What does split_text_from_file() call?
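The dependency graph lists two callees: _generate_documents() and a separately identified split_text_from_file() node. In the source shown, only _generate_documents() is called directly, along with pathlib.Path.read_text() or file.read() to load the content.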
What calls split_text_from_file()?
According to the dependency graph, split_text_from_file() is called by three functions: two distinct split_text() functions and a separately identified split_text_from_file() function.