split_text_from_file() — langchain Function Reference

Architecture documentation for the split_text_from_file() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  170f66e6_a026_8fd5_9128_33eeefb7dd62["split_text_from_file()"]
  0c8a5f97_7cb0_fe24_746d_9689c4e5426c["HTMLSectionSplitter"]
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|defined in| 0c8a5f97_7cb0_fe24_746d_9689c4e5426c
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97["split_text_from_file()"]
  cf1e77cb_9fca_ca93_1428_c967d5cb0c97 -->|calls| 170f66e6_a026_8fd5_9128_33eeefb7dd62
  cdce0dab_74f2_fff9_b284_195643913ed5["split_text()"]
  cdce0dab_74f2_fff9_b284_195643913ed5 -->|calls| 170f66e6_a026_8fd5_9128_33eeefb7dd62
  c2708424_8958_9cb1_390e_d816b56479f3["convert_possible_tags_to_header()"]
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| c2708424_8958_9cb1_390e_d816b56479f3
  219ab3b6_0b12_7f58_ba5f_9bfbebda0057["split_html_by_headers()"]
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| 219ab3b6_0b12_7f58_ba5f_9bfbebda0057
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| cf1e77cb_9fca_ca93_1428_c967d5cb0c97
  style 170f66e6_a026_8fd5_9128_33eeefb7dd62 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 530–553

    def split_text_from_file(self, file: StringIO) -> list[Document]:
        """Split HTML content from a file into a list of `Document` objects.

        Args:
            file: A file path or a file-like object containing HTML content.

        Returns:
            A list of split `Document` objects.
        """
        file_content = file.getvalue()
        file_content = self.convert_possible_tags_to_header(file_content)
        sections = self.split_html_by_headers(file_content)

        return [
            Document(
                cast("str", section["content"]),
                metadata={
                    self.headers_to_split_on[str(section["tag_name"])]: section[
                        "header"
                    ]
                },
            )
            for section in sections
        ]
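
To make the return value concrete, the sketch below reproduces only the final list comprehension using standard-library stand-ins. The `Document` dataclass, the `build_documents` helper, and the sample data are hypothetical and are not langchain's own classes. The section keys (`header`, `content`, `tag_name`) are taken from what the method reads, and `headers_to_split_on` is assumed to be a dict because the method indexes it by tag name.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for langchain's Document (content plus metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def build_documents(sections, headers_to_split_on):
    """Build one Document per section, mirroring the list comprehension.

    The metadata key is the label configured for the section's tag;
    the metadata value is the header text of that section.
    """
    return [
        Document(
            str(section["content"]),
            metadata={
                headers_to_split_on[str(section["tag_name"])]: section["header"]
            },
        )
        for section in sections
    ]


# Assumed mapping from tag name to metadata label.
headers_to_split_on = {"h1": "Header 1", "h2": "Header 2"}

# Assumed shape of the sections produced by the header-splitting step.
sections = [
    {"header": "Intro", "content": "Welcome.", "tag_name": "h1"},
    {"header": "Details", "content": "More text.", "tag_name": "h2"},
]

docs = build_documents(sections, headers_to_split_on)
for doc in docs:
    print(doc.metadata, doc.page_content)
```

In this sketch each document carries exactly one metadata entry: the label for its own header tag paired with the header text.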

Frequently Asked Questions

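What does split_text_from_file() do?
split_text_from_file() is a method of HTMLSectionSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It reads the HTML string from its StringIO argument with getvalue(), passes it through convert_possible_tags_to_header(), splits the result with split_html_by_headers(), and returns one Document per section. Each Document's metadata maps the label configured in headers_to_split_on for the section's tag to that section's header text.

Where is split_text_from_file() defined?
split_text_from_file() is defined in libs/text-splitters/langchain_text_splitters/html.py, starting at line 530 (the excerpt covers lines 530–553).

What does split_text_from_file() call?
The dependency diagram lists three callees: convert_possible_tags_to_header(), split_html_by_headers(), and split_text_from_file(). The last entry is a separate diagram node with the same name; the method body in the Source Code section does not call itself.

What calls split_text_from_file()?
The dependency diagram lists two callers: split_text() and split_text_from_file(), the latter again being the separate node of the same name.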
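
Because the method body calls `file.getvalue()`, callers need to supply an in-memory text buffer; a plain path string has no such method. The following is a minimal standard-library sketch of preparing that input from a file on disk. The helper name `load_html_buffer` and the sample file are hypothetical and are not part of langchain.

```python
import tempfile
from io import StringIO
from pathlib import Path


def load_html_buffer(path):
    """Read an HTML file and wrap its text in a StringIO (hypothetical helper)."""
    return StringIO(Path(path).read_text(encoding="utf-8"))


# Write a small sample file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    html_path = Path(tmp) / "page.html"
    html_path.write_text("<h1>Title</h1><p>Body</p>", encoding="utf-8")
    buffer = load_html_buffer(html_path)

# The buffer exposes getvalue(), which is what the method relies on.
content = buffer.getvalue()

# A path given as a plain string does not expose getvalue().
path_has_getvalue = hasattr(str(html_path), "getvalue")
```

A buffer built this way matches the `file: StringIO` parameter in the method signature.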
