
__init__() — langchain Function Reference

Architecture documentation for the __init__() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  95c4d16f_9ef1_6b54_c03d_da25e987fd1c["__init__()"]
  86dc20d4_404a_b608_01da_8dea923ef2c9["HTMLHeaderTextSplitter"]
  95c4d16f_9ef1_6b54_c03d_da25e987fd1c -->|defined in| 86dc20d4_404a_b608_01da_8dea923ef2c9
  style 95c4d16f_9ef1_6b54_c03d_da25e987fd1c fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 145–173

    def __init__(
        self,
        headers_to_split_on: list[tuple[str, str]],
        return_each_element: bool = False,  # noqa: FBT001,FBT002
    ) -> None:
        """Initialize with headers to split on.

        Args:
            headers_to_split_on: A list of `(header_tag,
                header_name)` pairs representing the headers that define splitting
                boundaries.

                For example, `[("h1", "Header 1"), ("h2", "Header 2")]` will split
                content by `h1` and `h2` tags, assigning their textual content to the
                `Document` metadata.
            return_each_element: If `True`, every HTML element encountered
                (including headers, paragraphs, etc.) is returned as a separate
                `Document`.

                If `False`, content under the same header hierarchy is aggregated into
                fewer `Document` objects.
        """
        # Sort headers by their numeric level so that h1 < h2 < h3...
        self.headers_to_split_on = sorted(
            headers_to_split_on, key=lambda x: int(x[0][1:])
        )
        self.header_mapping = dict(self.headers_to_split_on)
        self.header_tags = [tag for tag, _ in self.headers_to_split_on]
        self.return_each_element = return_each_element
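
The constructor does not parse any HTML. It only normalizes the header configuration: it sorts the pairs by heading level, then derives a lookup dictionary and an ordered tag list. The standalone sketch below reproduces that normalization outside the class so the effect of the sort is easy to see. `normalize_headers` is a hypothetical helper written for illustration. It is not part of `langchain_text_splitters`, and it assumes the same `int(tag[1:])` sort key as the excerpt above.

```python
from typing import Dict, List, Tuple


def normalize_headers(
    headers_to_split_on: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, str], List[str]]:
    """Hypothetical helper mirroring the attribute setup in __init__().

    Sorts (tag, name) pairs by the numeric part of the tag ("h2" -> 2),
    then derives the tag-to-name mapping and the ordered tag list.
    """
    # int(tag[1:]) assumes tags shaped like "h1".."h6"; a tag such as
    # "div" or "p" has no numeric suffix, so int() raises ValueError.
    ordered = sorted(headers_to_split_on, key=lambda pair: int(pair[0][1:]))
    mapping = dict(ordered)                # corresponds to header_mapping
    tags = [tag for tag, _ in ordered]     # corresponds to header_tags
    return ordered, mapping, tags


# Headers passed out of order come back sorted h1 < h2 < h3.
headers = [("h3", "Header 3"), ("h1", "Header 1"), ("h2", "Header 2")]
ordered, mapping, tags = normalize_headers(headers)
print(ordered)  # [('h1', 'Header 1'), ('h2', 'Header 2'), ('h3', 'Header 3')]
print(mapping)  # {'h1': 'Header 1', 'h2': 'Header 2', 'h3': 'Header 3'}
print(tags)     # ['h1', 'h2', 'h3']

# A non-heading tag is rejected by the numeric sort key.
try:
    normalize_headers([("h1", "Header 1"), ("div", "Section")])
except ValueError as exc:
    print("rejected:", exc)
```

Because the sort happens in the constructor, callers can list headers in any order. The trade-off is that `headers_to_split_on` is effectively limited to `h<N>` tags: anything without a numeric suffix fails at construction time rather than during splitting.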

Frequently Asked Questions

What does __init__() do?
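__init__() is the constructor of HTMLHeaderTextSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It sorts headers_to_split_on by numeric heading level (h1 before h2, and so on), stores a tag-to-name dictionary in header_mapping, keeps the ordered tag list in header_tags, and records the return_each_element flag. It does not parse any HTML itself.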
Where is __init__() defined?
__init__() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 145.
