__init__() — langchain Function Reference
Architecture documentation for the __init__() function in html.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD 95c4d16f_9ef1_6b54_c03d_da25e987fd1c["__init__()"] 86dc20d4_404a_b608_01da_8dea923ef2c9["HTMLHeaderTextSplitter"] 95c4d16f_9ef1_6b54_c03d_da25e987fd1c -->|defined in| 86dc20d4_404a_b608_01da_8dea923ef2c9 style 95c4d16f_9ef1_6b54_c03d_da25e987fd1c fill:#6366f1,stroke:#818cf8,color:#fff
Relationship Graph
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 145–173
def __init__(
self,
headers_to_split_on: list[tuple[str, str]],
return_each_element: bool = False, # noqa: FBT001,FBT002
) -> None:
"""Initialize with headers to split on.
Args:
headers_to_split_on: A list of `(header_tag,
header_name)` pairs representing the headers that define splitting
boundaries.
For example, `[("h1", "Header 1"), ("h2", "Header 2")]` will split
content by `h1` and `h2` tags, assigning their textual content to the
`Document` metadata.
return_each_element: If `True`, every HTML element encountered
(including headers, paragraphs, etc.) is returned as a separate
`Document`.
If `False`, content under the same header hierarchy is aggregated into
fewer `Document` objects.
"""
# Sort headers by their numeric level so that h1 < h2 < h3...
self.headers_to_split_on = sorted(
headers_to_split_on, key=lambda x: int(x[0][1:])
)
self.header_mapping = dict(self.headers_to_split_on)
self.header_tags = [tag for tag, _ in self.headers_to_split_on]
self.return_each_element = return_each_element
Domain
Subdomains
Source
Frequently Asked Questions
What does __init__() do?
__init__() is a function in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/html.py.
Where is __init__() defined?
__init__() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 145.
Analyze Your Own Codebase
Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.
Try Supermodel Free