__init__() — langchain Function Reference

Architecture documentation for the __init__() method of ExperimentalMarkdownSyntaxTextSplitter, defined in markdown.py in the langchain codebase.

Dependency Diagram

graph TD
  fc2b67ed_e223_c683_5ac5_a115772a6829["__init__()"]
  cd7394a9_9856_dc15_cb00_078cf42f0529["ExperimentalMarkdownSyntaxTextSplitter"]
  fc2b67ed_e223_c683_5ac5_a115772a6829 -->|defined in| cd7394a9_9856_dc15_cb00_078cf42f0529
  style fc2b67ed_e223_c683_5ac5_a115772a6829 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/markdown.py lines 333–370

    def __init__(
        self,
        headers_to_split_on: list[tuple[str, str]] | None = None,
        return_each_line: bool = False,  # noqa: FBT001,FBT002
        strip_headers: bool = True,  # noqa: FBT001,FBT002
    ) -> None:
        """Initialize the text splitter with header splitting and formatting options.

        This constructor sets up the required configuration for splitting text into
        chunks based on specified headers and formatting preferences.

        Args:
            headers_to_split_on: A list of tuples, where each tuple contains a header
                tag (e.g., "h1") and its corresponding metadata key.

                If `None`, default headers are used.
            return_each_line: Whether to return each line as an individual chunk.

                Defaults to `False`, which aggregates lines into larger chunks.
            strip_headers: Whether to exclude headers from the resulting chunks.
        """
        self.chunks: list[Document] = []
        self.current_chunk = Document(page_content="")
        self.current_header_stack: list[tuple[int, str]] = []
        self.strip_headers = strip_headers
        if headers_to_split_on:
            self.splittable_headers = dict(headers_to_split_on)
        else:
            self.splittable_headers = {
                "#": "Header 1",
                "##": "Header 2",
                "###": "Header 3",
                "####": "Header 4",
                "#####": "Header 5",
                "######": "Header 6",
            }

        self.return_each_line = return_each_line
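
The header-mapping branch in the constructor can be illustrated in isolation. The following is a minimal sketch, not the library's code: `resolve_headers` and `DEFAULT_HEADERS` are hypothetical names introduced here for illustration, and the function only mirrors the `if headers_to_split_on:` check shown above. It shows that both `None` and an empty list fall back to the defaults, and that the defaults are keyed by Markdown prefixes such as "#" rather than by the "h1" tag used in the docstring example.

```python
from __future__ import annotations

# Hypothetical constant: same key/value pairs as the default dict in __init__.
DEFAULT_HEADERS: dict[str, str] = {
    "#": "Header 1",
    "##": "Header 2",
    "###": "Header 3",
    "####": "Header 4",
    "#####": "Header 5",
    "######": "Header 6",
}


def resolve_headers(
    headers_to_split_on: list[tuple[str, str]] | None = None,
) -> dict[str, str]:
    """Hypothetical helper mirroring the truthiness check in __init__."""
    if headers_to_split_on:
        # A non-empty list of (prefix, metadata_key) tuples becomes a dict.
        return dict(headers_to_split_on)
    # None and [] are both falsy, so both use the defaults.
    return dict(DEFAULT_HEADERS)


custom = resolve_headers([("#", "Title"), ("##", "Section")])
fallback_none = resolve_headers(None)
fallback_empty = resolve_headers([])
```

Because the check is a truthiness test rather than `is None`, passing an empty list does not disable header splitting; it selects the six default levels.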

Frequently Asked Questions

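What does __init__() do?
__init__() is the constructor of ExperimentalMarkdownSyntaxTextSplitter. It initializes an empty chunks list, an empty current_chunk Document, and an empty current_header_stack, stores the strip_headers and return_each_line flags, and builds splittable_headers from headers_to_split_on. When that argument is None or empty, it falls back to the six Markdown header prefixes "#" through "######", mapped to "Header 1" through "Header 6".
Where is __init__() defined?
__init__() is defined in libs/text-splitters/langchain_text_splitters/markdown.py, lines 333–370.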
