test_experimental_markdown_syntax_text_splitter_with_headers() — langchain Function Reference

Architecture documentation for the test_experimental_markdown_syntax_text_splitter_with_headers() function in test_text_splitters.py from the langchain codebase.

Dependency Diagram

graph TD
  eb71c901_90ad_c7ac_9434_0c6a65a6c2e8["test_experimental_markdown_syntax_text_splitter_with_headers()"]
  6d6b8ad4_1cfe_fbb0_e58e_76a50487c135["test_text_splitters.py"]
  eb71c901_90ad_c7ac_9434_0c6a65a6c2e8 -->|defined in| 6d6b8ad4_1cfe_fbb0_e58e_76a50487c135
  style eb71c901_90ad_c7ac_9434_0c6a65a6c2e8 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 1796–1851

def test_experimental_markdown_syntax_text_splitter_with_headers() -> None:
    """Test experimental markdown syntax splitter."""
    markdown_splitter = ExperimentalMarkdownSyntaxTextSplitter(strip_headers=False)
    output = markdown_splitter.split_text(EXPERIMENTAL_MARKDOWN_DOCUMENT)

    expected_output = [
        Document(
            page_content="# My Header 1\nContent for header 1\n",
            metadata={"Header 1": "My Header 1"},
        ),
        Document(
            page_content="## Header 2\nContent for header 2\n",
            metadata={"Header 1": "My Header 1", "Header 2": "Header 2"},
        ),
        Document(
            page_content="### Header 3\nContent for header 3\n",
            metadata={
                "Header 1": "My Header 1",
                "Header 2": "Header 2",
                "Header 3": "Header 3",
            },
        ),
        Document(
            page_content=(
                "## Header 2 Again\n"
                "This should be tagged with Header 1 and Header 2 Again\n"
            ),
            metadata={"Header 1": "My Header 1", "Header 2": "Header 2 Again"},
        ),
        Document(
            page_content=(
                "```python\ndef func_definition():\n   "
                "print('Keep the whitespace consistent')\n```\n"
            ),
            metadata={
                "Code": "python",
                "Header 1": "My Header 1",
                "Header 2": "Header 2 Again",
            },
        ),
        Document(
            page_content=(
                "# Header 1 again\nWe should also split on the horizontal line\n"
            ),
            metadata={"Header 1": "Header 1 again"},
        ),
        Document(
            page_content=(
                "This will be a new doc but with the same header metadata\n\n"
                "And it includes a new paragraph"
            ),
            metadata={"Header 1": "Header 1 again"},
        ),
    ]

    assert output == expected_output
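
The expected output above implies a simple rule for header metadata: a new header replaces any previously active header at the same or a deeper level, which is why "Header 2 Again" no longer carries "Header 3" and "Header 1 again" carries neither of the earlier sub-headers. The following is a minimal standalone sketch of that rule using only the standard library. It is not the implementation of ExperimentalMarkdownSyntaxTextSplitter; the function name `split_by_headers`, the regex, and the `sample` text are hypothetical and exist only for illustration, and code fences and horizontal lines are deliberately not handled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for ATX-style headers ("#" through "######").
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Toy splitter: one chunk per header, with the header line kept in the content."""
    chunks: List[Tuple[str, Dict[str, str]]] = []
    active: Dict[int, str] = {}  # header level -> title currently in effect
    buffer: List[str] = []       # lines of the chunk being built
    meta: Dict[str, str] = {}    # metadata for the chunk being built

    def flush() -> None:
        # Emit the buffered lines with the metadata that applied to them.
        if buffer:
            chunks.append(("".join(buffer), dict(meta)))
            buffer.clear()

    for line in text.splitlines(keepends=True):
        match = HEADER_RE.match(line.rstrip("\n"))
        if match:
            flush()
            level = len(match.group(1))
            # A new header invalidates headers at the same or deeper levels.
            for stale in [lvl for lvl in active if lvl >= level]:
                del active[stale]
            active[level] = match.group(2).strip()
            meta = {f"Header {lvl}": title for lvl, title in sorted(active.items())}
        buffer.append(line)
    flush()
    return chunks


sample = (
    "# My Header 1\nContent for header 1\n"
    "## Header 2\nContent for header 2\n"
    "### Header 3\nContent for header 3\n"
    "## Header 2 Again\nTagged with Header 1 and Header 2 Again\n"
    "# Header 1 again\nFresh top-level section\n"
)

chunks = split_by_headers(sample)
for content, metadata in chunks:
    print(metadata, repr(content))
```

Keeping the header line inside each chunk mirrors what the test checks with strip_headers=False; a variant that strips headers would skip the `buffer.append(line)` step for matched header lines.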

Frequently Asked Questions

What does test_experimental_markdown_syntax_text_splitter_with_headers() do?
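test_experimental_markdown_syntax_text_splitter_with_headers() is a unit test in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py. It splits EXPERIMENTAL_MARKDOWN_DOCUMENT with ExperimentalMarkdownSyntaxTextSplitter(strip_headers=False) and asserts that the result equals a list of seven expected Document objects. The expected output checks that each header line stays in page_content, that metadata records the active header hierarchy ("Header 1", "Header 2", "Header 3"), that the fenced Python code block becomes its own document with a "Code": "python" entry, and that the text after the horizontal line starts a new document with the same header metadata.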
Where is test_experimental_markdown_syntax_text_splitter_with_headers() defined?
test_experimental_markdown_syntax_text_splitter_with_headers() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py at line 1796.
