test_happy_path_splitting_with_duplicate_header_tag() — langchain Function Reference

Architecture documentation for the test_happy_path_splitting_with_duplicate_header_tag() function in test_text_splitters.py from the langchain codebase.

Dependency Diagram

graph TD
  5a3fe592_7239_f1db_fa60_f8288a771b5f["test_happy_path_splitting_with_duplicate_header_tag()"]
  6d6b8ad4_1cfe_fbb0_e58e_76a50487c135["test_text_splitters.py"]
  5a3fe592_7239_f1db_fa60_f8288a771b5f -->|defined in| 6d6b8ad4_1cfe_fbb0_e58e_76a50487c135
  style 5a3fe592_7239_f1db_fa60_f8288a771b5f fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 3111–3159

def test_happy_path_splitting_with_duplicate_header_tag() -> None:
    # arrange
    html_string = """<!DOCTYPE html>
        <html>
        <body>
            <div>
                <h1>Foo</h1>
                <p>Some intro text about Foo.</p>
                <div>
                    <h2>Bar main section</h2>
                    <p>Some intro text about Bar.</p>
                    <h3>Bar subsection 1</h3>
                    <p>Some text about the first subtopic of Bar.</p>
                    <h3>Bar subsection 2</h3>
                    <p>Some text about the second subtopic of Bar.</p>
                </div>
                <div>
                    <h2>Foo</h2>
                    <p>Some text about Baz</p>
                </div>
                <h1>Foo</h1>
                <br>
                <p>Some concluding text about Foo</p>
            </div>
        </body>
        </html>"""

    sec_splitter = HTMLSectionSplitter(
        headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")]
    )

    docs = sec_splitter.split_text(html_string)

    assert len(docs) == 4
    assert docs[0].page_content == "Foo \n Some intro text about Foo."
    assert docs[0].metadata["Header 1"] == "Foo"

    assert docs[1].page_content == (
        "Bar main section \n Some intro text about Bar. \n "
        "Bar subsection 1 \n Some text about the first subtopic of Bar. \n "
        "Bar subsection 2 \n Some text about the second subtopic of Bar."
    )
    assert docs[1].metadata["Header 2"] == "Bar main section"

    assert docs[2].page_content == "Foo \n Some text about Baz"
    assert docs[2].metadata["Header 2"] == "Foo"

    assert docs[3].page_content == "Foo \n \n Some concluding text about Foo"
    assert docs[3].metadata["Header 1"] == "Foo"
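
The assertions above rely on one behaviour: a new section starts at every tracked header tag (h1, h2), even when the header text repeats, while untracked tags such as h3 stay inside the current section. The sketch below illustrates that idea with the Python standard library only. It does not use HTMLSectionSplitter and makes no claim about how that class is implemented; HeaderSectionParser, split_on_headers and SAMPLE_HTML are hypothetical names, and the whitespace joining differs from the page_content strings asserted in the test.

```python
from html.parser import HTMLParser


class HeaderSectionParser(HTMLParser):
    """Hypothetical helper: open a new section at every tracked header tag."""

    def __init__(self, headers_to_split_on):
        super().__init__()
        # Map tag name -> metadata key, e.g. {"h1": "Header 1"}.
        self.header_names = dict(headers_to_split_on)
        self.sections = []
        self._in_header = False

    def handle_starttag(self, tag, attrs):
        if tag in self.header_names:
            # Every tracked header opens a section, regardless of its text.
            self.sections.append(
                {"key": self.header_names[tag], "header": "", "chunks": []}
            )
            self._in_header = True

    def handle_endtag(self, tag):
        if tag in self.header_names:
            self._in_header = False

    def handle_data(self, data):
        text = data.strip()
        # Assumption of this sketch: text before the first header is dropped.
        if not text or not self.sections:
            return
        section = self.sections[-1]
        if self._in_header:
            section["header"] += text
        section["chunks"].append(text)


def split_on_headers(html, headers_to_split_on):
    """Return one dict per section with header metadata and joined text."""
    parser = HeaderSectionParser(headers_to_split_on)
    parser.feed(html)
    parser.close()
    return [
        {"metadata": {s["key"]: s["header"]}, "content": " ".join(s["chunks"])}
        for s in parser.sections
    ]


# Hypothetical sample: "Foo" appears under two h1 tags and one h2 tag.
SAMPLE_HTML = """
<h1>Foo</h1><p>Intro about Foo.</p>
<h2>Bar</h2><p>About Bar.</p><h3>Bar sub</h3><p>Sub text.</p>
<h2>Foo</h2><p>About Baz.</p>
<h1>Foo</h1><p>Conclusion.</p>
"""

sections = split_on_headers(SAMPLE_HTML, [("h1", "Header 1"), ("h2", "Header 2")])
for section in sections:
    print(section["metadata"], "|", section["content"])
```

The three "Foo" headers yield three separate sections, distinguished by position and by metadata key rather than by header text, which is the property the test checks with its duplicate header.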

Frequently Asked Questions

What does test_happy_path_splitting_with_duplicate_header_tag() do?
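test_happy_path_splitting_with_duplicate_header_tag() is a unit test in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py. It builds an HTML document in which the header text "Foo" appears in two h1 tags and one h2 tag, splits it with an HTMLSectionSplitter configured for h1 ("Header 1") and h2 ("Header 2"), and asserts that four documents are returned, each with the expected page_content and header metadata. The h3 subsections are not split on, so their text stays inside the "Bar main section" document.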
Where is test_happy_path_splitting_with_duplicate_header_tag() defined?
test_happy_path_splitting_with_duplicate_header_tag() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py at line 3111.
