test_section_aware_happy_path_splitting_based_on_header_1_2() — langchain Function Reference

Architecture documentation for the test_section_aware_happy_path_splitting_based_on_header_1_2() function in test_text_splitters.py from the langchain codebase.

Dependency Diagram

graph TD
  aad223fa_96fc_f906_193e_b22acdb7dc87["test_section_aware_happy_path_splitting_based_on_header_1_2()"]
  6d6b8ad4_1cfe_fbb0_e58e_76a50487c135["test_text_splitters.py"]
  aad223fa_96fc_f906_193e_b22acdb7dc87 -->|defined in| 6d6b8ad4_1cfe_fbb0_e58e_76a50487c135
  style aad223fa_96fc_f906_193e_b22acdb7dc87 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 2955–3004

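# HTMLSectionSplitter is exported by the langchain_text_splitters package;
# the import is repeated here so the excerpt is self-contained.
from langchain_text_splitters import HTMLSectionSplitter


def test_section_aware_happy_path_splitting_based_on_header_1_2() -> None: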
    # arrange
    html_string = """<!DOCTYPE html>
            <html>
            <body>
                <div>
                    <h1>Foo</h1>
                    <p>Some intro text about Foo.</p>
                    <div>
                        <h2>Bar main section</h2>
                        <p>Some intro text about Bar.</p>
                        <h3>Bar subsection 1</h3>
                        <p>Some text about the first subtopic of Bar.</p>
                        <h3>Bar subsection 2</h3>
                        <p>Some text about the second subtopic of Bar.</p>
                    </div>
                    <div>
                        <h2>Baz</h2>
                        <p>Some text about Baz</p>
                    </div>
                    <br>
                    <p>Some concluding text about Foo</p>
                </div>
            </body>
            </html>"""

    sec_splitter = HTMLSectionSplitter(
        headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")]
    )

    # act
    docs = sec_splitter.split_text(html_string)

    # assert: one document per h1/h2 header; h3 content stays in its h2 section
    assert len(docs) == 3
    assert docs[0].metadata["Header 1"] == "Foo"
    assert docs[0].page_content == "Foo \n Some intro text about Foo."

    assert docs[1].page_content == (
        "Bar main section \n Some intro text about Bar. \n "
        "Bar subsection 1 \n Some text about the first subtopic of Bar. \n "
        "Bar subsection 2 \n Some text about the second subtopic of Bar."
    )
    assert docs[1].metadata["Header 2"] == "Bar main section"

    assert (
        docs[2].page_content
        == "Baz \n Some text about Baz \n \n \n Some concluding text about Foo"
    )
    # Baz \n Some text about Baz \n \n \n Some concluding text about Foo
    # Baz \n Some text about Baz \n \n Some concluding text about Foo
    assert docs[2].metadata["Header 2"] == "Baz"
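
The assertions above suggest that the splitter starts a new document at each configured header (h1, h2) and leaves deeper headers such as h3 inside the enclosing section. The block below is a minimal, standard-library-only sketch of that idea. It is not HTMLSectionSplitter's actual implementation: `SectionCollector` and `split_by_headers` are hypothetical names introduced for illustration, plain dicts stand in for Document objects, and the sketch does not try to reproduce the library's exact whitespace handling (for example the extra ` \n ` runs around `<br>` in the Baz assertion).

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Hypothetical helper: group text into sections that start at given header tags."""

    def __init__(self, headers_to_split_on):
        super().__init__()
        # Map tag name -> metadata key, e.g. {"h1": "Header 1"}
        self.header_names = dict(headers_to_split_on)
        self.sections = []       # each entry: {"metadata": {...}, "chunks": [...]}
        self._in_header = None   # tag name of the split header currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.header_names:
            # A configured header opens a new section.
            self._in_header = tag
            self.sections.append(
                {"metadata": {self.header_names[tag]: ""}, "chunks": []}
            )

    def handle_endtag(self, tag):
        if tag == self._in_header:
            self._in_header = None

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.sections:
            return  # skip whitespace and any text before the first header
        section = self.sections[-1]
        if self._in_header:
            # Header text becomes the metadata value and also stays in the content.
            section["metadata"][self.header_names[self._in_header]] += text
        section["chunks"].append(text)


def split_by_headers(html, headers_to_split_on):
    """Return a list of dicts with 'metadata' and 'page_content' keys."""
    parser = SectionCollector(headers_to_split_on)
    parser.feed(html)
    parser.close()
    return [
        {"metadata": s["metadata"], "page_content": " \n ".join(s["chunks"])}
        for s in parser.sections
    ]


sample = "<h1>Foo</h1><p>Intro.</p><h2>Bar</h2><p>Body.</p><h3>Sub</h3><p>More.</p>"
docs = split_by_headers(sample, [("h1", "Header 1"), ("h2", "Header 2")])
for doc in docs:
    print(doc["metadata"], repr(doc["page_content"]))
```

Because h3 is not listed in `headers_to_split_on`, the "Sub" heading and its paragraph end up in the "Bar" section, mirroring how the Bar subsections stay inside `docs[1]` in the test.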

Frequently Asked Questions

What does test_section_aware_happy_path_splitting_based_on_header_1_2() do?
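test_section_aware_happy_path_splitting_based_on_header_1_2() is a unit test in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py. It configures HTMLSectionSplitter to split on h1 and h2 headers, splits a small HTML document, and asserts that three documents come back (Foo, Bar main section, Baz), each with the expected page_content and "Header 1" or "Header 2" metadata. The h3 subsections stay inside the Bar section because h3 is not listed in headers_to_split_on.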
Where is test_section_aware_happy_path_splitting_based_on_header_1_2() defined?
test_section_aware_happy_path_splitting_based_on_header_1_2() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py at line 2955.
