Home / Function/ test_html_no_headers_with_multiple_splitters() — langchain Function Reference

test_html_no_headers_with_multiple_splitters() — langchain Function Reference

Architecture documentation for the test_html_no_headers_with_multiple_splitters() function in test_text_splitters.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  92c72a6d_b050_17b1_6494_b52311e03131["test_html_no_headers_with_multiple_splitters()"]
  6d6b8ad4_1cfe_fbb0_e58e_76a50487c135["test_text_splitters.py"]
  92c72a6d_b050_17b1_6494_b52311e03131 -->|defined in| 6d6b8ad4_1cfe_fbb0_e58e_76a50487c135
  b5d48f00_ae6e_36fa_d4c3_b94b9e390692["html_header_splitter_splitter_factory()"]
  92c72a6d_b050_17b1_6494_b52311e03131 -->|calls| b5d48f00_ae6e_36fa_d4c3_b94b9e390692
  style 92c72a6d_b050_17b1_6494_b52311e03131 fill:#6366f1,stroke:#818cf8,color:#fff

Relationship Graph

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 2879–2920

def test_html_no_headers_with_multiple_splitters(
    html_header_splitter_splitter_factory: Callable[
        [list[tuple[str, str]]], HTMLHeaderTextSplitter
    ],
    headers_to_split_on: list[tuple[str, str]],
    html_content: str,
    expected_output: list[Document],
    test_case: str,
) -> None:
    """Test HTML content splitting without headers using multiple splitters.

    Args:
        html_header_splitter_splitter_factory: Factory to create the HTML header
            splitter.
        headers_to_split_on: List of headers to split on.
        html_content: HTML content to be split.
        expected_output: Expected list of `Document` objects after splitting.
        test_case: Description of the test case.

    Raises:
        AssertionError: If the number of documents or their content/metadata
            does not match the expected output.
    """
    splitter = html_header_splitter_splitter_factory(headers_to_split_on)
    docs = splitter.split_text(html_content)

    assert len(docs) == len(expected_output), (
        f"{test_case} Failed: Number of documents mismatch. "
        f"Expected {len(expected_output)}, got {len(docs)}."
    )
    for idx, (doc, expected) in enumerate(
        zip(docs, expected_output, strict=False), start=1
    ):
        assert doc.page_content == expected.page_content, (
            f"{test_case} Failed at Document {idx}: "
            f"Content mismatch.\nExpected: {expected.page_content}\n"
            "Got: {doc.page_content}"
        )
        assert doc.metadata == expected.metadata, (
            f"{test_case} Failed at Document {idx}: "
            f"Metadata mismatch.\nExpected: {expected.metadata}\nGot: {doc.metadata}"
        )

Domain

Subdomains

Frequently Asked Questions

What does test_html_no_headers_with_multiple_splitters() do?
test_html_no_headers_with_multiple_splitters() is a function in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py.
Where is test_html_no_headers_with_multiple_splitters() defined?
test_html_no_headers_with_multiple_splitters() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py at line 2879.
What does test_html_no_headers_with_multiple_splitters() call?
test_html_no_headers_with_multiple_splitters() calls 1 function(s): html_header_splitter_splitter_factory.

Analyze Your Own Codebase

Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.

Try Supermodel Free