test_html_splitter_with_small_chunk_size() — langchain Function Reference

Architecture documentation for the test_html_splitter_with_small_chunk_size() function in test_text_splitters.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  1f15d741_47d5_3665_c6e2_be28c8f0dfb7["test_html_splitter_with_small_chunk_size()"]
  6d6b8ad4_1cfe_fbb0_e58e_76a50487c135["test_text_splitters.py"]
  1f15d741_47d5_3665_c6e2_be28c8f0dfb7 -->|defined in| 6d6b8ad4_1cfe_fbb0_e58e_76a50487c135
  style 1f15d741_47d5_3665_c6e2_be28c8f0dfb7 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 3596–3620

def test_html_splitter_with_small_chunk_size() -> None:
    """Test HTML splitting with a very small chunk size to validate chunking."""
    html_content = """
    <h1>Section 1</h1>
    <p>This is some long text that should be split into multiple chunks due to the
    small chunk size.</p>
    """
    with suppress_langchain_beta_warning():
        splitter = HTMLSemanticPreservingSplitter(
            headers_to_split_on=[("h1", "Header 1")], max_chunk_size=20, chunk_overlap=5
        )
    documents = splitter.split_text(html_content)

    expected = [
        Document(page_content="This is some long", metadata={"Header 1": "Section 1"}),
        Document(page_content="long text that", metadata={"Header 1": "Section 1"}),
        Document(page_content="that should be", metadata={"Header 1": "Section 1"}),
        Document(page_content="be split into", metadata={"Header 1": "Section 1"}),
        Document(page_content="into multiple", metadata={"Header 1": "Section 1"}),
        Document(page_content="chunks due to the", metadata={"Header 1": "Section 1"}),
        Document(page_content="the small chunk", metadata={"Header 1": "Section 1"}),
        Document(page_content="size.", metadata={"Header 1": "Section 1"}),
    ]

    assert documents == expected  # Should split into multiple chunks
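The expected output above can be sanity-checked without installing langchain. The sketch below only inspects the chunk strings copied from the test's `expected` list; it does not reproduce the splitter's algorithm. The helper names `chunk_lengths` and `shares_boundary_word` are hypothetical and are not part of the langchain codebase.

```python
# Chunk texts copied from the `expected` list in the test above.
expected_chunks = [
    "This is some long",
    "long text that",
    "that should be",
    "be split into",
    "into multiple",
    "chunks due to the",
    "the small chunk",
    "size.",
]

MAX_CHUNK_SIZE = 20  # same value the test passes as max_chunk_size


def chunk_lengths(chunks: list[str]) -> list[int]:
    """Return the character length of each chunk (hypothetical helper)."""
    return [len(chunk) for chunk in chunks]


def shares_boundary_word(chunks: list[str]) -> list[bool]:
    """For each adjacent pair, report whether the last word of the first
    chunk is repeated as the first word of the next (hypothetical helper)."""
    return [
        prev.split()[-1] == nxt.split()[0]
        for prev, nxt in zip(chunks, chunks[1:])
    ]


lengths = chunk_lengths(expected_chunks)
overlaps = shares_boundary_word(expected_chunks)

print(lengths)   # character count per chunk
print(overlaps)  # True where a word is carried over into the next chunk
```

Every expected chunk fits within the 20-character limit, and most adjacent chunks repeat one boundary word. The two pairs that share no word are "into multiple" / "chunks due to the" and "the small chunk" / "size."; the check only reports this and makes no claim about why the splitter produces it.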

Frequently Asked Questions

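What does test_html_splitter_with_small_chunk_size() do?
test_html_splitter_with_small_chunk_size() is a unit test in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py. It builds an HTMLSemanticPreservingSplitter with max_chunk_size=20 and chunk_overlap=5, splits a short HTML snippet (one h1 heading followed by one paragraph), and asserts that the result is eight Document chunks, each carrying the metadata {"Header 1": "Section 1"}.
Where is test_html_splitter_with_small_chunk_size() defined?
test_html_splitter_with_small_chunk_size() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py, starting at line 3596 (lines 3596–3620).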
