test_html_splitter_with_media_preservation() — langchain Function Reference

Architecture documentation for the test_html_splitter_with_media_preservation() function in test_text_splitters.py from the langchain codebase.

Dependency Diagram

test_html_splitter_with_media_preservation() is defined in test_text_splitters.py.

Source Code

libs/text-splitters/tests/unit_tests/test_text_splitters.py lines 3788–3825

def test_html_splitter_with_media_preservation() -> None:
    """Test HTML splitter with media preservation.

    Test HTML splitting with media elements preserved and converted to Markdown-like
    links.
    """
    html_content = """
    <h1>Section 1</h1>
    <p>This is an image:</p>
    <img src="http://example.com/image.png" />
    <p>This is a video:</p>
    <video src="http://example.com/video.mp4"></video>
    <p>This is audio:</p>
    <audio src="http://example.com/audio.mp3"></audio>
    """
    with suppress_langchain_beta_warning():
        splitter = HTMLSemanticPreservingSplitter(
            headers_to_split_on=[("h1", "Header 1")],
            preserve_images=True,
            preserve_videos=True,
            preserve_audio=True,
            max_chunk_size=1000,
        )
    documents = splitter.split_text(html_content)

    expected = [
        Document(
            page_content="This is an image: ![image:http://example.com/image.png]"
            "(http://example.com/image.png) "
            "This is a video: ![video:http://example.com/video.mp4]"
            "(http://example.com/video.mp4) "
            "This is audio: ![audio:http://example.com/audio.mp3]"
            "(http://example.com/audio.mp3)",
            metadata={"Header 1": "Section 1"},
        ),
    ]

    assert documents == expected
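The test asserts one specific transformation: `<img>`, `<video>` and `<audio>` elements become `![label:src](src)` placeholders, and the surrounding text is joined with single spaces under the `<h1>` header's metadata. The following is a minimal standard-library sketch that reproduces that expected output for this input. It is a hypothetical, simplified stand-in and not the implementation of `HTMLSemanticPreservingSplitter`. The names `MediaPreservingSketch`, `MEDIA_LABELS` and `split_with_media` are invented for illustration. The sketch handles only a single header level and ignores `max_chunk_size` and every other splitter option.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical mapping: media tag -> label used in the placeholder,
# mirroring the labels seen in the test's expected page_content.
MEDIA_LABELS = {"img": "image", "video": "video", "audio": "audio"}


class MediaPreservingSketch(HTMLParser):
    """Hypothetical, simplified stand-in for HTMLSemanticPreservingSplitter.

    Starts a new chunk at each `header_tag`, records that header's text as
    metadata, and rewrites media elements as ![label:src](src) placeholders.
    """

    def __init__(self, header_tag: str, header_name: str) -> None:
        super().__init__()
        self.header_tag = header_tag
        self.header_name = header_name
        self.chunks: list[tuple[str, dict[str, str]]] = []
        self._parts: list[str] = []          # text and placeholders of the current chunk
        self._metadata: dict[str, str] = {}  # header metadata for the current chunk
        self._header_text: list[str] = []
        self._in_header = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == self.header_tag:
            self._flush()  # a new header closes the previous chunk
            self._in_header = True
            self._header_text = []
        elif tag in MEDIA_LABELS:
            # Self-closing <img ... /> also lands here: HTMLParser's default
            # handle_startendtag calls handle_starttag, then handle_endtag.
            src = dict(attrs).get("src") or ""
            self._parts.append(f"![{MEDIA_LABELS[tag]}:{src}]({src})")

    def handle_endtag(self, tag: str) -> None:
        if tag == self.header_tag and self._in_header:
            self._in_header = False
            header = " ".join("".join(self._header_text).split())
            self._metadata = {self.header_name: header}

    def handle_data(self, data: str) -> None:
        if self._in_header:
            self._header_text.append(data)
        else:
            self._parts.append(data)

    def _flush(self) -> None:
        # Collapse every whitespace run to a single space; skip empty chunks.
        text = " ".join(" ".join(self._parts).split())
        if text:
            self.chunks.append((text, dict(self._metadata)))
        self._parts = []

    def close(self) -> None:
        super().close()
        self._flush()  # emit whatever follows the last header


def split_with_media(html: str) -> list[tuple[str, dict[str, str]]]:
    """Return (page_content, metadata) pairs, splitting on <h1>."""
    parser = MediaPreservingSketch("h1", "Header 1")
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(split_with_media('<h1>Intro</h1><p>See:</p><img src="a.png" />'))
    # [('See: ![image:a.png](a.png)', {'Header 1': 'Intro'})]
```

Collapsing whitespace with `str.split()` is what turns the indented, multi-line HTML into the single-spaced `page_content` that the test expects. The real splitter's normalization rules may differ.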

Frequently Asked Questions

What does test_html_splitter_with_media_preservation() do?
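test_html_splitter_with_media_preservation() is a unit test in the langchain codebase, defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py. It configures HTMLSemanticPreservingSplitter to split on <h1> headers with preserve_images, preserve_videos and preserve_audio enabled. It then asserts that the <img>, <video> and <audio> elements are rewritten as Markdown-style links such as ![image:URL](URL). The test also checks that these links are returned together with the surrounding paragraph text in a single Document whose metadata is {"Header 1": "Section 1"}.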
Where is test_html_splitter_with_media_preservation() defined?
test_html_splitter_with_media_preservation() is defined in libs/text-splitters/tests/unit_tests/test_text_splitters.py at line 3788.
