Home / File/ test_html.py — langchain Source File

test_html.py — langchain Source File

Architecture documentation for test_html.py, a python file in the langchain codebase. 1 imports, 0 dependents.

File python LangChainCore ApiManagement 1 imports 11 functions

Entity Profile

Dependency Diagram

graph LR
  89d29efc_3f6a_dceb_4c69_b201fa975bdd["test_html.py"]
  c46dd2d3_47bc_3cd6_7824_480bdd90d9b3["langchain_core.utils.html"]
  89d29efc_3f6a_dceb_4c69_b201fa975bdd --> c46dd2d3_47bc_3cd6_7824_480bdd90d9b3
  style 89d29efc_3f6a_dceb_4c69_b201fa975bdd fill:#6366f1,stroke:#818cf8,color:#fff

Relationship Graph

Source Code

from langchain_core.utils.html import (
    PREFIXES_TO_IGNORE,
    SUFFIXES_TO_IGNORE,
    extract_sub_links,
    find_all_links,
)


def test_find_all_links_none() -> None:
    html = "<span>Hello world</span>"
    actual = find_all_links(html)
    assert actual == []


def test_find_all_links_single() -> None:
    htmls = [
        "href='foobar.com'",
        'href="foobar.com"',
        '<div><a class="blah" href="foobar.com">hullo</a></div>',
    ]
    actual = [find_all_links(html) for html in htmls]
    assert actual == [["foobar.com"]] * 3


def test_find_all_links_multiple() -> None:
    html = (
        '<div><a class="blah" href="https://foobar.com">hullo</a></div>'
        '<div><a class="bleh" href="/baz/cool">buhbye</a></div>'
    )
    actual = find_all_links(html)
    assert sorted(actual) == [
        "/baz/cool",
        "https://foobar.com",
    ]


def test_find_all_links_ignore_suffix() -> None:
    html = 'href="foobar{suffix}"'
    for suffix_ in SUFFIXES_TO_IGNORE:
        actual = find_all_links(html.format(suffix=suffix_))
        assert actual == []

    # Don't ignore if pattern doesn't occur at end of link.
    html = 'href="foobar{suffix}more"'
    for suffix_ in SUFFIXES_TO_IGNORE:
        actual = find_all_links(html.format(suffix=suffix_))
        assert actual == [f"foobar{suffix_}more"]


def test_find_all_links_ignore_prefix() -> None:
    html = 'href="{prefix}foobar"'
    for prefix_ in PREFIXES_TO_IGNORE:
        actual = find_all_links(html.format(prefix=prefix_))
        assert actual == []

    # Don't ignore if pattern doesn't occur at beginning of link.
    html = 'href="foobar{prefix}more"'
    for prefix_ in PREFIXES_TO_IGNORE:
        # Pound signs are split on when not prefixes.
        if prefix_ == "#":
// ... (150 more lines)

Domain

Subdomains

Dependencies

  • langchain_core.utils.html

Frequently Asked Questions

What does test_html.py do?
test_html.py is a source file in the langchain codebase, written in python. It belongs to the LangChainCore domain, ApiManagement subdomain.
What functions are defined in test_html.py?
test_html.py defines 11 function(s): test_extract_sub_links, test_extract_sub_links_base, test_extract_sub_links_exclude, test_extract_sub_links_with_query, test_find_all_links_drop_fragment, test_find_all_links_ignore_prefix, test_find_all_links_ignore_suffix, test_find_all_links_multiple, test_find_all_links_none, test_find_all_links_single, and 1 more.
What does test_html.py depend on?
test_html.py imports 1 module(s): langchain_core.utils.html.
Where is test_html.py in the architecture?
test_html.py is located at libs/core/tests/unit_tests/utils/test_html.py (domain: LangChainCore, subdomain: ApiManagement, directory: libs/core/tests/unit_tests/utils).

Analyze Your Own Codebase

Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.

Try Supermodel Free