test_html.py — langchain Source File
Architecture documentation for test_html.py, a Python file in the langchain codebase. 1 import, 0 dependents.
Dependency Diagram
graph LR
    89d29efc_3f6a_dceb_4c69_b201fa975bdd["test_html.py"]
    c46dd2d3_47bc_3cd6_7824_480bdd90d9b3["langchain_core.utils.html"]
    89d29efc_3f6a_dceb_4c69_b201fa975bdd --> c46dd2d3_47bc_3cd6_7824_480bdd90d9b3
    style 89d29efc_3f6a_dceb_4c69_b201fa975bdd fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
from langchain_core.utils.html import (
    PREFIXES_TO_IGNORE,
    SUFFIXES_TO_IGNORE,
    extract_sub_links,
    find_all_links,
)


def test_find_all_links_none() -> None:
    html = "<span>Hello world</span>"
    actual = find_all_links(html)
    assert actual == []


def test_find_all_links_single() -> None:
    htmls = [
        "href='foobar.com'",
        'href="foobar.com"',
        '<div><a class="blah" href="foobar.com">hullo</a></div>',
    ]
    actual = [find_all_links(html) for html in htmls]
    assert actual == [["foobar.com"]] * 3


def test_find_all_links_multiple() -> None:
    html = (
        '<div><a class="blah" href="https://foobar.com">hullo</a></div>'
        '<div><a class="bleh" href="/baz/cool">buhbye</a></div>'
    )
    actual = find_all_links(html)
    assert sorted(actual) == [
        "/baz/cool",
        "https://foobar.com",
    ]


def test_find_all_links_ignore_suffix() -> None:
    html = 'href="foobar{suffix}"'
    for suffix_ in SUFFIXES_TO_IGNORE:
        actual = find_all_links(html.format(suffix=suffix_))
        assert actual == []

    # Don't ignore if pattern doesn't occur at end of link.
    html = 'href="foobar{suffix}more"'
    for suffix_ in SUFFIXES_TO_IGNORE:
        actual = find_all_links(html.format(suffix=suffix_))
        assert actual == [f"foobar{suffix_}more"]


def test_find_all_links_ignore_prefix() -> None:
    html = 'href="{prefix}foobar"'
    for prefix_ in PREFIXES_TO_IGNORE:
        actual = find_all_links(html.format(prefix=prefix_))
        assert actual == []

    # Don't ignore if pattern doesn't occur at beginning of link.
    html = 'href="foobar{prefix}more"'
    for prefix_ in PREFIXES_TO_IGNORE:
        # Pound signs are split on when not prefixes.
        if prefix_ == "#":
# ... (150 more lines)
Domain
- LangChainCore
Subdomains
- ApiManagement
Functions
- test_extract_sub_links()
- test_extract_sub_links_base()
- test_extract_sub_links_exclude()
- test_extract_sub_links_with_query()
- test_find_all_links_drop_fragment()
- test_find_all_links_ignore_prefix()
- test_find_all_links_ignore_suffix()
- test_find_all_links_multiple()
- test_find_all_links_none()
- test_find_all_links_single()
- test_prevent_outside()
Dependencies
- langchain_core.utils.html
Frequently Asked Questions
What does test_html.py do?
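test_html.py is a unit-test module in the langchain codebase, written in Python. It tests the link-extraction helpers find_all_links and extract_sub_links from langchain_core.utils.html: finding href values in raw HTML, skipping links that start with an entry in PREFIXES_TO_IGNORE or end with an entry in SUFFIXES_TO_IGNORE, and splitting on "#" when it is not a prefix. It belongs to the LangChainCore domain, ApiManagement subdomain.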
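The behaviour these tests check can be illustrated with a small stand-alone sketch. This is not the langchain_core implementation: the function name sketch_find_all_links, the regular expression, and the contents of the two ignore tuples are assumptions chosen for illustration (only "#" as an ignored prefix is confirmed by the test source above).

```python
import re

# Hypothetical ignore lists; the real PREFIXES_TO_IGNORE and SUFFIXES_TO_IGNORE
# are defined in langchain_core.utils.html and may contain different entries.
SKETCH_PREFIXES = ("javascript:", "mailto:", "#")
SKETCH_SUFFIXES = (".css", ".js", ".png")

# Capture the value of every single- or double-quoted href attribute.
HREF_RE = re.compile(r"""href=["']([^"']*)["']""")


def sketch_find_all_links(raw_html: str) -> list:
    """Return the sorted, de-duplicated href values that survive the filters."""
    links = set()
    for href in HREF_RE.findall(raw_html):
        # Skip links that begin with an ignored prefix.
        if href.startswith(SKETCH_PREFIXES):
            continue
        # A "#" that is not a prefix marks a fragment; keep the part before it.
        href = href.split("#", 1)[0]
        # Skip empty results and links that end with an ignored suffix.
        if not href or href.endswith(SKETCH_SUFFIXES):
            continue
        links.add(href)
    return sorted(links)


print(sketch_find_all_links('<a href="https://foobar.com">a</a><a href="/baz/cool">b</a>'))
```

The sketch filters in plain Python after a single permissive match, which keeps each rule readable; a single regular expression with lookaheads would be an alternative design.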
What functions are defined in test_html.py?
test_html.py defines 11 functions: test_extract_sub_links, test_extract_sub_links_base, test_extract_sub_links_exclude, test_extract_sub_links_with_query, test_find_all_links_drop_fragment, test_find_all_links_ignore_prefix, test_find_all_links_ignore_suffix, test_find_all_links_multiple, test_find_all_links_none, test_find_all_links_single, and test_prevent_outside.
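The extract_sub_links tests are not shown in the truncated listing, so the following is only a rough sketch of the general idea their names suggest: resolving extracted hrefs against the page URL and optionally dropping links that leave the site. The function sketch_sub_links, its parameters, and the host-based comparison are assumptions made for illustration and do not reproduce the library's signature or rules; the flag name prevent_outside is borrowed from the test name test_prevent_outside.

```python
from urllib.parse import urljoin, urlparse


def sketch_sub_links(hrefs, page_url, prevent_outside=True):
    """Resolve hrefs against page_url and return sorted absolute URLs.

    When prevent_outside is true, links whose host differs from the host of
    page_url are dropped (a simplification chosen for this sketch).
    """
    page_host = urlparse(page_url).netloc
    resolved = set()
    for href in hrefs:
        # urljoin handles absolute URLs, root-relative paths, and relative paths.
        absolute = urljoin(page_url, href)
        if prevent_outside and urlparse(absolute).netloc != page_host:
            continue
        resolved.add(absolute)
    return sorted(resolved)


links = ["/baz/cool", "how", "https://other.com/x"]
print(sketch_sub_links(links, "https://foobar.com/docs/"))
```

Passing prevent_outside=False in this sketch keeps the off-site link as well, which is the kind of difference a test such as test_prevent_outside would be expected to exercise.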
What does test_html.py depend on?
test_html.py imports one module, langchain_core.utils.html, from which it uses PREFIXES_TO_IGNORE, SUFFIXES_TO_IGNORE, extract_sub_links, and find_all_links.
Where is test_html.py in the architecture?
test_html.py is located at libs/core/tests/unit_tests/utils/test_html.py (domain: LangChainCore, subdomain: ApiManagement, directory: libs/core/tests/unit_tests/utils).