convert_possible_tags_to_header() — langchain Function Reference
Architecture documentation for the convert_possible_tags_to_header() function in html.py from the langchain codebase.
Dependency Diagram
graph TD
    c2708424_8958_9cb1_390e_d816b56479f3["convert_possible_tags_to_header()"]
    0c8a5f97_7cb0_fe24_746d_9689c4e5426c["HTMLSectionSplitter"]
    c2708424_8958_9cb1_390e_d816b56479f3 -->|defined in| 0c8a5f97_7cb0_fe24_746d_9689c4e5426c
    170f66e6_a026_8fd5_9128_33eeefb7dd62["split_text_from_file()"]
    170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| c2708424_8958_9cb1_390e_d816b56479f3
    style c2708424_8958_9cb1_390e_d816b56479f3 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 493–528
def convert_possible_tags_to_header(self, html_content: str) -> str:
    """Convert specific HTML tags to headers using an XSLT transformation.

    This method uses an XSLT file to transform the HTML content, converting
    certain tags into headers for easier parsing. If no XSLT path is provided,
    the HTML content is returned unchanged.

    Args:
        html_content: The HTML content to be transformed.

    Returns:
        The transformed HTML content as a string.

    Raises:
        ImportError: If the `lxml` library is not installed.
    """
    if not _HAS_LXML:
        msg = "Unable to import lxml, please install with `pip install lxml`."
        raise ImportError(msg)
    # use lxml library to parse html document and return xml ElementTree
    # Create secure parsers to prevent XXE attacks
    html_parser = etree.HTMLParser(no_network=True)
    xslt_parser = etree.XMLParser(
        resolve_entities=False, no_network=True, load_dtd=False
    )

    # Apply XSLT access control to prevent file/network access
    # DENY_ALL is a predefined access control that blocks all file/network access
    # Type ignore needed due to incomplete lxml type stubs
    ac = etree.XSLTAccessControl.DENY_ALL  # type: ignore[attr-defined]

    tree = etree.parse(StringIO(html_content), html_parser)
    xslt_tree = etree.parse(self.xslt_path, xslt_parser)
    transform = etree.XSLT(xslt_tree, access_control=ac)
    result = transform(tree)
    return str(result)
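
The method above depends on a module-level `_HAS_LXML` flag and on a stylesheet stored at `self.xslt_path`, neither of which appears in the excerpt. The following standalone sketch shows the same pattern end to end under stated assumptions: the flag is set with a try/except import (a common idiom, assumed here rather than taken from the library), and `DEMO_XSLT` and `promote_titles()` are hypothetical names for illustration. The demo stylesheet, which rewrites `<div class="title">` as `<h1>`, is not the stylesheet that langchain ships.

```python
from io import StringIO

# Assumed way of defining the optional-dependency flag; the excerpt
# only shows the flag being checked, not how it is set.
try:
    from lxml import etree

    _HAS_LXML = True
except ImportError:
    _HAS_LXML = False

# Hypothetical stylesheet: copy everything unchanged, except that
# <div class="title"> elements are rewritten as <h1>.
DEMO_XSLT = """
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="div[@class='title']">
    <h1><xsl:apply-templates select="node()"/></h1>
  </xsl:template>
</xsl:stylesheet>
"""


def promote_titles(html_content: str, xslt_text: str = DEMO_XSLT) -> str:
    """Apply an XSLT stylesheet to HTML with the same hardening as above."""
    if not _HAS_LXML:
        msg = "Unable to import lxml, please install with `pip install lxml`."
        raise ImportError(msg)

    # Parsers that refuse network access, DTD loading, and entity expansion.
    html_parser = etree.HTMLParser(no_network=True)
    xslt_parser = etree.XMLParser(
        resolve_entities=False, no_network=True, load_dtd=False
    )

    tree = etree.parse(StringIO(html_content), html_parser)
    # The stylesheet is parsed from a string here instead of a file path.
    xslt_tree = etree.parse(StringIO(xslt_text.strip()), xslt_parser)

    # DENY_ALL stops the stylesheet itself from reading files or URLs.
    transform = etree.XSLT(
        xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL
    )
    return str(transform(tree))
```

Calling `promote_titles('<html><body><div class="title">Intro</div><p>Body</p></body></html>')` should return markup in which the `div` has become an `h1` and the paragraph is untouched. Hardening both the parsers and the transform matters because the input HTML and the stylesheet are separate attack surfaces: the parser options cover entity and DTD tricks in the documents, while the access control covers XSLT features that load external resources.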
Frequently Asked Questions
What does convert_possible_tags_to_header() do?
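convert_possible_tags_to_header() is a method of HTMLSectionSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It parses the given HTML with lxml, applies the XSLT stylesheet at self.xslt_path to convert certain tags into headers, and returns the transformed document as a string. Both parsers and the transform are configured to block network access, DTD loading, and entity resolution. It raises ImportError if lxml is not installed.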
Where is convert_possible_tags_to_header() defined?
convert_possible_tags_to_header() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 493.
What calls convert_possible_tags_to_header()?
convert_possible_tags_to_header() is called by one function: split_text_from_file().