convert_possible_tags_to_header() — langchain Function Reference

Architecture documentation for the convert_possible_tags_to_header() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  c2708424_8958_9cb1_390e_d816b56479f3["convert_possible_tags_to_header()"]
  0c8a5f97_7cb0_fe24_746d_9689c4e5426c["HTMLSectionSplitter"]
  c2708424_8958_9cb1_390e_d816b56479f3 -->|defined in| 0c8a5f97_7cb0_fe24_746d_9689c4e5426c
  170f66e6_a026_8fd5_9128_33eeefb7dd62["split_text_from_file()"]
  170f66e6_a026_8fd5_9128_33eeefb7dd62 -->|calls| c2708424_8958_9cb1_390e_d816b56479f3
  style c2708424_8958_9cb1_390e_d816b56479f3 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 493–528

    def convert_possible_tags_to_header(self, html_content: str) -> str:
        """Convert specific HTML tags to headers using an XSLT transformation.

        This method uses an XSLT file to transform the HTML content, converting
        certain tags into headers for easier parsing. If no XSLT path is provided,
        the HTML content is returned unchanged.

        Args:
            html_content: The HTML content to be transformed.

        Returns:
            The transformed HTML content as a string.

        Raises:
            ImportError: If the `lxml` library is not installed.
        """
        if not _HAS_LXML:
            msg = "Unable to import lxml, please install with `pip install lxml`."
            raise ImportError(msg)
        # use lxml library to parse html document and return xml ElementTree
        # Create secure parsers to prevent XXE attacks
        html_parser = etree.HTMLParser(no_network=True)
        xslt_parser = etree.XMLParser(
            resolve_entities=False, no_network=True, load_dtd=False
        )

        # Apply XSLT access control to prevent file/network access
        # DENY_ALL is a predefined access control that blocks all file/network access
        # Type ignore needed due to incomplete lxml type stubs
        ac = etree.XSLTAccessControl.DENY_ALL  # type: ignore[attr-defined]

        tree = etree.parse(StringIO(html_content), html_parser)
        xslt_tree = etree.parse(self.xslt_path, xslt_parser)
        transform = etree.XSLT(xslt_tree, access_control=ac)
        result = transform(tree)
        return str(result)
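
The same pattern can be tried in isolation. The following is a minimal sketch, not the langchain implementation: the stylesheet is a hypothetical one written for this example (it promotes `<div class="title">` to `<h1>`), and the function name `promote_titles` is likewise made up. It assumes `lxml` is installed, and it reuses only the parser options and the `DENY_ALL` access control shown in the listing above.

```python
from io import StringIO

from lxml import etree

# Hypothetical stylesheet for illustration only; it is NOT the XSLT file
# that langchain loads from self.xslt_path.
XSLT_SOURCE = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <!-- Identity template: copy nodes and attributes unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Rewrite div elements with class "title" as h1 elements. -->
  <xsl:template match="div[@class='title']">
    <h1><xsl:apply-templates select="node()"/></h1>
  </xsl:template>
</xsl:stylesheet>
"""


def promote_titles(html_content: str) -> str:
    """Apply the hypothetical stylesheet above to an HTML string."""
    # Hardened parsers: no network access, no entity resolution, no DTD loading.
    html_parser = etree.HTMLParser(no_network=True)
    xslt_parser = etree.XMLParser(
        resolve_entities=False, no_network=True, load_dtd=False
    )

    tree = etree.parse(StringIO(html_content), html_parser)
    xslt_tree = etree.parse(StringIO(XSLT_SOURCE), xslt_parser)

    # DENY_ALL blocks file and network access from inside the stylesheet.
    transform = etree.XSLT(
        xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL
    )
    return str(transform(tree))


if __name__ == "__main__":
    print(promote_titles('<div class="title">Intro</div><p>Body text</p>'))
```

Keeping the stylesheet inline makes the sketch self-contained; the method above instead reads its stylesheet from `self.xslt_path`.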

Frequently Asked Questions

What does convert_possible_tags_to_header() do?
convert_possible_tags_to_header() is a method of HTMLSectionSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It parses the HTML content with lxml, applies the XSLT stylesheet at self.xslt_path to convert certain tags into headers, and returns the transformed document as a string. It raises ImportError if lxml is not installed.

Where is convert_possible_tags_to_header() defined?
convert_possible_tags_to_header() is defined in libs/text-splitters/langchain_text_splitters/html.py, lines 493–528.

What calls convert_possible_tags_to_header()?
convert_possible_tags_to_header() is called by one function: split_text_from_file().
