_normalize_and_clean_text() — langchain Function Reference

Architecture documentation for _normalize_and_clean_text(), a method of HTMLSemanticPreservingSplitter defined in html.py in the langchain codebase.

Dependency Diagram

graph TD
  06dfaecf_69d8_2d95_85d1_3aef3a299e2e["_normalize_and_clean_text()"]
  c05f2267_6bc4_946d_a7f8_3d7745082745["HTMLSemanticPreservingSplitter"]
  06dfaecf_69d8_2d95_85d1_3aef3a299e2e -->|defined in| c05f2267_6bc4_946d_a7f8_3d7745082745
  bccaa6fd_208f_1a39_f885_e3cca863a319["_process_html()"]
  bccaa6fd_208f_1a39_f885_e3cca863a319 -->|calls| 06dfaecf_69d8_2d95_85d1_3aef3a299e2e
  style 06dfaecf_69d8_2d95_85d1_3aef3a299e2e fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 825–844

    def _normalize_and_clean_text(self, text: str) -> str:
        """Normalizes the text by removing extra spaces and newlines.

        Args:
            text: The text to be normalized.

        Returns:
            The normalized text.
        """
        if self._normalize_text:
            text = text.lower()
            text = re.sub(r"[^\w\s]", "", text)
            text = re.sub(r"\s+", " ", text).strip()

        if self._stopword_removal:
            text = " ".join(
                [word for word in text.split() if word not in self._stopwords]
            )

        return text
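
The behavior of the method above can be tried out with a minimal standalone sketch. It is not the library's API: `normalize_and_clean_text` and its keyword parameters `normalize_text`, `stopword_removal`, and `stopwords` are hypothetical stand-ins for the instance attributes `self._normalize_text`, `self._stopword_removal`, and `self._stopwords`, and the sample strings are made up for illustration.

```python
import re


def normalize_and_clean_text(
    text: str,
    normalize_text: bool = True,
    stopword_removal: bool = False,
    stopwords: frozenset = frozenset(),
) -> str:
    """Standalone mirror of the method body (hypothetical helper, not library code)."""
    if normalize_text:
        text = text.lower()
        # Drop every character that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", "", text)
        # Collapse whitespace runs (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", text).strip()

    if stopword_removal:
        # Exact, case-sensitive match against the stopword collection.
        text = " ".join(word for word in text.split() if word not in stopwords)

    return text


sample = "  The Quick,  Brown fox\njumps over the lazy dog!  "

print(normalize_and_clean_text(sample))
# the quick brown fox jumps over the lazy dog

print(normalize_and_clean_text(sample, stopword_removal=True, stopwords=frozenset({"the", "over"})))
# quick brown fox jumps lazy dog

# With normalization off, matching stays case-sensitive and punctuation is kept.
print(normalize_and_clean_text("The dog, the cat", normalize_text=False,
                               stopword_removal=True, stopwords=frozenset({"the"})))
# The dog, cat
```

The last call suggests why the two flags are usually enabled together: without the lowercasing and punctuation stripping, a stopword such as "the" does not match "The" or a token with attached punctuation.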

Frequently Asked Questions

What does _normalize_and_clean_text() do?
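_normalize_and_clean_text() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. When self._normalize_text is set, it lowercases the text, strips every character that is neither a word character nor whitespace, and collapses runs of whitespace into single spaces. When self._stopword_removal is set, it also drops every word found in self._stopwords. It returns the resulting string.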
Where is _normalize_and_clean_text() defined?
_normalize_and_clean_text() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 825.
What calls _normalize_and_clean_text()?
_normalize_and_clean_text() is called by one function: _process_html().
