_normalize_and_clean_text() — langchain Function Reference
Architecture documentation for the _normalize_and_clean_text() function in html.py from the langchain codebase.
Dependency Diagram
graph TD
    06dfaecf_69d8_2d95_85d1_3aef3a299e2e["_normalize_and_clean_text()"]
    c05f2267_6bc4_946d_a7f8_3d7745082745["HTMLSemanticPreservingSplitter"]
    06dfaecf_69d8_2d95_85d1_3aef3a299e2e -->|defined in| c05f2267_6bc4_946d_a7f8_3d7745082745
    bccaa6fd_208f_1a39_f885_e3cca863a319["_process_html()"]
    bccaa6fd_208f_1a39_f885_e3cca863a319 -->|calls| 06dfaecf_69d8_2d95_85d1_3aef3a299e2e
    style 06dfaecf_69d8_2d95_85d1_3aef3a299e2e fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 825–844
    def _normalize_and_clean_text(self, text: str) -> str:
        """Normalizes the text by removing extra spaces and newlines.

        Args:
            text: The text to be normalized.

        Returns:
            The normalized text.
        """
        if self._normalize_text:
            text = text.lower()
            text = re.sub(r"[^\w\s]", "", text)
            text = re.sub(r"\s+", " ", text).strip()
        if self._stopword_removal:
            text = " ".join(
                [word for word in text.split() if word not in self._stopwords]
            )
        return text
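The effect of the two flags is easier to see outside the class. The following is a minimal standalone sketch, not the library's API: the function name normalize_and_clean and its keyword parameters are hypothetical stand-ins for the method and for the self._normalize_text, self._stopword_removal, and self._stopwords attributes read in the source above.

```python
import re


def normalize_and_clean(
    text: str,
    normalize_text: bool = True,
    stopword_removal: bool = False,
    stopwords: frozenset = frozenset(),
) -> str:
    """Hypothetical standalone mirror of the method body shown above."""
    if normalize_text:
        # Lowercase, drop anything that is not a word character or whitespace,
        # then collapse whitespace runs into single spaces.
        text = text.lower()
        text = re.sub(r"[^\w\s]", "", text)
        text = re.sub(r"\s+", " ", text).strip()
    if stopword_removal:
        # Exact, case-sensitive membership test against the stopword set.
        text = " ".join(w for w in text.split() if w not in stopwords)
    return text


sample = "  The Quick,\n brown   FOX! "
stop = frozenset({"the"})

normalized = normalize_and_clean(sample)
filtered = normalize_and_clean(sample, stopword_removal=True, stopwords=stop)
# With normalization off, "The" is not lowercased, so it does not match "the".
raw_filtered = normalize_and_clean(
    sample, normalize_text=False, stopword_removal=True, stopwords=stop
)
# \w is Unicode-aware for str patterns and includes the underscore.
unicode_kept = normalize_and_clean("Café_au-lait")

print(normalized)    # the quick brown fox
print(filtered)      # quick brown fox
print(raw_filtered)  # The Quick, brown FOX!
print(unicode_kept)  # café_aulait
```

Two details follow from the regular expressions rather than from the docstring: stopword matching is case-sensitive, so lowercase stopwords only match when normalization has already lowercased the text, and accented letters and underscores survive the punctuation pass because \w covers them.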
Called By
_process_html()
Frequently Asked Questions
What does _normalize_and_clean_text() do?
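_normalize_and_clean_text() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. When self._normalize_text is set, it lowercases the text, removes every character that is neither a word character nor whitespace, and collapses runs of whitespace into single spaces. When self._stopword_removal is set, it also drops every word found in self._stopwords. With both flags off, the text is returned unchanged.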
Where is _normalize_and_clean_text() defined?
_normalize_and_clean_text() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 825.
What calls _normalize_and_clean_text()?
_normalize_and_clean_text() is called by one function: _process_html().