html.py — langchain Source File
Architecture documentation for html.py, a Python file in the langchain codebase. 15 imports, 0 dependents.
Dependency Diagram
graph LR
    e3efe57c_5b49_c26c_6ca5_45acccb8037f["html.py"]
    e874d8a4_cef0_9d0b_d1ee_84999c07cc2c["copy"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> e874d8a4_cef0_9d0b_d1ee_84999c07cc2c
    b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8["pathlib"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8
    67ec3255_645e_8b6e_1eff_1eb3c648ed95["re"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 67ec3255_645e_8b6e_1eff_1eb3c648ed95
    4e334bc1_18d9_a6a4_18e5_7a3030396c51["io"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 4e334bc1_18d9_a6a4_18e5_7a3030396c51
    8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
    792c09b7_7372_31d2_e29c_dc98949aa3c2["requests"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 792c09b7_7372_31d2_e29c_dc98949aa3c2
    b19a8b7e_fbee_95b1_65b8_509a1ed3cad7["langchain_core._api"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> b19a8b7e_fbee_95b1_65b8_509a1ed3cad7
    c554676d_b731_47b2_a98f_c1c2d537c0aa["langchain_core.documents"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> c554676d_b731_47b2_a98f_c1c2d537c0aa
    91721f45_4909_e489_8c1f_084f8bd87145["typing_extensions"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 91721f45_4909_e489_8c1f_084f8bd87145
    26e26c06_c107_2778_a237_35607f5a6d20["langchain_text_splitters.character"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 26e26c06_c107_2778_a237_35607f5a6d20
    cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
    12ec29de_c252_354e_e837_1cd86b8f7af4["bs4.element"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 12ec29de_c252_354e_e837_1cd86b8f7af4
    0a45c4a1_846f_03df_b842_eb6b566c6404["nltk.py"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> 0a45c4a1_846f_03df_b842_eb6b566c6404
    eb7a0951_cedf_4f9a_c480_750414eb0f4e["bs4"]
    e3efe57c_5b49_c26c_6ca5_45acccb8037f --> eb7a0951_cedf_4f9a_c480_750414eb0f4e
    style e3efe57c_5b49_c26c_6ca5_45acccb8037f fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""HTML text splitters."""

from __future__ import annotations

import copy
import pathlib
import re
from io import StringIO
from typing import (
    IO,
    TYPE_CHECKING,
    Any,
    Literal,
    TypedDict,
    cast,
)

import requests
from langchain_core._api import beta
from langchain_core.documents import BaseDocumentTransformer, Document
from typing_extensions import override

from langchain_text_splitters.character import RecursiveCharacterTextSplitter

if TYPE_CHECKING:
    from collections.abc import Callable, Iterable, Iterator, Sequence

    from bs4.element import ResultSet

try:
    import nltk

    _HAS_NLTK = True
except ImportError:
    _HAS_NLTK = False

try:
    from bs4 import BeautifulSoup, Tag
    from bs4.element import NavigableString, PageElement

    _HAS_BS4 = True
except ImportError:
    _HAS_BS4 = False

try:
    from lxml import etree

    _HAS_LXML = True
except ImportError:
    _HAS_LXML = False


class ElementType(TypedDict):
    """Element type as typed dict."""

    url: str
    xpath: str
    content: str
    metadata: dict[str, str]

# ... (1004 more lines)
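ElementType is a TypedDict, so at runtime an instance is a plain dict; the declared keys are enforced only by static type checkers. The following is a minimal sketch of building and reading one. It redeclares the class locally so the snippet runs without langchain installed. The sample values and the `describe` helper are hypothetical and do not come from html.py.

```python
from typing import TypedDict


class ElementType(TypedDict):
    """Local copy of the TypedDict from the excerpt, redeclared for illustration."""

    url: str
    xpath: str
    content: str
    metadata: dict[str, str]


def describe(element: ElementType) -> str:
    """Hypothetical helper: summarise an element in one line."""
    count = len(element["metadata"])
    return f"{element['xpath']}: {element['content']} ({count} metadata keys)"


# Sample values are made up for this sketch.
element: ElementType = {
    "url": "https://example.com",
    "xpath": "/html/body/h1",
    "content": "Introduction",
    "metadata": {"Header 1": "Introduction"},
}

summary = describe(element)
```

Because a TypedDict value is an ordinary dict, it can be copied, serialised, or merged like any other mapping.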
Domain: DocumentProcessing
Subdomains: TextSplitters
Functions
Dependencies
- bs4
- bs4.element
- collections.abc
- copy
- io
- langchain_core._api
- langchain_core.documents
- langchain_text_splitters.character
- lxml
- nltk
- pathlib
- re
- requests
- typing
- typing_extensions
Source
Frequently Asked Questions
What does html.py do?
html.py implements the HTML text splitters of the langchain codebase (its module docstring reads "HTML text splitters."). It is written in Python and belongs to the DocumentProcessing domain, TextSplitters subdomain.
What functions are defined in html.py?
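The analysis reports 9 symbols for html.py: _HAS_BS4, _HAS_LXML, _HAS_NLTK, _find_all_strings, _find_all_tags, bs4, collections, lxml, nltk. Not all of them are functions. In the source excerpt, _HAS_BS4, _HAS_LXML, and _HAS_NLTK are module-level boolean flags set by the optional-import blocks, and bs4, collections, lxml, and nltk are names of imported packages. That leaves _find_all_strings and _find_all_tags as the function entries in this list.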
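The listing mixes several kinds of names: in the source excerpt, _HAS_NLTK is assigned inside a try/except block and nltk is an import, so neither is a function. One way to separate the kinds is to walk the module's syntax tree with the standard `ast` module. The sketch below does this on a small stand-in string rather than the real file. The `SAMPLE` source, the body and signature given to `_find_all_tags`, and the `classify_names` helper are all assumptions made for illustration.

```python
import ast

# Stand-in source that imitates the shape of the excerpt; not the real html.py.
SAMPLE = '''
import re
try:
    import nltk
    _HAS_NLTK = True
except ImportError:
    _HAS_NLTK = False

def _find_all_tags(soup):
    return []
'''


def classify_names(source: str) -> dict[str, set[str]]:
    """Group names bound anywhere in the source by the statement that binds them."""
    kinds: dict[str, set[str]] = {
        "functions": set(),
        "variables": set(),
        "imports": set(),
    }
    # ast.walk visits nested blocks too, so names bound inside try/except are found.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kinds["functions"].add(node.name)
        elif isinstance(node, ast.Assign):
            kinds["variables"].update(
                target.id for target in node.targets if isinstance(target, ast.Name)
            )
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            kinds["imports"].update(alias.asname or alias.name for alias in node.names)
    return kinds


result = classify_names(SAMPLE)
```

Run against the stand-in, the flag lands under "variables" and the package names under "imports", which is the distinction the flat listing loses.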
What does html.py depend on?
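html.py imports 15 modules: bs4, bs4.element, collections.abc, copy, io, langchain_core._api, langchain_core.documents, langchain_text_splitters.character, lxml, nltk, pathlib, re, requests, typing, and typing_extensions. Of these, bs4, lxml, and nltk are optional: each is imported inside a try/except ImportError block that sets the matching _HAS_BS4, _HAS_LXML, or _HAS_NLTK flag. collections.abc is imported only under TYPE_CHECKING.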
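In the source excerpt, bs4, lxml, and nltk are each imported inside a try/except ImportError block, so the module loads even when they are absent. A caller who wants to know in advance which of them are installed can probe with `importlib.util.find_spec`, which locates a package without importing it. This is a minimal sketch; `probe_optional` and `require` are hypothetical helpers, not part of html.py.

```python
import importlib.util

# Optional third-party packages that the excerpt guards with try/except ImportError.
OPTIONAL_PACKAGES = ("bs4", "lxml", "nltk")


def probe_optional(packages: tuple[str, ...] = OPTIONAL_PACKAGES) -> dict[str, bool]:
    """Report which top-level packages are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def require(name: str, available: dict[str, bool]) -> None:
    """Hypothetical guard: fail early with a clear message if a package is missing."""
    if not available.get(name, False):
        raise ImportError(f"{name} is not installed; install it to use this feature.")


availability = probe_optional()
```

Calling `require("bs4", availability)` before using a BeautifulSoup-based splitter would surface a missing dependency at the call site instead of deep inside the parsing code.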
Where is html.py in the architecture?
html.py is located at libs/text-splitters/langchain_text_splitters/html.py (domain: DocumentProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).