_process_media() — langchain Function Reference
Architecture documentation for the _process_media() function in html.py from the langchain codebase.
Dependency Diagram
graph TD
    2030eaef_a33b_19d9_d540_9d9919faafba["_process_media()"]
    5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
    2030eaef_a33b_19d9_d540_9d9919faafba -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
    127c75d0_d814_d16e_a93c_928f021add9c["split_text()"]
    127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 2030eaef_a33b_19d9_d540_9d9919faafba
    4134a695_a3ab_4bed_f7a0_3a766652fc3e["_find_all_tags()"]
    2030eaef_a33b_19d9_d540_9d9919faafba -->|calls| 4134a695_a3ab_4bed_f7a0_3a766652fc3e
    style 2030eaef_a33b_19d9_d540_9d9919faafba fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 762–793
def _process_media(self, soup: BeautifulSoup) -> None:
    """Processes the media elements.

    Process elements in the HTML content by wrapping them in a <media-wrapper> tag
    and converting them to Markdown format.

    Args:
        soup: Parsed HTML content using BeautifulSoup.
    """
    # Each branch replaces the matching tags in place with a <media-wrapper>
    # element whose text is a Markdown link built from the tag's src attribute.
    if self._preserve_images:
        for img_tag in _find_all_tags(soup, name="img"):
            img_src = img_tag.get("src", "")
            markdown_img = f"![image:{img_src}]({img_src})"
            wrapper = soup.new_tag("media-wrapper")
            wrapper.string = markdown_img
            img_tag.replace_with(wrapper)

    if self._preserve_videos:
        for video_tag in _find_all_tags(soup, name="video"):
            video_src = video_tag.get("src", "")
            markdown_video = f"![video:{video_src}]({video_src})"
            wrapper = soup.new_tag("media-wrapper")
            wrapper.string = markdown_video
            video_tag.replace_with(wrapper)

    if self._preserve_audio:
        for audio_tag in _find_all_tags(soup, name="audio"):
            audio_src = audio_tag.get("src", "")
            markdown_audio = f"![audio:{audio_src}]({audio_src})"
            wrapper = soup.new_tag("media-wrapper")
            wrapper.string = markdown_audio
            audio_tag.replace_with(wrapper)
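The wrap-and-replace pattern used by the method can be tried outside the splitter class. The following is a minimal sketch, not the library's implementation: `wrap_media_as_markdown`, `MEDIA_LABELS`, and the sample HTML are hypothetical names introduced for illustration, the public `soup.find_all()` stands in for the private `_find_all_tags()` helper (whose body is not shown here), and the `![label:src](src)` Markdown shape is an assumption. It requires the `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping from HTML tag name to the label used in the Markdown text.
MEDIA_LABELS = {"img": "image", "video": "video", "audio": "audio"}


def wrap_media_as_markdown(soup: BeautifulSoup, tag_name: str, label: str) -> int:
    """Replace each <tag_name> element with a <media-wrapper> holding Markdown.

    Illustrative helper only; returns the number of tags replaced.
    """
    replaced = 0
    # find_all() returns a list-like snapshot, so replacing tags while
    # iterating over it does not disturb the loop.
    for tag in soup.find_all(tag_name):
        src = tag.get("src", "")  # empty string when the attribute is missing
        wrapper = soup.new_tag("media-wrapper")
        # Assumed Markdown shape: ![label:src](src)
        wrapper.string = f"![{label}:{src}]({src})"
        tag.replace_with(wrapper)
        replaced += 1
    return replaced


html = '<p>Intro</p><img src="cat.png"><video src="clip.mp4"></video>'
soup = BeautifulSoup(html, "html.parser")

total = sum(
    wrap_media_as_markdown(soup, name, label)
    for name, label in MEDIA_LABELS.items()
)
wrappers = [w.get_text() for w in soup.find_all("media-wrapper")]

print(total)
print(wrappers)
```

As in the method above, the soup is modified in place, so any later traversal sees `<media-wrapper>` elements where the original media tags were.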
Frequently Asked Questions
What does _process_media() do?
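_process_media() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. For each media type whose preserve flag is set (_preserve_images, _preserve_videos, _preserve_audio), it replaces every matching <img>, <video>, or <audio> tag in the parsed soup with a <media-wrapper> element containing a Markdown representation built from the tag's src attribute. It modifies the soup in place and returns None.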
Where is _process_media() defined?
_process_media() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 762.
What does _process_media() call?
_process_media() calls one function: _find_all_tags().
What calls _process_media()?
_process_media() is called by one function: split_text().