_process_media() — langchain Function Reference

Architecture documentation for _process_media(), a method of HTMLSemanticPreservingSplitter defined in html.py in the langchain codebase.

Dependency Diagram

graph TD
  2030eaef_a33b_19d9_d540_9d9919faafba["_process_media()"]
  5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
  2030eaef_a33b_19d9_d540_9d9919faafba -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
  127c75d0_d814_d16e_a93c_928f021add9c["split_text()"]
  127c75d0_d814_d16e_a93c_928f021add9c -->|calls| 2030eaef_a33b_19d9_d540_9d9919faafba
  4134a695_a3ab_4bed_f7a0_3a766652fc3e["_find_all_tags()"]
  2030eaef_a33b_19d9_d540_9d9919faafba -->|calls| 4134a695_a3ab_4bed_f7a0_3a766652fc3e
  style 2030eaef_a33b_19d9_d540_9d9919faafba fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 762–793

    def _process_media(self, soup: BeautifulSoup) -> None:
        """Processes the media elements.

        Process elements in the HTML content by wrapping them in a <media-wrapper> tag
        and converting them to Markdown format.

        Args:
            soup: Parsed HTML content using BeautifulSoup.
        """
        if self._preserve_images:
            for img_tag in _find_all_tags(soup, name="img"):
                img_src = img_tag.get("src", "")
                markdown_img = f"![image:{img_src}]({img_src})"
                wrapper = soup.new_tag("media-wrapper")
                wrapper.string = markdown_img
                img_tag.replace_with(wrapper)

        if self._preserve_videos:
            for video_tag in _find_all_tags(soup, name="video"):
                video_src = video_tag.get("src", "")
                markdown_video = f"![video:{video_src}]({video_src})"
                wrapper = soup.new_tag("media-wrapper")
                wrapper.string = markdown_video
                video_tag.replace_with(wrapper)

        if self._preserve_audio:
            for audio_tag in _find_all_tags(soup, name="audio"):
                audio_src = audio_tag.get("src", "")
                markdown_audio = f"![audio:{audio_src}]({audio_src})"
                wrapper = soup.new_tag("media-wrapper")
                wrapper.string = markdown_audio
                audio_tag.replace_with(wrapper)
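
The three branches above share one pattern: find the tags, read `src`, build a Markdown-style string, and swap the tag for a `<media-wrapper>` element. The following is a minimal standalone sketch of that pattern, not the library's code. `wrap_media` is a hypothetical helper, and `soup.find_all` stands in for the module's private `_find_all_tags`, whose exact behavior is not shown on this page.

```python
from bs4 import BeautifulSoup


def wrap_media(soup: BeautifulSoup, tag_name: str, label: str) -> None:
    """Replace each <tag_name> element with a <media-wrapper> element.

    Hypothetical helper mirroring one branch of _process_media();
    soup.find_all is an assumed substitute for _find_all_tags.
    """
    for tag in soup.find_all(tag_name):
        # A missing src attribute falls back to an empty string.
        src = tag.get("src", "")
        wrapper = soup.new_tag("media-wrapper")
        wrapper.string = f"![{label}:{src}]({src})"
        # The original tag, including its other attributes, is discarded.
        tag.replace_with(wrapper)


html = '<p>Intro <img src="cat.png" alt="A cat"> outro</p>'
soup = BeautifulSoup(html, "html.parser")
wrap_media(soup, "img", "image")
result = str(soup)
print(result)
# <p>Intro <media-wrapper>![image:cat.png](cat.png)</media-wrapper> outro</p>
```

Note that only `src` survives the replacement: the `alt` text in this example does not appear in the output.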

Frequently Asked Questions

What does _process_media() do?
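_process_media() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. For each media type enabled by _preserve_images, _preserve_videos, or _preserve_audio, it replaces every matching <img>, <video>, or <audio> tag in the parsed HTML with a <media-wrapper> tag whose text is a Markdown-style reference of the form ![image:SRC](SRC), ![video:SRC](SRC), or ![audio:SRC](SRC), where SRC is the tag's src attribute (an empty string if the attribute is missing). It modifies the soup in place and returns None.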
Where is _process_media() defined?
_process_media() is defined in libs/text-splitters/langchain_text_splitters/html.py, at lines 762–793.
What does _process_media() call?
_process_media() calls one function: _find_all_tags().
What calls _process_media()?
_process_media() is called by one function: split_text().
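
One consequence of the source shown above is that only the `src` attribute on the media tag itself is read. The sketch below illustrates this for a `<video>` whose URL sits on a nested `<source>` element. It is an illustration under assumptions rather than library code: `wrap_media` is a hypothetical helper that mirrors one branch of `_process_media()`, and `soup.find_all` stands in for `_find_all_tags`.

```python
from bs4 import BeautifulSoup


def wrap_media(soup: BeautifulSoup, tag_name: str, label: str) -> None:
    """Hypothetical helper mirroring one branch of _process_media()."""
    for tag in soup.find_all(tag_name):
        # Only the attribute on the tag itself is consulted,
        # not any attribute on nested <source> children.
        src = tag.get("src", "")
        wrapper = soup.new_tag("media-wrapper")
        wrapper.string = f"![{label}:{src}]({src})"
        tag.replace_with(wrapper)


html = '<video controls><source src="clip.mp4" type="video/mp4"></video>'
soup = BeautifulSoup(html, "html.parser")
wrap_media(soup, "video", "video")
result = str(soup)
print(result)
# <media-wrapper>![video:]()</media-wrapper>
```

In this sketch the nested `<source>` element is removed along with its parent, so the URL `clip.mp4` does not reach the output.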
