_further_split_chunk() — langchain Function Reference

Architecture documentation for the _further_split_chunk() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  1ad208c7_864b_e6dd_1344_b5ed70211298["_further_split_chunk()"]
  5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
  1ad208c7_864b_e6dd_1344_b5ed70211298 -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8["_create_documents()"]
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8 -->|calls| 1ad208c7_864b_e6dd_1344_b5ed70211298
  f7ca6eae_27af_591b_5082_e978259ac965["_reinsert_preserved_elements()"]
  1ad208c7_864b_e6dd_1344_b5ed70211298 -->|calls| f7ca6eae_27af_591b_5082_e978259ac965
  18590a0b_5de9_0196_2d21_608d79b9ef70["split_text()"]
  1ad208c7_864b_e6dd_1344_b5ed70211298 -->|calls| 18590a0b_5de9_0196_2d21_608d79b9ef70
  style 1ad208c7_864b_e6dd_1344_b5ed70211298 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 1015–1043

    def _further_split_chunk(
        self, content: str, metadata: dict[Any, Any], preserved_elements: dict[str, str]
    ) -> list[Document]:
        """Further splits the content into smaller chunks.

        Args:
            content: The content to be split.
            metadata: Metadata to attach to each chunk.
            preserved_elements: Preserved elements to be reinserted into each chunk.

        Returns:
            A list of `Document` objects containing the split content.
        """
        splits = self._recursive_splitter.split_text(content)
        result = []

        for split in splits:
            split_with_preserved = self._reinsert_preserved_elements(
                split, preserved_elements
            )
            if split_with_preserved.strip():
                result.append(
                    Document(
                        page_content=split_with_preserved.strip(),
                        metadata=metadata,
                    )
                )

        return result
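
The flow above can be tried without installing langchain. The following is a minimal sketch, not the library's implementation: the `Document` dataclass, the word-packing `split_text()`, the placeholder-replacing `reinsert_preserved_elements()`, and the `PRESERVED_0` placeholder name are all hypothetical stand-ins chosen for illustration. Only the loop in `further_split_chunk()` mirrors the source shown above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Minimal stand-in for langchain's Document (assumption, not the real class)."""

    page_content: str
    metadata: dict[Any, Any] = field(default_factory=dict)


def split_text(content: str, chunk_size: int = 40) -> list[str]:
    """Hypothetical stand-in for self._recursive_splitter.split_text().

    Greedily packs whitespace-separated words into chunks of at most
    chunk_size characters (a single longer word becomes its own chunk).
    """
    chunks: list[str] = []
    current = ""
    for word in content.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def reinsert_preserved_elements(content: str, preserved: dict[str, str]) -> str:
    """Assumed behaviour: swap each placeholder key for its preserved markup."""
    for placeholder, original in preserved.items():
        content = content.replace(placeholder, original)
    return content


def further_split_chunk(
    content: str, metadata: dict[Any, Any], preserved: dict[str, str]
) -> list[Document]:
    """Same loop as the source: split, reinsert, drop empty chunks, wrap."""
    result = []
    for split in split_text(content):
        restored = reinsert_preserved_elements(split, preserved).strip()
        if restored:
            result.append(Document(page_content=restored, metadata=metadata))
    return result


docs = further_split_chunk(
    "Intro text before the table. PRESERVED_0 Closing text after the table.",
    {"Header 1": "Report"},
    {"PRESERVED_0": "<table><tr><td>1</td></tr></table>"},
)
for doc in docs:
    print(repr(doc.page_content), doc.metadata)
```

In this sketch the placeholder is restored only after splitting, so the preserved markup is never cut in the middle, and a restored chunk can end up longer than the splitter's size limit.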

Frequently Asked Questions

What does _further_split_chunk() do?
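_further_split_chunk() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It splits the given content into smaller pieces with the class's recursive splitter, reinserts the preserved elements into each piece, and returns every non-empty piece as a Document carrying the supplied metadata.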
Where is _further_split_chunk() defined?
_further_split_chunk() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 1015.
What does _further_split_chunk() call?
_further_split_chunk() calls two functions: _reinsert_preserved_elements() and split_text().
What calls _further_split_chunk()?
_further_split_chunk() is called by one function: _create_documents().
