_split_text() — langchain Function Reference

Architecture documentation for the _split_text() function in character.py from the langchain codebase.

Dependency Diagram

graph TD
  90247d99_f9d1_5357_ea60_e7b8e740431f["_split_text()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8["RecursiveCharacterTextSplitter"]
  90247d99_f9d1_5357_ea60_e7b8e740431f -->|defined in| 22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8
  cdc32315_d799_46f6_bd91_09d4da023d15["split_text()"]
  cdc32315_d799_46f6_bd91_09d4da023d15 -->|calls| 90247d99_f9d1_5357_ea60_e7b8e740431f
  4df0af83_3c38_0acb_3015_c017f66de0cc["_split_text_with_regex()"]
  90247d99_f9d1_5357_ea60_e7b8e740431f -->|calls| 4df0af83_3c38_0acb_3015_c017f66de0cc
  style 90247d99_f9d1_5357_ea60_e7b8e740431f fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/character.py lines 107–147

    def _split_text(self, text: str, separators: list[str]) -> list[str]:
        """Split incoming text and return chunks."""
        final_chunks = []
        # Get appropriate separator to use
        separator = separators[-1]
        new_separators = []
        for i, s_ in enumerate(separators):
            separator_ = s_ if self._is_separator_regex else re.escape(s_)
            if not s_:
                separator = s_
                break
            if re.search(separator_, text):
                separator = s_
                new_separators = separators[i + 1 :]
                break

        separator_ = separator if self._is_separator_regex else re.escape(separator)
        splits = _split_text_with_regex(
            text, separator_, keep_separator=self._keep_separator
        )

        # Now go merging things, recursively splitting longer texts.
        good_splits = []
        separator_ = "" if self._keep_separator else separator
        for s in splits:
            if self._length_function(s) < self._chunk_size:
                good_splits.append(s)
            else:
                if good_splits:
                    merged_text = self._merge_splits(good_splits, separator_)
                    final_chunks.extend(merged_text)
                    good_splits = []
                if not new_separators:
                    final_chunks.append(s)
                else:
                    other_info = self._split_text(s, new_separators)
                    final_chunks.extend(other_info)
        if good_splits:
            merged_text = self._merge_splits(good_splits, separator_)
            final_chunks.extend(merged_text)
        return final_chunks
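The recursion is easier to follow outside the class. What follows is a simplified standalone sketch, not langchain's code, and it makes several assumptions. The names `recursive_split` and `greedy_merge` are hypothetical. Length is measured with `len`. Separators are matched literally and dropped from the output, roughly like `_is_separator_regex=False` and `_keep_separator=False`. Finally, `greedy_merge` packs pieces up to `chunk_size` with no overlap, whereas the real `_merge_splits()` also honours `chunk_overlap`.

```python
import re


def greedy_merge(pieces: list[str], separator: str, chunk_size: int) -> list[str]:
    """Hypothetical stand-in for _merge_splits(): join consecutive pieces with
    `separator` until adding the next one would exceed chunk_size (no overlap)."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))  # flush the full chunk
            current, current_len = [], 0
            extra = len(piece)  # no leading separator in a fresh chunk
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified mirror of RecursiveCharacterTextSplitter._split_text()."""
    # Pick the first separator found in the text; "" is accepted immediately.
    # If nothing matches, fall back to the last separator with nothing left to try.
    separator = separators[-1]
    remaining: list[str] = []
    for i, sep in enumerate(separators):
        if not sep:
            separator = sep
            break
        if re.search(re.escape(sep), text):
            separator = sep
            remaining = separators[i + 1 :]
            break

    # Split on the chosen separator ("" means individual characters); drop empties.
    pieces = re.split(re.escape(separator), text) if separator else list(text)
    pieces = [p for p in pieces if p]

    chunks: list[str] = []
    small: list[str] = []
    for piece in pieces:
        if len(piece) < chunk_size:
            small.append(piece)  # short enough: buffer it for merging
            continue
        # Too long: flush buffered short pieces first so order is preserved.
        if small:
            chunks.extend(greedy_merge(small, separator, chunk_size))
            small = []
        if remaining:
            # Retry this piece with the separators after the chosen one.
            chunks.extend(recursive_split(piece, remaining, chunk_size))
        else:
            chunks.append(piece)  # nothing left to try: keep it oversized
    if small:
        chunks.extend(greedy_merge(small, separator, chunk_size))
    return chunks


print(recursive_split("one two\n\nthree four five six", ["\n\n", "\n", " ", ""], 10))
# ['one two', 'three four', 'five six']
print(recursive_split("abcdefghij", ["\n"], 4))
# ['abcdefghij']
```

With `chunk_size=10`, the first paragraph fits unchanged. The second paragraph is too long, so it is re-split on spaces and regrouped. A text that contains none of the separators comes back as one oversized chunk, which mirrors the `final_chunks.append(s)` branch above.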

Frequently Asked Questions

What does _split_text() do?
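_split_text() is the recursive core of RecursiveCharacterTextSplitter, defined in libs/text-splitters/langchain_text_splitters/character.py. It picks the first separator in the list that occurs in the text. An empty-string separator is accepted as soon as it is reached, and if nothing matches, the function falls back to the last separator. It then splits the text on that separator with _split_text_with_regex() and merges consecutive pieces shorter than the chunk size with _merge_splits(). A piece that is too long is split recursively with the separators that follow the chosen one. If no separators remain, the piece is kept as a single oversized chunk.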
Where is _split_text() defined?
_split_text() is defined in libs/text-splitters/langchain_text_splitters/character.py at line 107.
What does _split_text() call?
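Outside its own class, _split_text() calls one function: _split_text_with_regex(), which performs the actual split. Within the class it also uses _merge_splits() and _length_function, and it calls itself recursively on pieces that are still too long.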
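_split_text_with_regex() itself is not listed on this page. The example below is a hypothetical approximation that illustrates what the `keep_separator` argument passed to it controls. The name `split_with_regex_sketch` is made up, and the sketch assumes `True` behaves like `"start"`, which attaches each separator to the following piece, while `"end"` attaches it to the preceding piece. Check the real function in character.py before relying on these exact semantics.

```python
import re
from typing import Literal, Union

# Assumed shape of the keep_separator option (hypothetical alias).
KeepSeparator = Union[bool, Literal["start", "end"]]


def split_with_regex_sketch(
    text: str, pattern: str, keep_separator: KeepSeparator = False
) -> list[str]:
    """Hypothetical approximation of _split_text_with_regex()."""
    if not pattern:
        # Empty separator: fall back to individual characters.
        pieces = list(text)
    elif not keep_separator:
        # Plain split: the separator text is discarded.
        pieces = re.split(pattern, text)
    else:
        # A capturing group makes re.split also return the matched separators:
        # [text0, sep1, text1, sep2, text2, ...] (always an odd-length list).
        parts = re.split(f"({pattern})", text)
        if keep_separator == "end":
            # Glue each separator onto the piece before it.
            pieces = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
            pieces.append(parts[-1])
        else:
            # True or "start": glue each separator onto the piece after it.
            pieces = [parts[0]]
            pieces += [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    # Drop empty strings left by leading, trailing, or adjacent separators.
    return [p for p in pieces if p]


sep = re.escape(". ")
print(split_with_regex_sketch("a. b. c", sep))           # ['a', 'b', 'c']
print(split_with_regex_sketch("a. b. c", sep, "start"))  # ['a', '. b', '. c']
print(split_with_regex_sketch("a. b. c", sep, "end"))    # ['a. ', 'b. ', 'c']
```

Wrapping the pattern in a capturing group is what makes `re.split` return the separators, so they can be glued back onto a neighbouring piece. Without the group, the separator text is simply dropped. This is why `_split_text()` passes `separator_ = ""` to `_merge_splits()` when `_keep_separator` is set: the separators already live inside the pieces.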
What calls _split_text()?
_split_text() is called by one function, the public entry point split_text(). It also calls itself recursively on pieces that are still too long.