RecursiveCharacterTextSplitter Class — langchain Architecture

Architecture documentation for the RecursiveCharacterTextSplitter class in character.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8["RecursiveCharacterTextSplitter"]
  c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|extends| c86e37d5_f962_cc1e_9821_b665e1359ae8
  27cf1dbb_403e_cf43_5525_c1cbd82ba6cb["Language"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|uses| 27cf1dbb_403e_cf43_5525_c1cbd82ba6cb
  2928a4a1_9408_cbea_fa7c_7f66eab697a2["character.py"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|defined in| 2928a4a1_9408_cbea_fa7c_7f66eab697a2
  73b28b45_ccac_996b_802d_26962fef2462["__init__()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 73b28b45_ccac_996b_802d_26962fef2462
  90247d99_f9d1_5357_ea60_e7b8e740431f["_split_text()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 90247d99_f9d1_5357_ea60_e7b8e740431f
  cdc32315_d799_46f6_bd91_09d4da023d15["split_text()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| cdc32315_d799_46f6_bd91_09d4da023d15
  72753d58_c67a_9a23_1006_2861c5bf1d1a["from_language()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 72753d58_c67a_9a23_1006_2861c5bf1d1a
  38b02c9d_fd32_4960_73f8_de1c2c0d0827["get_separators_for_language()"]
  22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 38b02c9d_fd32_4960_73f8_de1c2c0d0827

Source Code

libs/text-splitters/langchain_text_splitters/character.py lines 88–803

class RecursiveCharacterTextSplitter(TextSplitter):
    """Splitting text by recursively look at characters.

    Recursively tries to split by different characters to find one
    that works.
    """

    def __init__(
        self,
        separators: list[str] | None = None,
        keep_separator: bool | Literal["start", "end"] = True,  # noqa: FBT001,FBT002
        is_separator_regex: bool = False,  # noqa: FBT001,FBT002
        **kwargs: Any,
    ) -> None:
        """Create a new TextSplitter."""
        super().__init__(keep_separator=keep_separator, **kwargs)
        self._separators = separators or ["\n\n", "\n", " ", ""]
        self._is_separator_regex = is_separator_regex

    def _split_text(self, text: str, separators: list[str]) -> list[str]:
        """Split incoming text and return chunks."""
        final_chunks = []
        # Get appropriate separator to use
        separator = separators[-1]
        new_separators = []
        for i, s_ in enumerate(separators):
            separator_ = s_ if self._is_separator_regex else re.escape(s_)
            if not s_:
                separator = s_
                break
            if re.search(separator_, text):
                separator = s_
                new_separators = separators[i + 1 :]
                break

        separator_ = separator if self._is_separator_regex else re.escape(separator)
        splits = _split_text_with_regex(
            text, separator_, keep_separator=self._keep_separator
        )

        # Now go merging things, recursively splitting longer texts.
        good_splits = []
        separator_ = "" if self._keep_separator else separator
        for s in splits:
            if self._length_function(s) < self._chunk_size:
                good_splits.append(s)
            else:
                if good_splits:
                    merged_text = self._merge_splits(good_splits, separator_)
                    final_chunks.extend(merged_text)
                    good_splits = []
                if not new_separators:
                    final_chunks.append(s)
                else:
                    other_info = self._split_text(s, new_separators)
                    final_chunks.extend(other_info)
        if good_splits:
            merged_text = self._merge_splits(good_splits, separator_)
            final_chunks.extend(merged_text)
        return final_chunks

    def split_text(self, text: str) -> list[str]:
        """Split the input text into smaller chunks based on predefined separators.

        Args:
            text: The input text to be split.

        Returns:
            A list of text chunks obtained after splitting.
        """
        return self._split_text(text, self._separators)

    @classmethod
    def from_language(
        cls, language: Language, **kwargs: Any
    ) -> RecursiveCharacterTextSplitter:
        """Return an instance of this class based on a specific language.

        This method initializes the text splitter with language-specific separators.

        Args:
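
The control flow of `_split_text()` can be sketched in isolation. The version below is a simplified illustration, not the library's implementation: `pick_separator` and `recursive_split` are hypothetical names, and the sketch deliberately omits the merging step (`_merge_splits`) and the `keep_separator` handling, so adjacent small pieces are not recombined and separators are dropped. It only shows how the first matching separator is chosen, how `re.escape` is applied when separators are not regexes, and how oversized pieces are re-split with the remaining separators.

```python
import re


def pick_separator(
    text: str, separators: list[str], is_regex: bool = False
) -> tuple[str, list[str]]:
    """Return the first separator that occurs in text, plus the ones after it."""
    chosen, remaining = separators[-1], []
    for i, sep in enumerate(separators):
        if sep == "":
            # The empty separator always applies: fall back to single characters.
            chosen = sep
            break
        # Literal separators are escaped so characters like "." are not wildcards.
        pattern = sep if is_regex else re.escape(sep)
        if re.search(pattern, text):
            chosen, remaining = sep, separators[i + 1:]
            break
    return chosen, remaining


def recursive_split(
    text: str, separators: list[str], chunk_size: int, is_regex: bool = False
) -> list[str]:
    """Split text, recursing into pieces that are still too long (no merging)."""
    separator, remaining = pick_separator(text, separators, is_regex)
    if separator == "":
        pieces = list(text)
    else:
        pattern = separator if is_regex else re.escape(separator)
        pieces = [p for p in re.split(pattern, text) if p]

    chunks: list[str] = []
    for piece in pieces:
        # Keep a piece if it is short enough or nothing finer is left to try.
        if len(piece) < chunk_size or not remaining:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, remaining, chunk_size, is_regex))
    return chunks


if __name__ == "__main__":
    sample = "alpha beta\n\ngamma delta epsilon"
    print(recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=12))
```

In this sketch the second paragraph exceeds `chunk_size`, so it is re-split on spaces; the real class would then merge neighbouring words back together up to the chunk size, which is the part left out here.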

Frequently Asked Questions

What is the RecursiveCharacterTextSplitter class?
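RecursiveCharacterTextSplitter is a text splitter in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/character.py. It tries a list of separators in order (by default "\n\n", "\n", " ", and "") and recursively re-splits any piece that is still too long using the remaining separators.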
Where is RecursiveCharacterTextSplitter defined?
RecursiveCharacterTextSplitter is defined in libs/text-splitters/langchain_text_splitters/character.py at line 88.
What does RecursiveCharacterTextSplitter extend?
RecursiveCharacterTextSplitter extends TextSplitter. Language is not a base class; it is the type of the language argument accepted by from_language().
