RecursiveCharacterTextSplitter Class — langchain Architecture
Architecture documentation for the RecursiveCharacterTextSplitter class in character.py from the langchain codebase.
Dependency Diagram
graph TD
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8["RecursiveCharacterTextSplitter"]
    c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|extends| c86e37d5_f962_cc1e_9821_b665e1359ae8
    27cf1dbb_403e_cf43_5525_c1cbd82ba6cb["Language"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|uses| 27cf1dbb_403e_cf43_5525_c1cbd82ba6cb
    2928a4a1_9408_cbea_fa7c_7f66eab697a2["character.py"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|defined in| 2928a4a1_9408_cbea_fa7c_7f66eab697a2
    73b28b45_ccac_996b_802d_26962fef2462["__init__()"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 73b28b45_ccac_996b_802d_26962fef2462
    90247d99_f9d1_5357_ea60_e7b8e740431f["_split_text()"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 90247d99_f9d1_5357_ea60_e7b8e740431f
    cdc32315_d799_46f6_bd91_09d4da023d15["split_text()"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| cdc32315_d799_46f6_bd91_09d4da023d15
    72753d58_c67a_9a23_1006_2861c5bf1d1a["from_language()"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 72753d58_c67a_9a23_1006_2861c5bf1d1a
    38b02c9d_fd32_4960_73f8_de1c2c0d0827["get_separators_for_language()"]
    22d8d30b_9b36_1532_bb1c_4c9aa03a4bb8 -->|method| 38b02c9d_fd32_4960_73f8_de1c2c0d0827
Source Code
libs/text-splitters/langchain_text_splitters/character.py lines 88–803
class RecursiveCharacterTextSplitter(TextSplitter):
    """Splitting text by recursively look at characters.

    Recursively tries to split by different characters to find one
    that works.
    """

    def __init__(
        self,
        separators: list[str] | None = None,
        keep_separator: bool | Literal["start", "end"] = True,  # noqa: FBT001,FBT002
        is_separator_regex: bool = False,  # noqa: FBT001,FBT002
        **kwargs: Any,
    ) -> None:
        """Create a new TextSplitter."""
        super().__init__(keep_separator=keep_separator, **kwargs)
        self._separators = separators or ["\n\n", "\n", " ", ""]
        self._is_separator_regex = is_separator_regex

    def _split_text(self, text: str, separators: list[str]) -> list[str]:
        """Split incoming text and return chunks."""
        final_chunks = []
        # Get appropriate separator to use
        separator = separators[-1]
        new_separators = []
        for i, s_ in enumerate(separators):
            separator_ = s_ if self._is_separator_regex else re.escape(s_)
            if not s_:
                separator = s_
                break
            if re.search(separator_, text):
                separator = s_
                new_separators = separators[i + 1 :]
                break

        separator_ = separator if self._is_separator_regex else re.escape(separator)
        splits = _split_text_with_regex(
            text, separator_, keep_separator=self._keep_separator
        )

        # Now go merging things, recursively splitting longer texts.
        good_splits = []
        separator_ = "" if self._keep_separator else separator
        for s in splits:
            if self._length_function(s) < self._chunk_size:
                good_splits.append(s)
            else:
                if good_splits:
                    merged_text = self._merge_splits(good_splits, separator_)
                    final_chunks.extend(merged_text)
                    good_splits = []
                if not new_separators:
                    final_chunks.append(s)
                else:
                    other_info = self._split_text(s, new_separators)
                    final_chunks.extend(other_info)
        if good_splits:
            merged_text = self._merge_splits(good_splits, separator_)
            final_chunks.extend(merged_text)
        return final_chunks

    def split_text(self, text: str) -> list[str]:
        """Split the input text into smaller chunks based on predefined separators.

        Args:
            text: The input text to be split.

        Returns:
            A list of text chunks obtained after splitting.
        """
        return self._split_text(text, self._separators)

    @classmethod
    def from_language(
        cls, language: Language, **kwargs: Any
    ) -> RecursiveCharacterTextSplitter:
        """Return an instance of this class based on a specific language.

        This method initializes the text splitter with language-specific separators.

        Args:
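The control flow of _split_text() in the listing above can be hard to follow at a glance, so here is a minimal standalone sketch of its two core ideas: picking the first separator that occurs in the text, and recursing with the remaining separators on any piece that is not below the chunk size. It is a simplification, not the library's implementation: the names pick_separator and recursive_split are hypothetical, len() stands in for the length function, separators are dropped rather than kept, and the merging step (_merge_splits) and chunk overlap are omitted.

```python
import re


def pick_separator(
    text: str, separators: list[str], is_regex: bool = False
) -> tuple[str, list[str]]:
    """Return the first separator that occurs in text, plus the ones after it."""
    for i, sep in enumerate(separators):
        if sep == "":
            # The empty string always applies; it means "split into characters".
            return sep, []
        # Literal separators are escaped so characters such as "." are not
        # interpreted as regex syntax.
        pattern = sep if is_regex else re.escape(sep)
        if re.search(pattern, text):
            return sep, separators[i + 1:]
    # Nothing matched: fall back to the last separator, with nothing left to try.
    return separators[-1], []


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified recursion: separators are dropped and pieces are not merged."""
    sep, remaining = pick_separator(text, separators)
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]
    chunks: list[str] = []
    for piece in pieces:
        # Keep a piece if it is below the limit or no finer separator remains.
        if len(piece) < chunk_size or not remaining:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, remaining, chunk_size))
    return chunks


sample = "alpha beta\n\ngamma delta epsilon"
chunks = recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=12)
print(chunks)  # ['alpha beta', 'gamma', 'delta', 'epsilon']
```

The first paragraph is short enough to keep whole, while the second exceeds the limit and is re-split on spaces. In the real class, the small pieces would then be merged back toward the chunk size rather than returned one by one.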
Frequently Asked Questions
What is the RecursiveCharacterTextSplitter class?
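RecursiveCharacterTextSplitter is a TextSplitter subclass in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/character.py. It tries a list of separators in order (by default "\n\n", "\n", " ", and "") and recursively re-splits any piece that is still not below the chunk size, using the remaining separators.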
Where is RecursiveCharacterTextSplitter defined?
RecursiveCharacterTextSplitter is defined in libs/text-splitters/langchain_text_splitters/character.py at line 88.
What does RecursiveCharacterTextSplitter extend?
RecursiveCharacterTextSplitter extends TextSplitter. Language is not a base class; it is the type of the language argument accepted by from_language().
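In the source listing, the class statement names only TextSplitter as a base, while Language appears as the parameter type of from_language(). The difference between inheriting from a class and merely accepting it as an argument can be sketched with stand-ins. Every name below (Lang, BaseSplitter, RecursiveSplitter, separators_for) is hypothetical, and the separator lists are made up for illustration; none of it is the library's code.

```python
from __future__ import annotations

from enum import Enum


class Lang(str, Enum):
    """Hypothetical stand-in for the Language enum."""

    PYTHON = "python"
    MARKDOWN = "markdown"


class BaseSplitter:
    """Hypothetical stand-in for TextSplitter."""

    def __init__(self, chunk_size: int = 100) -> None:
        self.chunk_size = chunk_size


class RecursiveSplitter(BaseSplitter):
    """Inherits from BaseSplitter only; Lang is just an argument type."""

    def __init__(self, separators: list[str] | None = None, **kwargs) -> None:
        super().__init__(**kwargs)
        self.separators = separators or ["\n\n", "\n", " ", ""]

    @staticmethod
    def separators_for(language: Lang) -> list[str]:
        # Made-up separator lists, for illustration only.
        table = {
            Lang.PYTHON: ["\nclass ", "\ndef ", "\n\n", "\n", " ", ""],
            Lang.MARKDOWN: ["\n## ", "\n\n", "\n", " ", ""],
        }
        return table[language]

    @classmethod
    def from_language(cls, language: Lang, **kwargs) -> RecursiveSplitter:
        # The enum value only selects a separator list; it is not a parent class.
        return cls(separators=cls.separators_for(language), **kwargs)


splitter = RecursiveSplitter.from_language(Lang.PYTHON, chunk_size=50)
print(isinstance(splitter, BaseSplitter))   # True
print(issubclass(RecursiveSplitter, Lang))  # False
```

A dependency graph that draws both edges as "extends" conflates these two relationships; only the first affects which methods the splitter inherits.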