CharacterTextSplitter Class — langchain Architecture
Architecture documentation for the CharacterTextSplitter class in character.py from the langchain codebase.
Dependency Diagram
graph TD
    70b3caa4_8308_371e_5891_177bf03efb36["CharacterTextSplitter"]
    c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
    70b3caa4_8308_371e_5891_177bf03efb36 -->|extends| c86e37d5_f962_cc1e_9821_b665e1359ae8
    2928a4a1_9408_cbea_fa7c_7f66eab697a2["character.py"]
    70b3caa4_8308_371e_5891_177bf03efb36 -->|defined in| 2928a4a1_9408_cbea_fa7c_7f66eab697a2
    2beeabad_f514_0833_2383_858a431f7f38["__init__()"]
    70b3caa4_8308_371e_5891_177bf03efb36 -->|method| 2beeabad_f514_0833_2383_858a431f7f38
    11fcdf48_fab6_27e3_ca6c_8904e5348e23["split_text()"]
    70b3caa4_8308_371e_5891_177bf03efb36 -->|method| 11fcdf48_fab6_27e3_ca6c_8904e5348e23
Source Code
libs/text-splitters/langchain_text_splitters/character.py lines 11–58
class CharacterTextSplitter(TextSplitter):
    """Splitting text that looks at characters."""

    def __init__(
        self,
        separator: str = "\n\n",
        is_separator_regex: bool = False,  # noqa: FBT001,FBT002
        **kwargs: Any,
    ) -> None:
        """Create a new TextSplitter."""
        super().__init__(**kwargs)
        self._separator = separator
        self._is_separator_regex = is_separator_regex

    def split_text(self, text: str) -> list[str]:
        """Split into chunks without re-inserting lookaround separators.

        Args:
            text: The text to split.

        Returns:
            A list of text chunks.
        """
        # 1. Determine split pattern: raw regex or escaped literal
        sep_pattern = (
            self._separator if self._is_separator_regex else re.escape(self._separator)
        )

        # 2. Initial split (keep separator if requested)
        splits = _split_text_with_regex(
            text, sep_pattern, keep_separator=self._keep_separator
        )

        # 3. Detect zero-width lookaround so we never re-insert it
        lookaround_prefixes = ("(?=", "(?<!", "(?<=", "(?!")
        is_lookaround = self._is_separator_regex and any(
            self._separator.startswith(p) for p in lookaround_prefixes
        )

        # 4. Decide merge separator:
        #    - if keep_separator or lookaround -> don't re-insert
        #    - else -> re-insert literal separator
        merge_sep = ""
        if not (self._keep_separator or is_lookaround):
            merge_sep = self._separator

        # 5. Merge adjacent splits and return
        return self._merge_splits(splits, merge_sep)
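The separator decisions in steps 1, 3, and 4 of the excerpt above can be tried in isolation. The following is a minimal sketch that uses only the standard library. The helper names choose_split_pattern and choose_merge_separator are hypothetical and do not exist in langchain, and plain re.split stands in for _split_text_with_regex, whose body is not shown in the excerpt. It assumes Python 3.7 or later, where re.split can split on zero-width matches.

```python
import re

# Same prefixes the excerpt checks for in step 3.
LOOKAROUND_PREFIXES = ("(?=", "(?<!", "(?<=", "(?!")


def choose_split_pattern(separator: str, is_separator_regex: bool) -> str:
    """Step 1: use the separator as a raw regex, or escape it as a literal.

    Hypothetical helper, not part of langchain.
    """
    return separator if is_separator_regex else re.escape(separator)


def choose_merge_separator(
    separator: str, is_separator_regex: bool, keep_separator: bool
) -> str:
    """Steps 3-4: pick the string to re-insert between merged pieces.

    Hypothetical helper, not part of langchain.
    """
    is_lookaround = is_separator_regex and any(
        separator.startswith(p) for p in LOOKAROUND_PREFIXES
    )
    if keep_separator or is_lookaround:
        return ""  # separator is already in the pieces, or has zero width
    return separator


# Literal separator: "." is escaped, so it matches only a real dot.
# re.split is an assumption standing in for _split_text_with_regex.
literal_pieces = re.split(choose_split_pattern(".", False), "a.b.c")
print(literal_pieces)  # ['a', 'b', 'c']

# Lookahead separator: the match consumes no characters, so nothing is lost.
lookaround_pieces = re.split(choose_split_pattern(r"(?=\d)", True), "a1b2")
print(lookaround_pieces)  # ['a', '1b', '2']

# Joining lookaround pieces with an empty merge separator restores the text.
merge_sep = choose_merge_separator(r"(?=\d)", True, False)
print(merge_sep.join(lookaround_pieces))  # a1b2
```

Because a lookaround matches a position rather than characters, re-inserting its pattern text during the merge would add characters that were never in the input, which is why the excerpt falls back to an empty merge separator in that case.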
Frequently Asked Questions
What is the CharacterTextSplitter class?
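CharacterTextSplitter is a TextSplitter subclass in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/character.py. It splits text on a single separator ("\n\n" by default), treated as a literal string or, when is_separator_regex is True, as a regular expression, and then merges the resulting pieces into chunks.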
Where is CharacterTextSplitter defined?
CharacterTextSplitter is defined in libs/text-splitters/langchain_text_splitters/character.py at line 11.
What does CharacterTextSplitter extend?
CharacterTextSplitter extends TextSplitter.