CharacterTextSplitter Class — langchain Architecture

Architecture documentation for the CharacterTextSplitter class in character.py from the langchain codebase.

Dependency Diagram

graph TD
  70b3caa4_8308_371e_5891_177bf03efb36["CharacterTextSplitter"]
  c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
  70b3caa4_8308_371e_5891_177bf03efb36 -->|extends| c86e37d5_f962_cc1e_9821_b665e1359ae8
  2928a4a1_9408_cbea_fa7c_7f66eab697a2["character.py"]
  70b3caa4_8308_371e_5891_177bf03efb36 -->|defined in| 2928a4a1_9408_cbea_fa7c_7f66eab697a2
  2beeabad_f514_0833_2383_858a431f7f38["__init__()"]
  70b3caa4_8308_371e_5891_177bf03efb36 -->|method| 2beeabad_f514_0833_2383_858a431f7f38
  11fcdf48_fab6_27e3_ca6c_8904e5348e23["split_text()"]
  70b3caa4_8308_371e_5891_177bf03efb36 -->|method| 11fcdf48_fab6_27e3_ca6c_8904e5348e23

Source Code

libs/text-splitters/langchain_text_splitters/character.py lines 11–58

class CharacterTextSplitter(TextSplitter):
    """Splitting text that looks at characters."""

    def __init__(
        self,
        separator: str = "\n\n",
        is_separator_regex: bool = False,  # noqa: FBT001,FBT002
        **kwargs: Any,
    ) -> None:
        """Create a new TextSplitter."""
        super().__init__(**kwargs)
        self._separator = separator
        self._is_separator_regex = is_separator_regex

    def split_text(self, text: str) -> list[str]:
        """Split into chunks without re-inserting lookaround separators.

        Args:
            text: The text to split.

        Returns:
            A list of text chunks.
        """
        # 1. Determine split pattern: raw regex or escaped literal
        sep_pattern = (
            self._separator if self._is_separator_regex else re.escape(self._separator)
        )

        # 2. Initial split (keep separator if requested)
        splits = _split_text_with_regex(
            text, sep_pattern, keep_separator=self._keep_separator
        )

        # 3. Detect zero-width lookaround so we never re-insert it
        lookaround_prefixes = ("(?=", "(?<!", "(?<=", "(?!")
        is_lookaround = self._is_separator_regex and any(
            self._separator.startswith(p) for p in lookaround_prefixes
        )

        # 4. Decide merge separator:
        #    - if keep_separator or lookaround -> don't re-insert
        #    - else -> re-insert literal separator
        merge_sep = ""
        if not (self._keep_separator or is_lookaround):
            merge_sep = self._separator

        # 5. Merge adjacent splits and return
        return self._merge_splits(splits, merge_sep)
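
The separator handling in split_text can be tried out in isolation. What follows is a minimal sketch using only the standard library, not the library's own code: the helper names build_pattern, is_lookaround_separator, and choose_merge_separator are hypothetical and only mirror steps 1, 3, and 4 of the listing above. Plain re.split stands in for _split_text_with_regex, whose body is not shown here.

```python
import re

# Same prefixes the listing checks for in step 3.
LOOKAROUND_PREFIXES = ("(?=", "(?<!", "(?<=", "(?!")


def build_pattern(separator: str, is_separator_regex: bool) -> str:
    """Step 1: use the separator as a raw regex, or escape it as a literal."""
    return separator if is_separator_regex else re.escape(separator)


def is_lookaround_separator(separator: str, is_separator_regex: bool) -> bool:
    """Step 3: a regex separator starting with a lookaround prefix is zero-width."""
    return is_separator_regex and separator.startswith(LOOKAROUND_PREFIXES)


def choose_merge_separator(
    separator: str, is_separator_regex: bool, keep_separator: bool
) -> str:
    """Step 4: re-insert the separator only if it was consumed by the split."""
    if keep_separator or is_lookaround_separator(separator, is_separator_regex):
        return ""
    return separator


# Literal mode: "." is escaped, so only the exact text "a.b" splits.
literal_pieces = re.split(build_pattern("a.b", False), "1a.b2aXb3")

# Regex mode: "." matches any character, so "aXb" splits as well.
regex_pieces = re.split(build_pattern("a.b", True), "1a.b2aXb3")

# A lookbehind matches the empty position after each "."; nothing is consumed,
# so the periods stay attached to the preceding piece (Python 3.7+).
lookaround_pieces = re.split(build_pattern(r"(?<=\.)", True), "A.B.C")

print(literal_pieces)     # ['1', '2aXb3']
print(regex_pieces)       # ['1', '2', '3']
print(lookaround_pieces)  # ['A.', 'B.', 'C']
```

Because the lookbehind split already leaves every character of the input in the pieces, joining them with the pattern text would insert the literal string `(?<=\.)` into the chunks. That is why step 4 falls back to an empty merge separator for lookaround patterns.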

Extends

TextSplitter

Frequently Asked Questions

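What is the CharacterTextSplitter class?
CharacterTextSplitter is a TextSplitter subclass in the langchain codebase. It splits text on a single separator, given either as a literal string or as a regular expression, and then merges the resulting pieces into chunks. It is defined in libs/text-splitters/langchain_text_splitters/character.py.
Where is CharacterTextSplitter defined?
CharacterTextSplitter is defined in libs/text-splitters/langchain_text_splitters/character.py, starting at line 11 (the class spans lines 11–58).
What does CharacterTextSplitter extend?
CharacterTextSplitter extends TextSplitter. Its constructor stores the separator and the is_separator_regex flag, and forwards all remaining keyword arguments to the base class through super().__init__(**kwargs).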
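
The keyword-forwarding pattern in the constructor can be illustrated with stand-in classes. This is a sketch under stated assumptions, not the library's code: BaseSplitterSketch and CharacterSplitterSketch are hypothetical names, and the base class here accepts only a keep_separator option, chosen because split_text reads self._keep_separator without setting it in the subclass.

```python
from typing import Any


class BaseSplitterSketch:
    """Hypothetical stand-in for TextSplitter; holds one shared option."""

    def __init__(self, keep_separator: bool = False) -> None:
        self._keep_separator = keep_separator


class CharacterSplitterSketch(BaseSplitterSketch):
    """Hypothetical subclass mirroring the shape of CharacterTextSplitter.__init__."""

    def __init__(
        self,
        separator: str = "\n\n",
        is_separator_regex: bool = False,
        **kwargs: Any,
    ) -> None:
        # Options the subclass does not name are handed to the base class.
        super().__init__(**kwargs)
        self._separator = separator
        self._is_separator_regex = is_separator_regex


# keep_separator is not a parameter of the subclass; it reaches the base
# class through **kwargs and is stored there.
splitter = CharacterSplitterSketch(separator=", ", keep_separator=True)
print(splitter._separator, splitter._keep_separator)
```

With this layout the subclass only declares the options it adds, and any option owned by the base class is passed through unchanged.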
