split_text() — langchain Function Reference

Architecture documentation for the split_text() function in character.py from the langchain codebase.

Dependency Diagram

graph TD
  11fcdf48_fab6_27e3_ca6c_8904e5348e23["split_text()"]
  70b3caa4_8308_371e_5891_177bf03efb36["CharacterTextSplitter"]
  11fcdf48_fab6_27e3_ca6c_8904e5348e23 -->|defined in| 70b3caa4_8308_371e_5891_177bf03efb36
  cdc32315_d799_46f6_bd91_09d4da023d15["split_text()"]
  cdc32315_d799_46f6_bd91_09d4da023d15 -->|calls| 11fcdf48_fab6_27e3_ca6c_8904e5348e23
  11fcdf48_fab6_27e3_ca6c_8904e5348e23 -->|calls| cdc32315_d799_46f6_bd91_09d4da023d15
  4df0af83_3c38_0acb_3015_c017f66de0cc["_split_text_with_regex()"]
  11fcdf48_fab6_27e3_ca6c_8904e5348e23 -->|calls| 4df0af83_3c38_0acb_3015_c017f66de0cc
  style 11fcdf48_fab6_27e3_ca6c_8904e5348e23 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/character.py lines 25–58

    def split_text(self, text: str) -> list[str]:
        """Split into chunks without re-inserting lookaround separators.

        Args:
            text: The text to split.

        Returns:
            A list of text chunks.
        """
        # 1. Determine split pattern: raw regex or escaped literal
        sep_pattern = (
            self._separator if self._is_separator_regex else re.escape(self._separator)
        )

        # 2. Initial split (keep separator if requested)
        splits = _split_text_with_regex(
            text, sep_pattern, keep_separator=self._keep_separator
        )

        # 3. Detect zero-width lookaround so we never re-insert it
        lookaround_prefixes = ("(?=", "(?<!", "(?<=", "(?!")
        is_lookaround = self._is_separator_regex and any(
            self._separator.startswith(p) for p in lookaround_prefixes
        )

        # 4. Decide merge separator:
        #    - if keep_separator or lookaround -> don't re-insert
        #    - else -> re-insert literal separator
        merge_sep = ""
        if not (self._keep_separator or is_lookaround):
            merge_sep = self._separator

        # 5. Merge adjacent splits and return
        return self._merge_splits(splits, merge_sep)
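
The separator decisions in steps 1, 3, and 4 can be tried in isolation. The following is a minimal standalone sketch, not the library's code: `split_pattern`, `merge_separator`, and `LOOKAROUND_PREFIXES` are hypothetical names introduced here, and plain arguments stand in for the `self._separator`, `self._is_separator_regex`, and `self._keep_separator` attributes. It does not reproduce `_split_text_with_regex()` or `_merge_splits()`.

```python
import re

# Same prefixes the method checks in step 3 (name is hypothetical).
LOOKAROUND_PREFIXES = ("(?=", "(?<!", "(?<=", "(?!")


def split_pattern(separator: str, is_separator_regex: bool) -> str:
    """Step 1: use the separator as a raw regex, or escape it as a literal."""
    return separator if is_separator_regex else re.escape(separator)


def merge_separator(
    separator: str, is_separator_regex: bool, keep_separator: bool
) -> str:
    """Steps 3-4: choose the string used to re-join adjacent splits."""
    # A zero-width lookaround consumes no characters, so nothing was removed
    # from the text and nothing needs to be put back.
    is_lookaround = is_separator_regex and separator.startswith(LOOKAROUND_PREFIXES)
    if keep_separator or is_lookaround:
        return ""
    return separator


if __name__ == "__main__":
    text = "intro\n## A\n## B"
    lookahead = r"(?=\n## )"

    # Splitting on a lookahead keeps every character in the pieces.
    pieces = re.split(split_pattern(lookahead, True), text)
    print(pieces)

    # Joining with the chosen merge separator therefore restores the text.
    joined = merge_separator(lookahead, True, False).join(pieces)
    print(joined == text)
```

If the lookahead pattern itself were used as the merge separator, the literal characters `(?=\n## )` would end up inside the chunks, which is the case the check in step 3 guards against.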

Frequently Asked Questions

What does split_text() do?
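split_text() is a method of CharacterTextSplitter, defined in libs/text-splitters/langchain_text_splitters/character.py. It splits the input text on the configured separator (used as a raw regex or as an escaped literal), then merges adjacent pieces into chunks. The separator is re-inserted during the merge only when it was not kept in the splits and is not a zero-width lookaround pattern.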
Where is split_text() defined?
split_text() is defined in libs/text-splitters/langchain_text_splitters/character.py at line 25.
What does split_text() call?
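The dependency diagram records 2 calls from split_text(): _split_text_with_regex() and split_text(). The source listing also shows calls to re.escape() and self._merge_splits().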
What calls split_text()?
split_text() is called by 1 function: split_text().
