split_text() — langchain Function Reference

Architecture documentation for split_text(), a method of the TokenTextSplitter class in base.py from the langchain codebase.

Dependency Diagram

graph TD
  b20de6d0_e7f4_4423_1863_b2f88e0d3c76["split_text()"]
  fee5f91c_52d7_4d25_94a2_c45ac6b35d65["TokenTextSplitter"]
  b20de6d0_e7f4_4423_1863_b2f88e0d3c76 -->|defined in| fee5f91c_52d7_4d25_94a2_c45ac6b35d65
  01cef059_4479_0a04_53ff_2c366fd5c5bf["split_text()"]
  01cef059_4479_0a04_53ff_2c366fd5c5bf -->|calls| b20de6d0_e7f4_4423_1863_b2f88e0d3c76
  b20de6d0_e7f4_4423_1863_b2f88e0d3c76 -->|calls| 01cef059_4479_0a04_53ff_2c366fd5c5bf
  0f51bcb8_84bd_5648_dc1d_650381b1e32d["split_text_on_tokens()"]
  b20de6d0_e7f4_4423_1863_b2f88e0d3c76 -->|calls| 0f51bcb8_84bd_5648_dc1d_650381b1e32d
  style b20de6d0_e7f4_4423_1863_b2f88e0d3c76 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/base.py lines 339–369

    def split_text(self, text: str) -> list[str]:
        """Splits the input text into smaller chunks based on tokenization.

        This method uses a custom tokenizer configuration to encode the input text
        into tokens, processes the tokens in chunks of a specified size with overlap,
        and decodes them back into text chunks. The splitting is performed using the
        `split_text_on_tokens` function.

        Args:
            text: The input text to be split into smaller chunks.

        Returns:
            A list of text chunks, where each chunk is derived from a portion
                of the input text based on the tokenization and chunking rules.
        """

        def _encode(_text: str) -> list[int]:
            return self._tokenizer.encode(
                _text,
                allowed_special=self._allowed_special,
                disallowed_special=self._disallowed_special,
            )

        tokenizer = Tokenizer(
            chunk_overlap=self._chunk_overlap,
            tokens_per_chunk=self._chunk_size,
            decode=self._tokenizer.decode,
            encode=_encode,
        )

        return split_text_on_tokens(text=text, tokenizer=tokenizer)
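
The chunking step that the docstring describes (encode, window over the tokens with overlap, decode) can be sketched without any tokenizer dependency. The example below is a minimal illustration, not the library's split_text_on_tokens: ToyTokenizer, sliding_window_split, and the word-level vocabulary are hypothetical names introduced here, and the windowing rule (advance by tokens_per_chunk minus chunk_overlap) is an assumption about how such a splitter typically behaves.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyTokenizer:
    """Hypothetical stand-in with the same four fields passed to Tokenizer above."""

    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[list[int]], str]
    encode: Callable[[str], list[int]]


def sliding_window_split(text: str, tokenizer: ToyTokenizer) -> list[str]:
    """Encode text, slice the token ids into overlapping windows, decode each window."""
    if tokenizer.chunk_overlap >= tokenizer.tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")

    ids = tokenizer.encode(text)
    # Each new window starts this many tokens after the previous one.
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap

    chunks: list[str] = []
    start = 0
    while start < len(ids):
        end = min(start + tokenizer.tokens_per_chunk, len(ids))
        chunks.append(tokenizer.decode(ids[start:end]))
        if end == len(ids):
            break  # the last window reached the end of the input
        start += step
    return chunks


# Toy word-level vocabulary (an assumption for illustration; real tokenizers
# such as the one wrapped by TokenTextSplitter work on sub-word units).
sentence = "the quick brown fox jumps over the lazy dog"
vocab = sorted(set(sentence.split()))
word_to_id = {word: i for i, word in enumerate(vocab)}


def encode(s: str) -> list[int]:
    return [word_to_id[word] for word in s.split()]


def decode(ids: list[int]) -> str:
    return " ".join(vocab[i] for i in ids)


tok = ToyTokenizer(chunk_overlap=1, tokens_per_chunk=4, decode=decode, encode=encode)
chunks = sliding_window_split(sentence, tok)
print(chunks)
# ['the quick brown fox', 'fox jumps over the', 'the lazy dog']
```

With a chunk size of 4 and an overlap of 1, each chunk repeats the last token of the previous one, which is the effect the chunk_overlap setting is meant to produce.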

Frequently Asked Questions

What does split_text() do?
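split_text() is a method of TokenTextSplitter, defined in libs/text-splitters/langchain_text_splitters/base.py. It encodes the input text into tokens with the splitter's tokenizer, groups the tokens into chunks of the configured size with the configured overlap, and decodes each chunk back into text, returning the chunks as a list of strings.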
Where is split_text() defined?
split_text() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 339.
What does split_text() call?
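split_text() is recorded as calling 2 functions: split_text and split_text_on_tokens. In the source listed above, split_text_on_tokens is the only one of the two invoked directly in the method body.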
What calls split_text()?
split_text() is called by 1 function: split_text.
