split_text_on_tokens() — langchain Function Reference
Architecture documentation for the split_text_on_tokens() function in base.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
    0f51bcb8_84bd_5648_dc1d_650381b1e32d["split_text_on_tokens()"]
    d96ff4b9_fcc1_8428_729e_f75b099397b4["base.py"]
    0f51bcb8_84bd_5648_dc1d_650381b1e32d -->|defined in| d96ff4b9_fcc1_8428_729e_f75b099397b4
    b20de6d0_e7f4_4423_1863_b2f88e0d3c76["split_text()"]
    b20de6d0_e7f4_4423_1863_b2f88e0d3c76 -->|calls| 0f51bcb8_84bd_5648_dc1d_650381b1e32d
    style 0f51bcb8_84bd_5648_dc1d_650381b1e32d fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/base.py lines 422–450
def split_text_on_tokens(*, text: str, tokenizer: Tokenizer) -> list[str]:
    """Split incoming text and return chunks using tokenizer.

    Args:
        text: The input text to be split.
        tokenizer: The tokenizer to use for splitting.

    Returns:
        A list of text chunks.
    """
    splits: list[str] = []
    input_ids = tokenizer.encode(text)
    start_idx = 0
    if tokenizer.tokens_per_chunk <= tokenizer.chunk_overlap:
        msg = "tokens_per_chunk must be greater than chunk_overlap"
        raise ValueError(msg)
    while start_idx < len(input_ids):
        cur_idx = min(start_idx + tokenizer.tokens_per_chunk, len(input_ids))
        chunk_ids = input_ids[start_idx:cur_idx]
        if not chunk_ids:
            break
        decoded = tokenizer.decode(chunk_ids)
        if decoded:
            splits.append(decoded)
        if cur_idx == len(input_ids):
            break
        start_idx += tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    return splits
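To make the sliding window concrete, here is a minimal, self-contained sketch. Both `ToyTokenizer` and `window_chunks` are hypothetical names written for this illustration: the first is a stand-in exposing the four attributes the function reads from its `tokenizer` argument (`encode`, `decode`, `tokens_per_chunk`, `chunk_overlap`), and the second repeats the windowing loop from the listing above so the example runs without langchain installed. The one-token-per-character encoding is an assumption chosen only to keep the output easy to read.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyTokenizer:
    """Hypothetical stand-in exposing the attributes the function reads."""

    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[list[int]], str]
    encode: Callable[[str], list[int]]


def window_chunks(text: str, tokenizer: ToyTokenizer) -> list[str]:
    """Windowing loop repeated from the listing above, for illustration only."""
    if tokenizer.tokens_per_chunk <= tokenizer.chunk_overlap:
        raise ValueError("tokens_per_chunk must be greater than chunk_overlap")
    input_ids = tokenizer.encode(text)
    splits: list[str] = []
    start_idx = 0
    while start_idx < len(input_ids):
        # Clamp the window end to the end of the token sequence.
        cur_idx = min(start_idx + tokenizer.tokens_per_chunk, len(input_ids))
        decoded = tokenizer.decode(input_ids[start_idx:cur_idx])
        if decoded:
            splits.append(decoded)
        if cur_idx == len(input_ids):
            break
        # Advance by the stride, so consecutive windows share chunk_overlap tokens.
        start_idx += tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    return splits


# Toy encoding (an assumption for this demo): one token per character.
toy = ToyTokenizer(
    chunk_overlap=1,
    tokens_per_chunk=4,
    encode=lambda s: [ord(c) for c in s],
    decode=lambda ids: "".join(chr(i) for i in ids),
)

chunks = window_chunks("abcdefghij", toy)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

With a window of 4 tokens and an overlap of 1, each chunk starts 3 tokens after the previous one, so the last character of one chunk reappears as the first character of the next. Real tokenizers produce multi-character tokens, so chunk boundaries in practice will not align with characters like this.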
Frequently Asked Questions
What does split_text_on_tokens() do?
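split_text_on_tokens() is a function defined in libs/text-splitters/langchain_text_splitters/base.py. It encodes the input text into token IDs with the supplied tokenizer, slides a window of tokenizer.tokens_per_chunk tokens across them, advancing by tokens_per_chunk minus chunk_overlap tokens each step, and decodes each window back into a string. Empty decoded chunks are skipped, and a ValueError is raised if tokens_per_chunk is not greater than chunk_overlap.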
Where is split_text_on_tokens() defined?
split_text_on_tokens() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 422.
What calls split_text_on_tokens()?
split_text_on_tokens() is called by one function: split_text().
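As a supplement to the first question above, the following sketch estimates how many windows the loop in the source visits for a given token count. `expected_window_count` is a hypothetical helper, not part of langchain; it is derived from the stride `tokens_per_chunk - chunk_overlap` used in the source. The actual number of returned chunks can be lower, because chunks that decode to an empty string are skipped.

```python
def expected_window_count(n_tokens: int, tokens_per_chunk: int, chunk_overlap: int) -> int:
    """Hypothetical helper: number of windows the loop visits for n_tokens tokens."""
    if tokens_per_chunk <= chunk_overlap:
        # Same guard as the source: a non-positive stride would never advance.
        raise ValueError("tokens_per_chunk must be greater than chunk_overlap")
    if n_tokens <= 0:
        return 0
    if n_tokens <= tokens_per_chunk:
        return 1
    stride = tokens_per_chunk - chunk_overlap
    remaining = n_tokens - tokens_per_chunk
    # One full first window, plus one window per stride needed to cover the rest
    # (integer ceiling division).
    return (remaining + stride - 1) // stride + 1


print(expected_window_count(10, 4, 1))  # 3
print(expected_window_count(11, 4, 1))  # 4
```

The guard explains the ValueError in the source: if the overlap were equal to or larger than the chunk size, the start index would stay put or move backward and the loop would not make progress.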