from_tiktoken_encoder() — langchain Function Reference
Architecture documentation for the from_tiktoken_encoder() class method of TextSplitter, defined in base.py in the langchain codebase.
Dependency Diagram
graph TD
    1eee98c3_ae63_e2ab_49ab_943e1721d020["from_tiktoken_encoder()"]
    c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
    1eee98c3_ae63_e2ab_49ab_943e1721d020 -->|defined in| c86e37d5_f962_cc1e_9821_b665e1359ae8
    style 1eee98c3_ae63_e2ab_49ab_943e1721d020 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/base.py lines 226–281
def from_tiktoken_encoder(
    cls,
    encoding_name: str = "gpt2",
    model_name: str | None = None,
    allowed_special: Literal["all"] | AbstractSet[str] = set(),
    disallowed_special: Literal["all"] | Collection[str] = "all",
    **kwargs: Any,
) -> Self:
    """Text splitter that uses `tiktoken` encoder to count length.

    Args:
        encoding_name: The name of the tiktoken encoding to use.
        model_name: The name of the model to use.
            If provided, this will override the `encoding_name`.
        allowed_special: Special tokens that are allowed during encoding.
        disallowed_special: Special tokens that are disallowed during encoding.

    Returns:
        An instance of `TextSplitter` using tiktoken for length calculation.

    Raises:
        ImportError: If the tiktoken package is not installed.
    """
    if not _HAS_TIKTOKEN:
        msg = (
            "Could not import tiktoken python package. "
            "This is needed in order to calculate max_tokens_for_prompt. "
            "Please install it with `pip install tiktoken`."
        )
        raise ImportError(msg)

    if model_name is not None:
        enc = tiktoken.encoding_for_model(model_name)
    else:
        enc = tiktoken.get_encoding(encoding_name)

    def _tiktoken_encoder(text: str) -> int:
        return len(
            enc.encode(
                text,
                allowed_special=allowed_special,
                disallowed_special=disallowed_special,
            )
        )

    if issubclass(cls, TokenTextSplitter):
        extra_kwargs = {
            "encoding_name": encoding_name,
            "model_name": model_name,
            "allowed_special": allowed_special,
            "disallowed_special": disallowed_special,
        }
        kwargs = {**kwargs, **extra_kwargs}

    return cls(length_function=_tiktoken_encoder, **kwargs)
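The factory pattern in the source above (build a length-counting closure, then hand it to the class constructor as length_function) can be illustrated without tiktoken. The following is only a minimal sketch under stated assumptions: ToySplitter, from_toy_encoder, and fits are hypothetical names invented for this illustration, and whitespace splitting stands in for enc.encode(). None of this is langchain code.

```python
from typing import Any, Callable, List, Optional


class ToySplitter:
    """Hypothetical stand-in for a splitter that measures text with a length function."""

    def __init__(
        self,
        chunk_size: int = 4,
        length_function: Callable[[str], int] = len,
    ) -> None:
        self.chunk_size = chunk_size
        self.length_function = length_function

    @classmethod
    def from_toy_encoder(
        cls, separator: Optional[str] = None, **kwargs: Any
    ) -> "ToySplitter":
        # Assumption: whitespace splitting replaces the real tiktoken encoder.
        def _encode(text: str) -> List[str]:
            return text.split(separator)

        # The closure captures the encoder settings, mirroring how
        # _tiktoken_encoder captures enc and the special-token arguments.
        def _toy_length(text: str) -> int:
            return len(_encode(text))

        # Remaining keyword arguments are forwarded to the constructor.
        return cls(length_function=_toy_length, **kwargs)

    def fits(self, text: str) -> bool:
        """Report whether text is within chunk_size, measured in toy tokens."""
        return self.length_function(text) <= self.chunk_size


splitter = ToySplitter.from_toy_encoder(chunk_size=3)
```

With this setup, chunk_size is interpreted in units of whatever the length function counts (toy tokens here, tiktoken tokens in the real method), which is the point of routing the encoder through length_function.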
Frequently Asked Questions
What does from_tiktoken_encoder() do?
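from_tiktoken_encoder() is a class method of TextSplitter in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/base.py. It returns a splitter instance whose length function counts tiktoken tokens. The encoder is looked up from model_name when one is given and from encoding_name otherwise, and an ImportError is raised if the tiktoken package is not installed. When the class is a TokenTextSplitter subclass, the encoding arguments are also forwarded to the constructor.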
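As a rough usage sketch, the method is typically called on a concrete subclass rather than on TextSplitter itself. The example below assumes the langchain-text-splitters and tiktoken packages are installed and that the "cl100k_base" encoding suits your model; build_token_aware_splitter is a hypothetical helper name chosen for this illustration.

```python
def build_token_aware_splitter(chunk_size: int = 100, chunk_overlap: int = 0):
    """Return a splitter whose chunk_size is measured in tiktoken tokens."""
    # Imported lazily so this module still loads when the optional
    # packages are missing; the call itself will raise ImportError then.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    return RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        encoding_name="cl100k_base",  # assumed encoding; model_name would override it
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
```

Calling split_text() on the returned splitter then yields chunks sized by token count. Passing model_name instead of encoding_name lets tiktoken pick the encoding for that model.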
Where is from_tiktoken_encoder() defined?
from_tiktoken_encoder() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 226.