from_huggingface_tokenizer() — langchain Function Reference
Architecture documentation for the from_huggingface_tokenizer() function in base.py from the langchain codebase.
Dependency Diagram
graph TD
    e3cb5fd5_0149_e230_e0f1_05d16edbd1ed["from_huggingface_tokenizer()"]
    c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
    e3cb5fd5_0149_e230_e0f1_05d16edbd1ed -->|defined in| c86e37d5_f962_cc1e_9821_b665e1359ae8
    style e3cb5fd5_0149_e230_e0f1_05d16edbd1ed fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/base.py lines 197–223
def from_huggingface_tokenizer(
    cls, tokenizer: PreTrainedTokenizerBase, **kwargs: Any
) -> TextSplitter:
    """Text splitter that uses Hugging Face tokenizer to count length.

    Args:
        tokenizer: The Hugging Face tokenizer to use.

    Returns:
        An instance of `TextSplitter` using the Hugging Face tokenizer for length
            calculation.
    """
    if not _HAS_TRANSFORMERS:
        msg = (
            "Could not import transformers python package. "
            "Please install it with `pip install transformers`."
        )
        raise ValueError(msg)
    if not isinstance(tokenizer, PreTrainedTokenizerBase):
        msg = "Tokenizer received was not an instance of PreTrainedTokenizerBase"  # type: ignore[unreachable]
        raise ValueError(msg)  # noqa: TRY004

    def _huggingface_tokenizer_length(text: str) -> int:
        return len(tokenizer.tokenize(text))

    return cls(length_function=_huggingface_tokenizer_length, **kwargs)
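The listing above follows a common pattern: a class-level factory closes over a tokenizer and hands the constructor a length function that counts tokens. The sketch below reproduces only that pattern so it can run without langchain or transformers installed. `WhitespaceTokenizer`, `MiniSplitter`, `from_tokenizer`, and `fits` are hypothetical stand-ins invented for illustration; they are not part of either library, and the real `TextSplitter` does considerably more.

```python
from typing import Any, Callable, List


class WhitespaceTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer; only `tokenize` is mimicked."""

    def tokenize(self, text: str) -> List[str]:
        # Split on whitespace; a real tokenizer would produce subword tokens.
        return text.split()


class MiniSplitter:
    """Hypothetical, simplified stand-in for TextSplitter (not the langchain class)."""

    def __init__(
        self, chunk_size: int = 4, length_function: Callable[[str], int] = len
    ) -> None:
        self._chunk_size = chunk_size
        self._length_function = length_function

    @classmethod
    def from_tokenizer(
        cls, tokenizer: WhitespaceTokenizer, **kwargs: Any
    ) -> "MiniSplitter":
        # Same shape as the source: close over the tokenizer and measure
        # length as the number of tokens it produces.
        def _token_length(text: str) -> int:
            return len(tokenizer.tokenize(text))

        # Remaining keyword arguments are forwarded to the constructor.
        return cls(length_function=_token_length, **kwargs)

    def fits(self, text: str) -> bool:
        """Report whether `text` is within chunk_size under the length function."""
        return self._length_function(text) <= self._chunk_size


text = "four short words here"
splitter = MiniSplitter.from_tokenizer(WhitespaceTokenizer(), chunk_size=4)
char_splitter = MiniSplitter(chunk_size=4)

print(splitter.fits(text))       # length measured in tokens
print(char_splitter.fits(text))  # length measured in characters (default len)
```

The same text can fit under one length function and overflow under the other, which is why the choice of length function decides what a chunk size means.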
Frequently Asked Questions
What does from_huggingface_tokenizer() do?
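from_huggingface_tokenizer() is a class method of TextSplitter, defined in libs/text-splitters/langchain_text_splitters/base.py. It returns a splitter instance whose length function is len(tokenizer.tokenize(text)), so lengths are measured in tokens produced by the given Hugging Face tokenizer. Any extra keyword arguments are passed through to the constructor. It raises ValueError if the transformers package cannot be imported or if the tokenizer is not an instance of PreTrainedTokenizerBase.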
Where is from_huggingface_tokenizer() defined?
from_huggingface_tokenizer() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 197.