
from_huggingface_tokenizer() — langchain Function Reference

Architecture documentation for the from_huggingface_tokenizer() function, defined in libs/text-splitters/langchain_text_splitters/base.py in the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  from_huggingface_tokenizer["from_huggingface_tokenizer()"]
  TextSplitter["TextSplitter"]
  from_huggingface_tokenizer -->|defined in| TextSplitter
  style from_huggingface_tokenizer fill:#6366f1,stroke:#818cf8,color:#fff


Source Code

libs/text-splitters/langchain_text_splitters/base.py lines 197–223

    def from_huggingface_tokenizer(
        cls, tokenizer: PreTrainedTokenizerBase, **kwargs: Any
    ) -> TextSplitter:
        """Text splitter that uses Hugging Face tokenizer to count length.

        Args:
            tokenizer: The Hugging Face tokenizer to use.

        Returns:
            An instance of `TextSplitter` using the Hugging Face tokenizer for length
                calculation.
        """
        if not _HAS_TRANSFORMERS:
            msg = (
                "Could not import transformers python package. "
                "Please install it with `pip install transformers`."
            )
            raise ValueError(msg)

        if not isinstance(tokenizer, PreTrainedTokenizerBase):
            msg = "Tokenizer received was not an instance of PreTrainedTokenizerBase"  # type: ignore[unreachable]
            raise ValueError(msg)  # noqa: TRY004

        def _huggingface_tokenizer_length(text: str) -> int:
            return len(tokenizer.tokenize(text))

        return cls(length_function=_huggingface_tokenizer_length, **kwargs)
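
For context, a minimal usage sketch follows, assuming the transformers and langchain-text-splitters packages are installed. CharacterTextSplitter stands in here as one concrete TextSplitter subclass, and the model name is an arbitrary example; neither appears in the source above.

    from transformers import AutoTokenizer
    from langchain_text_splitters import CharacterTextSplitter

    # Any Hugging Face tokenizer satisfying PreTrainedTokenizerBase works;
    # bert-base-uncased is an arbitrary choice for illustration.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # chunk_size and chunk_overlap are measured in tokens here, because the
    # splitter's length_function counts len(tokenizer.tokenize(text)).
    splitter = CharacterTextSplitter.from_huggingface_tokenizer(
        tokenizer,
        chunk_size=100,
        chunk_overlap=20,
    )

    chunks = splitter.split_text("Some long document text ...")
    print(len(chunks))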


Frequently Asked Questions

What does from_huggingface_tokenizer() do?
from_huggingface_tokenizer() is a classmethod on TextSplitter that returns a splitter configured to measure text length with a Hugging Face tokenizer. It verifies that the transformers package is available and that the argument is a PreTrainedTokenizerBase, then constructs the class with a length_function that counts len(tokenizer.tokenize(text)), so chunk sizes are measured in tokens rather than characters.
Where is from_huggingface_tokenizer() defined?
from_huggingface_tokenizer() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 197.
