from_tiktoken_encoder() — langchain Function Reference

Architecture documentation for from_tiktoken_encoder(), a class method of TextSplitter defined in base.py of the langchain text-splitters package.

Dependency Diagram

graph TD
  1eee98c3_ae63_e2ab_49ab_943e1721d020["from_tiktoken_encoder()"]
  c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
  1eee98c3_ae63_e2ab_49ab_943e1721d020 -->|defined in| c86e37d5_f962_cc1e_9821_b665e1359ae8
  style 1eee98c3_ae63_e2ab_49ab_943e1721d020 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/base.py lines 226–281

    @classmethod
    def from_tiktoken_encoder(
        cls,
        encoding_name: str = "gpt2",
        model_name: str | None = None,
        allowed_special: Literal["all"] | AbstractSet[str] = set(),
        disallowed_special: Literal["all"] | Collection[str] = "all",
        **kwargs: Any,
    ) -> Self:
        """Text splitter that uses `tiktoken` encoder to count length.

        Args:
            encoding_name: The name of the tiktoken encoding to use.
            model_name: The name of the model to use.

                If provided, this will override the `encoding_name`.
            allowed_special: Special tokens that are allowed during encoding.
            disallowed_special: Special tokens that are disallowed during encoding.

        Returns:
            An instance of `TextSplitter` using tiktoken for length calculation.

        Raises:
            ImportError: If the tiktoken package is not installed.
        """
        if not _HAS_TIKTOKEN:
            msg = (
                "Could not import tiktoken python package. "
                "This is needed in order to calculate max_tokens_for_prompt. "
                "Please install it with `pip install tiktoken`."
            )
            raise ImportError(msg)

        if model_name is not None:
            enc = tiktoken.encoding_for_model(model_name)
        else:
            enc = tiktoken.get_encoding(encoding_name)

        def _tiktoken_encoder(text: str) -> int:
            return len(
                enc.encode(
                    text,
                    allowed_special=allowed_special,
                    disallowed_special=disallowed_special,
                )
            )

        if issubclass(cls, TokenTextSplitter):
            extra_kwargs = {
                "encoding_name": encoding_name,
                "model_name": model_name,
                "allowed_special": allowed_special,
                "disallowed_special": disallowed_special,
            }
            kwargs = {**kwargs, **extra_kwargs}

        return cls(length_function=_tiktoken_encoder, **kwargs)
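The construction pattern above can be sketched without tiktoken. This is a minimal, runnable sketch in which every name (ToyTextSplitter, ToyTokenTextSplitter, WhitespaceEncoding, resolve_encoding_name, and the "toy-model" table) is hypothetical, and a whitespace split stands in for real tokenization. It mirrors only three steps of the source: choose an encoding with model_name taking precedence over encoding_name, wrap it in a token-counting closure, and forward the encoder settings when cls is a token-based subclass. The real method also forwards allowed_special and disallowed_special; the sketch omits them.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical model -> encoding table standing in for tiktoken.encoding_for_model().
_MODEL_TO_ENCODING = {"toy-model": "toy_base"}


def resolve_encoding_name(encoding_name: str, model_name: str | None) -> str:
    """Same precedence as from_tiktoken_encoder(): model_name wins when given."""
    if model_name is not None:
        return _MODEL_TO_ENCODING[model_name]
    return encoding_name


class WhitespaceEncoding:
    """Hypothetical stand-in for a tiktoken Encoding: one token per word."""

    def __init__(self, name: str) -> None:
        self.name = name

    def encode(self, text: str) -> list[str]:
        return text.split()


class ToyTextSplitter:
    """Hypothetical base class that only records what the factory passes in."""

    def __init__(self, length_function: Callable[[str], int] = len, **kwargs: Any) -> None:
        self.length_function = length_function
        self.options = kwargs

    @classmethod
    def from_encoder(
        cls,
        encoding_name: str = "gpt2",
        model_name: str | None = None,
        **kwargs: Any,
    ) -> ToyTextSplitter:
        enc = WhitespaceEncoding(resolve_encoding_name(encoding_name, model_name))

        def _count_tokens(text: str) -> int:
            # The closure captures `enc`; length is measured in tokens, not characters.
            return len(enc.encode(text))

        if issubclass(cls, ToyTokenTextSplitter):
            # Token-based subclasses also receive the encoder settings so they
            # can build their own encoder; they are merged into kwargs.
            kwargs = {**kwargs, "encoding_name": encoding_name, "model_name": model_name}
        return cls(length_function=_count_tokens, **kwargs)


class ToyTokenTextSplitter(ToyTextSplitter):
    """Hypothetical token-based subclass that accepts the encoder settings."""


plain = ToyTextSplitter.from_encoder(chunk_size=100)
token = ToyTokenTextSplitter.from_encoder(model_name="toy-model", chunk_size=100)
print(plain.length_function("count these four words"))  # 4
print(plain.options)  # {'chunk_size': 100}
print(token.options)  # {'chunk_size': 100, 'encoding_name': 'gpt2', 'model_name': 'toy-model'}
```

Passing the settings through only for the token-based subclass reflects the source's issubclass(cls, TokenTextSplitter) check: other splitters need only the length function, while a token splitter also tokenizes the text itself.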

Frequently Asked Questions

What does from_tiktoken_encoder() do?
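from_tiktoken_encoder() is a class method on TextSplitter that builds a splitter whose length_function counts tiktoken tokens instead of characters. It selects the encoding from model_name when one is given, otherwise from encoding_name, and raises ImportError if tiktoken is not installed. When called on TokenTextSplitter or a subclass of it, it also passes encoding_name, model_name, allowed_special, and disallowed_special to the constructor.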
Where is from_tiktoken_encoder() defined?
from_tiktoken_encoder() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 226.
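Because the returned splitter measures text with the token-counting closure, a chunk_size passed through **kwargs is in general counted in tokens rather than characters. The sketch below illustrates that effect with a deliberately simplified greedy packer; pack_words and count_words are hypothetical helpers, not LangChain's own merge logic, and the word counter stands in for a real tiktoken count.

```python
from typing import Callable, List


def pack_words(text: str, chunk_size: int, length_function: Callable[[str], int]) -> List[str]:
    """Greedily join words into chunks whose measured length stays <= chunk_size.

    Simplified illustration only; not LangChain's actual merge logic.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        # Close the current chunk when adding the next word would exceed the budget.
        if current and length_function(candidate) > chunk_size:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def count_words(text: str) -> int:
    # Hypothetical token counter standing in for the tiktoken-based closure.
    return len(text.split())


text = "one two three four five six seven"
by_tokens = pack_words(text, chunk_size=3, length_function=count_words)
by_chars = pack_words(text, chunk_size=10, length_function=len)
print(by_tokens)  # ['one two three', 'four five six', 'seven']
print(by_chars)   # ['one two', 'three four', 'five six', 'seven']
```

The same chunk_size therefore yields different chunk boundaries depending on which length function the splitter was built with.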