get_tokenizer() — langchain Function Reference

Architecture documentation for the get_tokenizer() function in base.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  2c7148dd_77a5_4fa6_bc1e_90356a54ef84["get_tokenizer()"]
  d2346df7_0af9_9808_4af4_3dbe3daa01f5["base.py"]
  2c7148dd_77a5_4fa6_bc1e_90356a54ef84 -->|defined in| d2346df7_0af9_9808_4af4_3dbe3daa01f5
  410c33f4_c7e9_41dd_70a0_8995803f3819["_get_token_ids_default_method()"]
  410c33f4_c7e9_41dd_70a0_8995803f3819 -->|calls| 2c7148dd_77a5_4fa6_bc1e_90356a54ef84
  style 2c7148dd_77a5_4fa6_bc1e_90356a54ef84 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/core/langchain_core/language_models/base.py lines 75–95

@functools.lru_cache(maxsize=1)
def get_tokenizer() -> Any:
    """Get a GPT-2 tokenizer instance.

    This function is cached to avoid re-loading the tokenizer every time it is called.

    Raises:
        ImportError: If the transformers package is not installed.

    Returns:
        The GPT-2 tokenizer instance.

    """
    if not _HAS_TRANSFORMERS:
        msg = (
            "Could not import transformers python package. "
            "This is needed in order to calculate get_token_ids. "
            "Please install it with `pip install transformers`."
        )
        raise ImportError(msg)
    # create a GPT-2 tokenizer instance
    return GPT2TokenizerFast.from_pretrained("gpt2")

Subdomains

Frequently Asked Questions

What does get_tokenizer() do?
get_tokenizer() returns a cached GPT-2 tokenizer (GPT2TokenizerFast) instance for computing token ids. It raises ImportError if the transformers package is not installed. It is defined in libs/core/langchain_core/language_models/base.py.
Where is get_tokenizer() defined?
get_tokenizer() is defined in libs/core/langchain_core/language_models/base.py at line 75.
What calls get_tokenizer()?
get_tokenizer() is called by one function: _get_token_ids_default_method().
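The caller listed above uses the tokenizer to turn text into token ids. A hedged sketch of that flow, with a stub class in place of the real GPT-2 tokenizer (the whitespace split and hash-based ids below are placeholders; real GPT-2 tokenization uses byte-pair encoding):

```python
class _StubTokenizer:
    """Illustrative stand-in for the GPT-2 tokenizer returned by get_tokenizer()."""

    def encode(self, text: str) -> list[int]:
        # Placeholder: real GPT-2 encoding is byte-pair based, not word hashing.
        return [hash(word) % 50257 for word in text.split()]

def get_token_ids_sketch(text: str) -> list[int]:
    """Mirrors the documented flow: fetch the tokenizer, then encode the text."""
    tokenizer = _StubTokenizer()  # stands in for get_tokenizer()
    return tokenizer.encode(text)

ids = get_token_ids_sketch("hello world")
```

The essential point is the division of labor: get_tokenizer() owns loading and caching, while the caller only calls encode() on whatever instance it receives.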

Analyze Your Own Codebase

Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.

Try Supermodel Free