_get_token_ids_default_method() — langchain Function Reference

Architecture documentation for the _get_token_ids_default_method() function in base.py from the langchain codebase.

Dependency Diagram

graph TD
  410c33f4_c7e9_41dd_70a0_8995803f3819["_get_token_ids_default_method()"]
  d2346df7_0af9_9808_4af4_3dbe3daa01f5["base.py"]
  410c33f4_c7e9_41dd_70a0_8995803f3819 -->|defined in| d2346df7_0af9_9808_4af4_3dbe3daa01f5
  75233eb7_d305_3f28_66ab_34eea23f581a["get_token_ids()"]
  75233eb7_d305_3f28_66ab_34eea23f581a -->|calls| 410c33f4_c7e9_41dd_70a0_8995803f3819
  2c7148dd_77a5_4fa6_bc1e_90356a54ef84["get_tokenizer()"]
  410c33f4_c7e9_41dd_70a0_8995803f3819 -->|calls| 2c7148dd_77a5_4fa6_bc1e_90356a54ef84
  style 410c33f4_c7e9_41dd_70a0_8995803f3819 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/core/langchain_core/language_models/base.py lines 101–119

def _get_token_ids_default_method(text: str) -> list[int]:
    """Encode the text into token IDs using the fallback GPT-2 tokenizer."""
    global _GPT2_TOKENIZER_WARNED  # noqa: PLW0603
    if not _GPT2_TOKENIZER_WARNED:
        warnings.warn(
            "Using fallback GPT-2 tokenizer for token counting. "
            "Token counts may be inaccurate for non-GPT-2 models. "
            "For accurate counts, use a model-specific method if available.",
            stacklevel=3,
        )
        _GPT2_TOKENIZER_WARNED = True

    tokenizer = get_tokenizer()

    # Pass verbose=False to suppress the "Token indices sequence length is longer than
    # the specified maximum sequence length" warning from HuggingFace. This warning is
    # about GPT-2's 1024 token context limit, but we're only using the tokenizer for
    # counting, not for model input.
    return cast("list[int]", tokenizer.encode(text, verbose=False))
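
The warn-once behaviour above can be illustrated in isolation. The following is a minimal sketch, not the langchain implementation: `_FALLBACK_WARNED`, `_toy_encode`, and `fallback_token_ids` are hypothetical names, and the whitespace-based encoder is only a stand-in for the GPT-2 tokenizer returned by get_tokenizer().

```python
import warnings

# Hypothetical module-level flag, playing the role of _GPT2_TOKENIZER_WARNED.
_FALLBACK_WARNED = False


def _toy_encode(text: str) -> list[int]:
    """Hypothetical stand-in tokenizer: one ID per distinct whitespace-separated word."""
    vocab: dict[str, int] = {}
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def fallback_token_ids(text: str) -> list[int]:
    """Encode text with the stand-in tokenizer, warning only on the first call."""
    global _FALLBACK_WARNED
    if not _FALLBACK_WARNED:
        warnings.warn(
            "Using fallback tokenizer; token counts may be inaccurate.",
            stacklevel=2,
        )
        _FALLBACK_WARNED = True
    return _toy_encode(text)
```

Calling `fallback_token_ids` repeatedly in the same process triggers the warning at most once, because the module-level flag is flipped after the first call. The `stacklevel` argument controls which frame the warning is attributed to; the original uses 3 so that the warning points past its internal caller, while this sketch uses 2 because it has only one layer.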

Frequently Asked Questions

What does _get_token_ids_default_method() do?
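_get_token_ids_default_method() encodes a string into a list of token IDs using the fallback GPT-2 tokenizer returned by get_tokenizer(). On its first call it emits a one-time warning that token counts may be inaccurate for non-GPT-2 models. It is defined in libs/core/langchain_core/language_models/base.py.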
Where is _get_token_ids_default_method() defined?
_get_token_ids_default_method() is defined in libs/core/langchain_core/language_models/base.py at line 101.
What does _get_token_ids_default_method() call?
_get_token_ids_default_method() calls one function: get_tokenizer().
What calls _get_token_ids_default_method()?
_get_token_ids_default_method() is called by one function: get_token_ids().
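
The call chain described in the answers above (get_token_ids() delegating to a default encoder) can be sketched as follows. This is an illustration under stated assumptions, not the langchain class: `SketchModel`, `default_token_ids`, and the optional `custom_get_token_ids` hook are hypothetical here, and the byte-level encoder merely stands in for the GPT-2 fallback.

```python
from typing import Callable, Optional


def default_token_ids(text: str) -> list[int]:
    """Hypothetical stand-in for the fallback path: one ID per UTF-8 byte."""
    return list(text.encode("utf-8"))


class SketchModel:
    """Hypothetical model wrapper used only to illustrate the dispatch."""

    def __init__(
        self,
        custom_get_token_ids: Optional[Callable[[str], list[int]]] = None,
    ) -> None:
        # Assumption: callers may supply their own encoder; None means "use the fallback".
        self.custom_get_token_ids = custom_get_token_ids

    def get_token_ids(self, text: str) -> list[int]:
        """Prefer the caller-supplied encoder, otherwise use the default one."""
        if self.custom_get_token_ids is not None:
            return self.custom_get_token_ids(text)
        return default_token_ids(text)

    def get_num_tokens(self, text: str) -> int:
        """Count tokens by encoding and taking the length."""
        return len(self.get_token_ids(text))
```

A design along these lines keeps the inaccurate fallback as a last resort: a caller who knows the model's real tokenizer can pass it in, and only models without one reach the default path and its warning.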