count_tokens_approximately() — langchain Function Reference

Architecture documentation for the count_tokens_approximately() function, defined in libs/core/langchain_core/messages/utils.py in the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11["count_tokens_approximately()"]
  0b528c80_0ce7_1c74_8932_bc433bcb03c6["utils.py"]
  0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11 -->|defined in| 0b528c80_0ce7_1c74_8932_bc433bcb03c6
  18c39895_f266_1937_05dc_c90bd6becb90["_approximate_token_counter()"]
  18c39895_f266_1937_05dc_c90bd6becb90 -->|calls| 0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11
  7d8fdbaf_a57f_bad7_f47e_85c3fa1f78fe["convert_to_messages()"]
  0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11 -->|calls| 7d8fdbaf_a57f_bad7_f47e_85c3fa1f78fe
  ea68ecf6_3931_f79c_5745_e75911c6e99a["_get_message_openai_role()"]
  0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11 -->|calls| ea68ecf6_3931_f79c_5745_e75911c6e99a
  style 0a2d5a3a_f0de_5ab6_561c_6ebc3475ad11 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/core/langchain_core/messages/utils.py lines 2186–2342

def count_tokens_approximately(
    messages: Iterable[MessageLikeRepresentation],
    *,
    chars_per_token: float = 4.0,
    extra_tokens_per_message: float = 3.0,
    count_name: bool = True,
    tokens_per_image: int = 85,
    use_usage_metadata_scaling: bool = False,
    tools: list[BaseTool | dict[str, Any]] | None = None,
) -> int:
    """Approximate the total number of tokens in messages.

    The token count includes stringified message content, role, and (optionally) name.

    - For AI messages, the token count also includes stringified tool calls.
    - For tool messages, the token count also includes the tool call ID.
    - For multimodal messages with images, applies a fixed token penalty per image
      instead of counting base64-encoded characters.
    - If tools are provided, the token count also includes stringified tool schemas.

    Args:
        messages: List of messages to count tokens for.
        chars_per_token: Number of characters per token to use for the approximation.
            One token corresponds to ~4 chars for common English text.
            You can also specify `float` values for more fine-grained control.
            [See more here](https://platform.openai.com/tokenizer).
        extra_tokens_per_message: Number of extra tokens to add per message, e.g.
            special tokens, including beginning/end of message.
            You can also specify `float` values for more fine-grained control.
            [See more here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb).
        count_name: Whether to include message names in the count.
        tokens_per_image: Fixed token cost per image (default: 85, aligned with
            OpenAI's low-resolution image token cost).
        use_usage_metadata_scaling: If True, and all AI messages have consistent
            `response_metadata['model_provider']`, scale the approximate token count
            using the **most recent** AI message that has
            `usage_metadata['total_tokens']`. The scaling factor is:
            `AI_total_tokens / approx_tokens_up_to_that_AI_message`
        tools: List of tools to include in the token count. Each tool can be either
            a `BaseTool` instance or a dict representing a tool schema. `BaseTool`
            instances are converted to OpenAI tool format before counting.

    Returns:
        Approximate number of tokens in the messages (and tools, if provided).

    Note:
        This is a simple approximation that may not match the exact token count used by
        specific models. For accurate counts, use model-specific tokenizers.

        For multimodal messages containing images, a fixed token penalty is applied
        per image instead of counting base64-encoded characters, which provides a
        more realistic approximation.

    !!! version-added "Added in `langchain-core` 0.3.46"
    """
    converted_messages = convert_to_messages(messages)

    token_count = 0.0

    ai_model_provider: str | None = None
    invalid_model_provider = False
    last_ai_total_tokens: int | None = None
    approx_at_last_ai: float | None = None

    # Count tokens for tools if provided
    if tools:
        tools_chars = 0
        for tool in tools:
            tool_dict = tool if isinstance(tool, dict) else convert_to_openai_tool(tool)
            tools_chars += len(json.dumps(tool_dict))
        token_count += math.ceil(tools_chars / chars_per_token)

    for message in converted_messages:
        message_chars = 0

        if isinstance(message.content, str):
            message_chars += len(message.content)
        # Handle multimodal content (list of content blocks)
        elif isinstance(message.content, list):
            for block in message.content:
                if isinstance(block, str):
                    ...

(Excerpt truncated; see utils.py lines 2186–2342 for the full implementation.)

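The excerpt above accumulates a per-message character count and divides by `chars_per_token`. The core idea can be sketched as a self-contained helper. This is a hypothetical `approx_message_tokens`, not the library function, and rounding per message is a simplification of the excerpt's running total:

```python
import math

def approx_message_tokens(
    messages,
    *,
    chars_per_token=4.0,
    extra_tokens_per_message=3.0,
    tokens_per_image=85,
):
    """Sketch: text contributes ceil(chars / chars_per_token), each image
    block a fixed tokens_per_image, and every message a fixed overhead."""
    total = 0.0
    for role, content in messages:
        chars = len(role)
        images = 0
        if isinstance(content, str):
            chars += len(content)
        else:
            # Multimodal content: a list of strings and/or content-block dicts.
            for block in content:
                if isinstance(block, str):
                    chars += len(block)
                elif block.get("type") in ("image", "image_url"):
                    images += 1
        total += math.ceil(chars / chars_per_token)
        total += images * tokens_per_image
        total += extra_tokens_per_message
    return math.ceil(total)

msgs = [
    ("user", "What is the capital of France?"),
    ("assistant", "Paris."),
    ("user", [{"type": "image_url", "image_url": {"url": "..."}}]),
]
print(approx_message_tokens(msgs))
```

Note how an image block costs a flat `tokens_per_image` rather than the length of its (potentially huge) base64 payload, which is the behavior the docstring describes.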

Frequently Asked Questions

What does count_tokens_approximately() do?
count_tokens_approximately() approximates the total number of tokens in a sequence of messages (and, optionally, tool schemas) using a characters-per-token heuristic. It is defined in libs/core/langchain_core/messages/utils.py.
Where is count_tokens_approximately() defined?
count_tokens_approximately() is defined in libs/core/langchain_core/messages/utils.py at line 2186.
What does count_tokens_approximately() call?
count_tokens_approximately() calls two functions: _get_message_openai_role() and convert_to_messages().
What calls count_tokens_approximately()?
count_tokens_approximately() is called by one function: _approximate_token_counter().
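The `tools` parameter is counted the same way: each schema is JSON-serialized and its character length is converted to tokens. A minimal sketch of that step, assuming plain dict schemas (the `BaseTool`-to-OpenAI-format conversion shown in the source excerpt is omitted here):

```python
import json
import math

def approx_tool_tokens(tools, *, chars_per_token=4.0):
    # Sum the JSON-serialized length of every tool schema, then convert
    # characters to tokens with a single ceil, mirroring the source excerpt.
    tools_chars = sum(len(json.dumps(t)) for t in tools)
    return math.ceil(tools_chars / chars_per_token)

schema = {
    "type": "function",
    "function": {"name": "get_weather", "parameters": {"type": "object"}},
}
print(approx_tool_tokens([schema]))
```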
