count_tokens_approximately() — langchain Function Reference
Architecture documentation for the count_tokens_approximately() function in utils.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
    count_tokens_approximately["count_tokens_approximately()"]
    utils_py["utils.py"]
    approximate_token_counter["_approximate_token_counter()"]
    convert_to_messages_fn["convert_to_messages()"]
    get_message_openai_role["_get_message_openai_role()"]

    count_tokens_approximately -->|defined in| utils_py
    approximate_token_counter -->|calls| count_tokens_approximately
    count_tokens_approximately -->|calls| convert_to_messages_fn
    count_tokens_approximately -->|calls| get_message_openai_role

    style count_tokens_approximately fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/core/langchain_core/messages/utils.py lines 2186–2342
def count_tokens_approximately(
    messages: Iterable[MessageLikeRepresentation],
    *,
    chars_per_token: float = 4.0,
    extra_tokens_per_message: float = 3.0,
    count_name: bool = True,
    tokens_per_image: int = 85,
    use_usage_metadata_scaling: bool = False,
    tools: list[BaseTool | dict[str, Any]] | None = None,
) -> int:
    """Approximate the total number of tokens in messages.

    The token count includes stringified message content, role, and (optionally) name.

    - For AI messages, the token count also includes stringified tool calls.
    - For tool messages, the token count also includes the tool call ID.
    - For multimodal messages with images, applies a fixed token penalty per image
      instead of counting base64-encoded characters.
    - If tools are provided, the token count also includes stringified tool schemas.

    Args:
        messages: List of messages to count tokens for.
        chars_per_token: Number of characters per token to use for the approximation.
            One token corresponds to ~4 chars for common English text.
            You can also specify `float` values for more fine-grained control.
            [See more here](https://platform.openai.com/tokenizer).
        extra_tokens_per_message: Number of extra tokens to add per message, e.g.
            special tokens, including beginning/end of message.
            You can also specify `float` values for more fine-grained control.
            [See more here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb).
        count_name: Whether to include message names in the count.
        tokens_per_image: Fixed token cost per image (default: 85, aligned with
            OpenAI's low-resolution image token cost).
        use_usage_metadata_scaling: If True, and all AI messages have consistent
            `response_metadata['model_provider']`, scale the approximate token count
            using the **most recent** AI message that has
            `usage_metadata['total_tokens']`. The scaling factor is:
            `AI_total_tokens / approx_tokens_up_to_that_AI_message`
        tools: List of tools to include in the token count. Each tool can be either
            a `BaseTool` instance or a dict representing a tool schema. `BaseTool`
            instances are converted to OpenAI tool format before counting.

    Returns:
        Approximate number of tokens in the messages (and tools, if provided).

    Note:
        This is a simple approximation that may not match the exact token count used by
        specific models. For accurate counts, use model-specific tokenizers.

        For multimodal messages containing images, a fixed token penalty is applied
        per image instead of counting base64-encoded characters, which provides a
        more realistic approximation.

    !!! version-added "Added in `langchain-core` 0.3.46"
    """
    converted_messages = convert_to_messages(messages)
    token_count = 0.0
    ai_model_provider: str | None = None
    invalid_model_provider = False
    last_ai_total_tokens: int | None = None
    approx_at_last_ai: float | None = None

    # Count tokens for tools if provided
    if tools:
        tools_chars = 0
        for tool in tools:
            tool_dict = tool if isinstance(tool, dict) else convert_to_openai_tool(tool)
            tools_chars += len(json.dumps(tool_dict))
        token_count += math.ceil(tools_chars / chars_per_token)

    for message in converted_messages:
        message_chars = 0
        if isinstance(message.content, str):
            message_chars += len(message.content)
        # Handle multimodal content (list of content blocks)
        elif isinstance(message.content, list):
            for block in message.content:
                if isinstance(block, str):
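The excerpt ends here; the full implementation continues through line 2342 of utils.py. Below is a minimal usage sketch, assuming langchain-core >= 0.3.46. The conversation contents and the chars_per_token value are illustrative only and do not come from the source above.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.messages.utils import count_tokens_approximately

# Hypothetical conversation used only to illustrate the call.
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Summarize the plot of Hamlet in two sentences."),
    AIMessage(content="Hamlet feigns madness while plotting revenge; nearly everyone dies."),
]

# Default heuristic: ~4 characters per token plus 3 extra tokens per message.
print(count_tokens_approximately(messages))

# A tighter characters-per-token ratio yields a more conservative (larger) estimate.
print(count_tokens_approximately(messages, chars_per_token=3.5))

Lowering chars_per_token inflates the estimate, which can serve as a safety margin when budgeting a context window against an unknown tokenizer.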
Frequently Asked Questions
What does count_tokens_approximately() do?
count_tokens_approximately() approximates the total number of tokens in a sequence of messages using a characters-per-token heuristic, plus fixed per-message overhead, a fixed per-image cost, and (optionally) the cost of stringified tool schemas. It is defined in libs/core/langchain_core/messages/utils.py in the langchain codebase.
Where is count_tokens_approximately() defined?
count_tokens_approximately() is defined in libs/core/langchain_core/messages/utils.py at line 2186.
What does count_tokens_approximately() call?
count_tokens_approximately() calls two functions: convert_to_messages() and _get_message_openai_role().
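Because the input is normalized through convert_to_messages(), message-like representations such as OpenAI-style role/content dicts are also accepted. A short sketch under the same assumptions as the example above:

from langchain_core.messages.utils import count_tokens_approximately

# Dicts are converted to BaseMessage objects internally via convert_to_messages()
# before counting.
dict_messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(count_tokens_approximately(dict_messages))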
What calls count_tokens_approximately()?
count_tokens_approximately() is called by one function: _approximate_token_counter().
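The private _approximate_token_counter() wrapper is not reproduced here. As an illustration of the same callable-token-counter pattern (not necessarily how that wrapper uses it), count_tokens_approximately() can be passed to the public trim_messages() helper from the same module; the conversation and max_tokens value below are illustrative only.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.messages.utils import count_tokens_approximately, trim_messages

history = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="First question."),
    AIMessage(content="First answer."),
    HumanMessage(content="Second question."),
]

# trim_messages accepts any callable that maps a list of messages to a token count,
# so the approximate counter can stand in for a model-specific tokenizer.
trimmed = trim_messages(
    history,
    max_tokens=60,
    token_counter=count_tokens_approximately,
    strategy="last",
    include_system=True,
)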