get_num_tokens_from_messages() — langchain Function Reference
Architecture documentation for the get_num_tokens_from_messages() function in base.py from the langchain codebase.
Dependency Diagram
graph TD
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c["get_num_tokens_from_messages()"]
    2a683305_667b_3567_cab9_9f77e29d4afa["BaseChatOpenAI"]
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c -->|defined in| 2a683305_667b_3567_cab9_9f77e29d4afa
    927425a4_5f3b_5196_e538_3d7cee86141b["_get_encoding_model()"]
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c -->|calls| 927425a4_5f3b_5196_e538_3d7cee86141b
    fd643003_13df_3a67_6bdc_07576981e414["_convert_message_to_dict()"]
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c -->|calls| fd643003_13df_3a67_6bdc_07576981e414
    749c696d_69d8_c3f0_ae59_34c44e681f49["_url_to_size()"]
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c -->|calls| 749c696d_69d8_c3f0_ae59_34c44e681f49
    7cf4873b_bc13_276a_ae3d_52fa0c9a42ef["_count_image_tokens()"]
    3d7b9c9e_fae0_940a_5e96_75a84e2ad11c -->|calls| 7cf4873b_bc13_276a_ae3d_52fa0c9a42ef
    style 3d7b9c9e_fae0_940a_5e96_75a84e2ad11c fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/partners/openai/langchain_openai/chat_models/base.py lines 1767–1865
def get_num_tokens_from_messages(
    self,
    messages: Sequence[BaseMessage],
    tools: Sequence[dict[str, Any] | type | Callable | BaseTool] | None = None,
    *,
    allow_fetching_images: bool = True,
) -> int:
    """Calculate num tokens for `gpt-3.5-turbo` and `gpt-4` with `tiktoken` package.

    !!! warning
        You must have `pillow` installed to count image tokens if you are
        specifying the image as a base64 string, and you must have both
        `pillow` and `httpx` installed if you are specifying the image as a
        URL. If these aren't installed, image inputs will be ignored in token
        counting.

    [OpenAI reference](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb).

    Args:
        messages: The message inputs to tokenize.
        tools: If provided, sequence of `dict`, `BaseModel`, function, or `BaseTool`
            to be converted to tool schemas.
        allow_fetching_images: Whether to allow fetching images for token counting.
    """
    # TODO: Count bound tools as part of input.
    if tools is not None:
        warnings.warn(
            "Counting tokens in tool schemas is not yet supported. Ignoring tools."
        )
    if sys.version_info[1] <= 7:
        return super().get_num_tokens_from_messages(messages)
    model, encoding = self._get_encoding_model()
    if model.startswith("gpt-3.5-turbo-0301"):
        # every message follows <im_start>{role/name}\n{content}<im_end>\n
        tokens_per_message = 4
        # if there's a name, the role is omitted
        tokens_per_name = -1
    elif model.startswith(("gpt-3.5-turbo", "gpt-4", "gpt-5")):
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        msg = (
            f"get_num_tokens_from_messages() is not presently implemented "
            f"for model {model}. See "
            "https://platform.openai.com/docs/guides/text-generation/managing-tokens"
            " for information on how messages are converted to tokens."
        )
        raise NotImplementedError(msg)
    num_tokens = 0
    messages_dict = [_convert_message_to_dict(m) for m in messages]
    for message in messages_dict:
        num_tokens += tokens_per_message
        for key, value in message.items():
            # This is an inferred approximation. OpenAI does not document how to
            # count tool message tokens.
            if key == "tool_call_id":
                num_tokens += 3
                continue
            if isinstance(value, list):
                # content or tool calls
                for val in value:
                    if isinstance(val, str) or val["type"] == "text":
                        text = val["text"] if isinstance(val, dict) else val
                        num_tokens += len(encoding.encode(text))
                    elif val["type"] == "image_url":
                        if val["image_url"].get("detail") == "low":
                            num_tokens += 85
                        elif allow_fetching_images:
                            image_size = _url_to_size(val["image_url"]["url"])
                            if not image_size:
                                continue
                            num_tokens += _count_image_tokens(*image_size)
                        else:
                            pass
                    # Tool/function call token counting is not documented by OpenAI.
                    # This is an approximation.
                    elif val["type"] == "function":
                        num_tokens += len(
                            encoding.encode(val["function"]["arguments"])
                        )
                        num_tokens += len(encoding.encode(val["function"]["name"]))
                    elif val["type"] == "file":
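The `image_url` branch above charges a flat 85 tokens for low-detail images and otherwise delegates to `_count_image_tokens()`, whose source is not shown here. The sketch below reproduces OpenAI's published high-detail pricing rule (scale to fit 2048×2048, scale the shortest side to 768, then 170 tokens per 512-px tile plus an 85-token base); the actual helper's implementation may differ in detail.

```python
import math


def count_image_tokens_sketch(width: int, height: int) -> int:
    """Approximate high-detail image token cost per OpenAI's documented rule."""
    # Step 1: scale the image to fit within a 2048 x 2048 square.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 2: scale so the shortest side is at most 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 3: 170 tokens per 512 x 512 tile, plus a fixed 85-token base.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85
```

For example, a 1024×1024 high-detail image is scaled to 768×768 (four tiles), matching OpenAI's documented 765-token example.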
Frequently Asked Questions
What does get_num_tokens_from_messages() do?
get_num_tokens_from_messages() estimates how many tokens a sequence of chat messages will consume, using the tiktoken package and accounting for text, image, and tool-call content. It is defined in libs/partners/openai/langchain_openai/chat_models/base.py in the langchain codebase.
Where is get_num_tokens_from_messages() defined?
get_num_tokens_from_messages() is defined in libs/partners/openai/langchain_openai/chat_models/base.py at line 1767.
What does get_num_tokens_from_messages() call?
get_num_tokens_from_messages() calls 4 function(s): _convert_message_to_dict, _count_image_tokens, _get_encoding_model, _url_to_size.
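The core arithmetic of the function (the per-message overhead of 3 tokens, the 1-token name bonus for gpt-4-family models, and the fixed reply-priming constant from the OpenAI cookbook) can be illustrated with a standalone sketch. The whitespace tokenizer here is a stand-in assumption for the real tiktoken encoding, so absolute counts are illustrative only.

```python
def approx_num_tokens(messages: list[dict], encode=str.split) -> int:
    """Sketch of message-token counting for gpt-4-family models.

    `encode` is a stand-in ASSUMPTION for tiktoken's Encoding.encode;
    it defaults to whitespace splitting for illustration.
    """
    tokens_per_message = 3  # overhead per message wrapper
    tokens_per_name = 1     # extra token when a "name" field is present
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encode(str(value)))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
```

Passing a real tiktoken encoding (e.g. `tiktoken.encoding_for_model("gpt-4").encode`) in place of the default would bring this in line with the actual method's text-only behavior.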