
base.py — langchain Source File

Architecture documentation for base.py, a Python file in the langchain codebase. 15 imports, 0 dependents.

File · Python · LangChainCore · LanguageModelBase · 15 imports · 1 function · 1 class


Dependency Diagram

graph LR
  e013244a_7e0e_baa7_ce3b_16dab4320e45["base.py"]
  e27da29f_a1f7_49f3_84d5_6be4cb4125c8["logging"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> e27da29f_a1f7_49f3_84d5_6be4cb4125c8
  f3365e3c_fb7a_bb9a_bc79_059b06cb7024["warnings"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> f3365e3c_fb7a_bb9a_bc79_059b06cb7024
  2bf6d401_816d_d011_3b05_a6114f55ff58["collections.abc"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> 2bf6d401_816d_d011_3b05_a6114f55ff58
  feec1ec4_6917_867b_d228_b134d0ff8099["typing"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> feec1ec4_6917_867b_d228_b134d0ff8099
  082af17d_b8ac_eccd_d339_93cabe1a9b40["openai"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> 082af17d_b8ac_eccd_d339_93cabe1a9b40
  48f5485f_680a_97b7_bfc7_aff0508d4ca0["tiktoken"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> 48f5485f_680a_97b7_bfc7_aff0508d4ca0
  918b8514_ba55_6df2_7254_4598ec160e33["langchain_core.embeddings"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> 918b8514_ba55_6df2_7254_4598ec160e33
  a8ec7563_2814_99b3_c6da_61c599efc542["langchain_core.runnables.config"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> a8ec7563_2814_99b3_c6da_61c599efc542
  bd035cf2_5933_bc0f_65e9_0dfe57627ca3["langchain_core.utils"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> bd035cf2_5933_bc0f_65e9_0dfe57627ca3
  dd5e7909_a646_84f1_497b_cae69735550e["pydantic"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> dd5e7909_a646_84f1_497b_cae69735550e
  f85fae70_1011_eaec_151c_4083140ae9e5["typing_extensions"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> f85fae70_1011_eaec_151c_4083140ae9e5
  29eba672_199f_3c52_cdbc_39f5c194182e["langchain_openai.chat_models._client_utils"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> 29eba672_199f_3c52_cdbc_39f5c194182e
  d2b62b81_6a74_9153_fcd4_ff7470a9b3d2["httpx"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> d2b62b81_6a74_9153_fcd4_ff7470a9b3d2
  c29ae04f_6e26_fc73_5938_d57db6543f18["transformers"]
  e013244a_7e0e_baa7_ce3b_16dab4320e45 --> c29ae04f_6e26_fc73_5938_d57db6543f18
  style e013244a_7e0e_baa7_ce3b_16dab4320e45 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""Base classes for OpenAI embeddings."""

from __future__ import annotations

import logging
import warnings
from collections.abc import Awaitable, Callable, Iterable, Mapping, Sequence
from typing import Any, Literal, cast

import openai
import tiktoken
from langchain_core.embeddings import Embeddings
from langchain_core.runnables.config import run_in_executor
from langchain_core.utils import from_env, get_pydantic_field_names, secret_from_env
from pydantic import BaseModel, ConfigDict, Field, SecretStr, model_validator
from typing_extensions import Self

from langchain_openai.chat_models._client_utils import _resolve_sync_and_async_api_keys

logger = logging.getLogger(__name__)

MAX_TOKENS_PER_REQUEST = 300000
"""API limit per request for embedding tokens."""


def _process_batched_chunked_embeddings(
    num_texts: int,
    tokens: list[list[int] | str],
    batched_embeddings: list[list[float]],
    indices: list[int],
    skip_empty: bool,
) -> list[list[float] | None]:
    # for each text, this is the list of embeddings (list of list of floats)
    # corresponding to the chunks of the text
    results: list[list[list[float]]] = [[] for _ in range(num_texts)]

    # for each text, this is the token length of each chunk
    # for transformers tokenization, this is the string length
    # for tiktoken, this is the number of tokens
    num_tokens_in_batch: list[list[int]] = [[] for _ in range(num_texts)]

    for i in range(len(indices)):
        if skip_empty and len(batched_embeddings[i]) == 1:
            continue
        results[indices[i]].append(batched_embeddings[i])
        num_tokens_in_batch[indices[i]].append(len(tokens[i]))

    # for each text, this is the final embedding
    embeddings: list[list[float] | None] = []
    for i in range(num_texts):
        # an embedding for each chunk
        _result: list[list[float]] = results[i]

        if len(_result) == 0:
            # this will be populated with the embedding of an empty string
            # in the sync or async code calling this
            embeddings.append(None)
            continue

        if len(_result) == 1:
# ... (713 more lines)

Domain

LangChainCore

Subdomains

LanguageModelBase

Dependencies

  • collections.abc
  • httpx
  • langchain_core.embeddings
  • langchain_core.runnables.config
  • langchain_core.utils
  • langchain_openai.chat_models._client_utils
  • logging
  • openai
  • pydantic
  • tiktoken
  • tqdm.auto
  • transformers
  • typing
  • typing_extensions
  • warnings
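The source defines `MAX_TOKENS_PER_REQUEST = 300000` as the per-request cap on embedding tokens. A hedged sketch of packing chunks into request-sized batches under such a cap follows; `count_tokens` is a whitespace-splitting stand-in (the real file counts tokens with tiktoken or a transformers tokenizer), and `batch_by_token_limit` is a hypothetical helper name.

```python
# Illustration of batching texts under a per-request token cap.
# count_tokens is a stand-in; real code uses tiktoken token counts.

MAX_TOKENS_PER_REQUEST = 300_000


def count_tokens(text: str) -> int:
    # stand-in tokenizer: whitespace-separated words
    return len(text.split())


def batch_by_token_limit(
    texts: list[str], limit: int = MAX_TOKENS_PER_REQUEST
) -> list[list[str]]:
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for text in texts:
        n = count_tokens(text)
        # flush the current batch if adding this text would exceed the cap
        if current and current_tokens + n > limit:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


# With a tiny limit for demonstration:
batches = batch_by_token_limit(["a b", "c d e", "f"], limit=4)
```

With `limit=4`, the three texts (2, 3, and 1 tokens) split into two batches, since the first two texts together exceed the cap.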

Frequently Asked Questions

What does base.py do?
base.py is a source file in the langchain codebase, written in Python. It belongs to the LangChainCore domain, LanguageModelBase subdomain.
What functions are defined in base.py?
base.py defines one function: _process_batched_chunked_embeddings.
What does base.py depend on?
base.py imports 15 modules: collections.abc, httpx, langchain_core.embeddings, langchain_core.runnables.config, langchain_core.utils, langchain_openai.chat_models._client_utils, logging, openai, pydantic, tiktoken, tqdm.auto, transformers, typing, typing_extensions, and warnings.
Where is base.py in the architecture?
base.py is located at libs/partners/openai/langchain_openai/embeddings/base.py (domain: LangChainCore, subdomain: LanguageModelBase, directory: libs/partners/openai/langchain_openai/embeddings).
