_aget_len_safe_embeddings() — langchain Function Reference
Architecture documentation for the _aget_len_safe_embeddings() function in base.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
    b1a193e7_39a7_c737_2248_ba3dd74ba93c["_aget_len_safe_embeddings()"]
    2f237d29_e276_c4ef_3a56_7139ce49b50e["OpenAIEmbeddings"]
    b1a193e7_39a7_c737_2248_ba3dd74ba93c -->|defined in| 2f237d29_e276_c4ef_3a56_7139ce49b50e
    61442dce_e074_a559_4f56_e5a72f5d3c6c["aembed_documents()"]
    61442dce_e074_a559_4f56_e5a72f5d3c6c -->|calls| b1a193e7_39a7_c737_2248_ba3dd74ba93c
    334ac3be_75cb_5ada_dfd5_067dbcd323f0["_process_batched_chunked_embeddings()"]
    b1a193e7_39a7_c737_2248_ba3dd74ba93c -->|calls| 334ac3be_75cb_5ada_dfd5_067dbcd323f0
    style b1a193e7_39a7_c737_2248_ba3dd74ba93c fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/partners/openai/langchain_openai/embeddings/base.py lines 601–675
async def _aget_len_safe_embeddings(
    self,
    texts: list[str],
    *,
    engine: str,
    chunk_size: int | None = None,
    **kwargs: Any,
) -> list[list[float]]:
    """Asynchronously generate length-safe embeddings for a list of texts.

    This method handles tokenization and embedding generation, respecting the
    `embedding_ctx_length` and `chunk_size`. Supports both `tiktoken` and
    HuggingFace `transformers` based on the `tiktoken_enabled` flag.

    Args:
        texts: The list of texts to embed.
        engine: The engine or model to use for embeddings.
        chunk_size: The size of chunks for processing embeddings.

    Returns:
        A list of embeddings for each input text.
    """
    _chunk_size = chunk_size or self.chunk_size
    client_kwargs = {**self._invocation_params, **kwargs}
    _iter, tokens, indices, token_counts = await run_in_executor(
        None, self._tokenize, texts, _chunk_size
    )
    batched_embeddings: list[list[float]] = []

    # Process in batches respecting the token limit
    i = 0
    while i < len(tokens):
        # Determine how many chunks we can include in this batch
        batch_token_count = 0
        batch_end = i
        for j in range(i, min(i + _chunk_size, len(tokens))):
            chunk_tokens = token_counts[j]
            # Check if adding this chunk would exceed the limit
            if batch_token_count + chunk_tokens > MAX_TOKENS_PER_REQUEST:
                if batch_end == i:
                    # Single chunk exceeds limit - handle it anyway
                    batch_end = j + 1
                break
            batch_token_count += chunk_tokens
            batch_end = j + 1

        # Make API call with this batch
        batch_tokens = tokens[i:batch_end]
        response = await self.async_client.create(
            input=batch_tokens, **client_kwargs
        )
        if not isinstance(response, dict):
            response = response.model_dump()
        batched_embeddings.extend(r["embedding"] for r in response["data"])
        i = batch_end

    embeddings = _process_batched_chunked_embeddings(
        len(texts), tokens, batched_embeddings, indices, self.skip_empty
    )
    _cached_empty_embedding: list[float] | None = None

    async def empty_embedding() -> list[float]:
        nonlocal _cached_empty_embedding
        if _cached_empty_embedding is None:
            average_embedded = await self.async_client.create(
                input="", **client_kwargs
            )
            if not isinstance(average_embedded, dict):
                average_embedded = average_embedded.model_dump()
            _cached_empty_embedding = average_embedded["data"][0]["embedding"]
        return _cached_empty_embedding

    return [e if e is not None else await empty_embedding() for e in embeddings]
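The core of the listing is the greedy batching loop: successive token chunks are packed into a batch until adding the next chunk would exceed MAX_TOKENS_PER_REQUEST, with the special case that a single oversized chunk is still sent on its own. The planning step can be isolated as a pure function; the names below (plan_batches, max_tokens_per_request) are illustrative, not part of langchain:

```python
def plan_batches(
    token_counts: list[int],
    chunk_size: int,
    max_tokens_per_request: int,
) -> list[tuple[int, int]]:
    """Return (start, end) chunk-index pairs, mirroring the while-loop above."""
    batches: list[tuple[int, int]] = []
    i = 0
    while i < len(token_counts):
        batch_token_count = 0
        batch_end = i
        # A batch holds at most chunk_size chunks...
        for j in range(i, min(i + chunk_size, len(token_counts))):
            # ...and at most max_tokens_per_request tokens.
            if batch_token_count + token_counts[j] > max_tokens_per_request:
                if batch_end == i:
                    # A single oversized chunk still gets its own batch.
                    batch_end = j + 1
                break
            batch_token_count += token_counts[j]
            batch_end = j + 1
        batches.append((i, batch_end))
        i = batch_end
    return batches
```

For example, with a limit of 250 tokens, three 100-token chunks and chunk_size=2 yield two batches: the first bounded by chunk_size, the second holding the remainder.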
Frequently Asked Questions
What does _aget_len_safe_embeddings() do?
_aget_len_safe_embeddings() is a private async helper on OpenAIEmbeddings, defined in libs/partners/openai/langchain_openai/embeddings/base.py. It generates embeddings for a list of texts without exceeding the model's context length: each text is tokenized and split into chunks, chunks are greedily batched under the per-request token limit, each batch is sent to the embeddings endpoint, and the per-chunk results are recombined into one embedding per input text.
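One detail worth noting from the listing: empty inputs are resolved lazily via empty_embedding(), which caches its API result in a nonlocal variable so the endpoint is hit at most once no matter how many empty texts appear. The pattern in isolation, with a stand-in fetch instead of the real API call (names here are illustrative):

```python
def make_cached_fetcher():
    """Memoize an expensive async call via a nonlocal cache, as
    empty_embedding() does in _aget_len_safe_embeddings()."""
    cache: list[float] | None = None
    calls = 0

    async def fetch() -> list[float]:
        nonlocal cache, calls
        if cache is None:
            calls += 1              # stands in for one API round-trip
            cache = [0.0, 0.0]      # placeholder "empty" embedding
        return cache

    def call_count() -> int:
        return calls

    return fetch, call_count
```

Because the list comprehension in the real method awaits sequentially, the cached value is computed exactly once even when several embeddings are None.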
Where is _aget_len_safe_embeddings() defined?
_aget_len_safe_embeddings() is defined in libs/partners/openai/langchain_openai/embeddings/base.py at line 601.
What does _aget_len_safe_embeddings() call?
_aget_len_safe_embeddings() directly calls one tracked function, _process_batched_chunked_embeddings(). It also invokes self._tokenize (via run_in_executor) and self.async_client.create.
What calls _aget_len_safe_embeddings()?
_aget_len_safe_embeddings() is called by one function: aembed_documents().