_get_len_safe_embeddings() — langchain Function Reference
Architecture documentation for the _get_len_safe_embeddings() function in base.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD
    bd7de307_7f7c_35fc_e574_e5dfd1b9a161["_get_len_safe_embeddings()"]
    2f237d29_e276_c4ef_3a56_7139ce49b50e["OpenAIEmbeddings"]
    bd7de307_7f7c_35fc_e574_e5dfd1b9a161 -->|defined in| 2f237d29_e276_c4ef_3a56_7139ce49b50e
    64f4fa06_4784_3e51_9655_1c4667c3f612["embed_documents()"]
    64f4fa06_4784_3e51_9655_1c4667c3f612 -->|calls| bd7de307_7f7c_35fc_e574_e5dfd1b9a161
    63131dd8_7b39_d355_1b39_29f63d60d98e["_tokenize()"]
    bd7de307_7f7c_35fc_e574_e5dfd1b9a161 -->|calls| 63131dd8_7b39_d355_1b39_29f63d60d98e
    334ac3be_75cb_5ada_dfd5_067dbcd323f0["_process_batched_chunked_embeddings()"]
    bd7de307_7f7c_35fc_e574_e5dfd1b9a161 -->|calls| 334ac3be_75cb_5ada_dfd5_067dbcd323f0
    style bd7de307_7f7c_35fc_e574_e5dfd1b9a161 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/partners/openai/langchain_openai/embeddings/base.py lines 529–597
def _get_len_safe_embeddings(
    self,
    texts: list[str],
    *,
    engine: str,
    chunk_size: int | None = None,
    **kwargs: Any,
) -> list[list[float]]:
    """Generate length-safe embeddings for a list of texts.

    This method handles tokenization and embedding generation, respecting the
    `embedding_ctx_length` and `chunk_size`. Supports both `tiktoken` and
    HuggingFace `transformers` based on the `tiktoken_enabled` flag.

    Args:
        texts: The list of texts to embed.
        engine: The engine or model to use for embeddings.
        chunk_size: The size of chunks for processing embeddings.

    Returns:
        A list of embeddings for each input text.
    """
    _chunk_size = chunk_size or self.chunk_size
    client_kwargs = {**self._invocation_params, **kwargs}
    _iter, tokens, indices, token_counts = self._tokenize(texts, _chunk_size)
    batched_embeddings: list[list[float]] = []

    # Process in batches respecting the token limit
    i = 0
    while i < len(tokens):
        # Determine how many chunks we can include in this batch
        batch_token_count = 0
        batch_end = i
        for j in range(i, min(i + _chunk_size, len(tokens))):
            chunk_tokens = token_counts[j]
            # Check if adding this chunk would exceed the limit
            if batch_token_count + chunk_tokens > MAX_TOKENS_PER_REQUEST:
                if batch_end == i:
                    # Single chunk exceeds limit - handle it anyway
                    batch_end = j + 1
                break
            batch_token_count += chunk_tokens
            batch_end = j + 1
        # Make API call with this batch
        batch_tokens = tokens[i:batch_end]
        response = self.client.create(input=batch_tokens, **client_kwargs)
        if not isinstance(response, dict):
            response = response.model_dump()
        batched_embeddings.extend(r["embedding"] for r in response["data"])
        i = batch_end

    embeddings = _process_batched_chunked_embeddings(
        len(texts), tokens, batched_embeddings, indices, self.skip_empty
    )
    _cached_empty_embedding: list[float] | None = None

    def empty_embedding() -> list[float]:
        nonlocal _cached_empty_embedding
        if _cached_empty_embedding is None:
            average_embedded = self.client.create(input="", **client_kwargs)
            if not isinstance(average_embedded, dict):
                average_embedded = average_embedded.model_dump()
            _cached_empty_embedding = average_embedded["data"][0]["embedding"]
        return _cached_empty_embedding

    return [e if e is not None else empty_embedding() for e in embeddings]
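The batching loop above greedily packs tokenized chunks into API requests: a batch holds at most `_chunk_size` chunks and stays under the per-request token budget, except that a single oversized chunk is still sent on its own. The standalone sketch below reproduces just that planning step; the helper name `plan_batches` and the explicit `max_tokens` parameter are illustrative, not part of the langchain API:

```python
def plan_batches(
    token_counts: list[int],
    chunk_size: int,
    max_tokens: int,
) -> list[tuple[int, int]]:
    """Greedily group chunks into [start, end) batch index ranges.

    Mirrors the loop in _get_len_safe_embeddings: each batch holds at
    most `chunk_size` chunks and at most `max_tokens` tokens, except
    that a single chunk over the limit still gets a batch of its own.
    """
    batches: list[tuple[int, int]] = []
    i = 0
    while i < len(token_counts):
        batch_token_count = 0
        batch_end = i
        for j in range(i, min(i + chunk_size, len(token_counts))):
            if batch_token_count + token_counts[j] > max_tokens:
                if batch_end == i:
                    # Single chunk exceeds the limit - send it anyway
                    batch_end = j + 1
                break
            batch_token_count += token_counts[j]
            batch_end = j + 1
        batches.append((i, batch_end))
        i = batch_end
    return batches
```

With `token_counts=[100, 200, 700, 50]` and `max_tokens=800`, the first batch stops before the 700-token chunk (which would push the total to 1000), yielding batches `(0, 2)` and `(2, 4)`.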
Frequently Asked Questions
What does _get_len_safe_embeddings() do?
_get_len_safe_embeddings() tokenizes the input texts into chunks that fit the model's context length (`embedding_ctx_length`), greedily batches those chunks under the per-request token limit, embeds each batch through the OpenAI API, and recombines the chunk embeddings into one embedding per input text, substituting an empty-string embedding for texts that produced no chunks. It is defined in libs/partners/openai/langchain_openai/embeddings/base.py.
Where is _get_len_safe_embeddings() defined?
_get_len_safe_embeddings() is defined in libs/partners/openai/langchain_openai/embeddings/base.py at line 529.
What does _get_len_safe_embeddings() call?
_get_len_safe_embeddings() calls two functions: _tokenize (to split the input texts into token chunks) and _process_batched_chunked_embeddings (to reassemble per-chunk embeddings into one embedding per text).
What calls _get_len_safe_embeddings()?
_get_len_safe_embeddings() is called by one function: embed_documents.