CacheBackedEmbeddings Class — langchain Architecture
Architecture documentation for the CacheBackedEmbeddings class in cache.py from the langchain codebase.
Entity Profile
Dependency Diagram
```mermaid
graph TD
    CacheBackedEmbeddings["CacheBackedEmbeddings"]
    Embeddings["Embeddings"]
    CacheBackedEmbeddings -->|extends| Embeddings
    cache_py["cache.py"]
    CacheBackedEmbeddings -->|defined in| cache_py
    init["__init__()"]
    CacheBackedEmbeddings -->|method| init
    embed_documents["embed_documents()"]
    CacheBackedEmbeddings -->|method| embed_documents
    aembed_documents["aembed_documents()"]
    CacheBackedEmbeddings -->|method| aembed_documents
    embed_query["embed_query()"]
    CacheBackedEmbeddings -->|method| embed_query
    aembed_query["aembed_query()"]
    CacheBackedEmbeddings -->|method| aembed_query
    from_bytes_store["from_bytes_store()"]
    CacheBackedEmbeddings -->|method| from_bytes_store
```
Relationship Graph
Source Code
libs/langchain/langchain_classic/embeddings/cache.py lines 108–370
class CacheBackedEmbeddings(Embeddings):
    """Interface for caching results from embedding models.

    The interface works with any store that implements the abstract store
    interface, accepting keys of type str and values of list of floats.

    If need be, the interface can be extended to accept other implementations
    of the value serializer and deserializer, as well as the key encoder.

    Note that by default only document embeddings are cached. To cache query
    embeddings too, pass a query_embedding_store to the constructor.

    Examples:
        ```python
        from langchain_classic.embeddings import CacheBackedEmbeddings
        from langchain_classic.storage import LocalFileStore
        from langchain_openai import OpenAIEmbeddings

        store = LocalFileStore("./my_cache")
        underlying_embedder = OpenAIEmbeddings()
        embedder = CacheBackedEmbeddings.from_bytes_store(
            underlying_embedder, store, namespace=underlying_embedder.model
        )

        # Embedding is computed and cached
        embeddings = embedder.embed_documents(["hello", "goodbye"])

        # Embeddings are retrieved from the cache, no computation is done
        embeddings = embedder.embed_documents(["hello", "goodbye"])
        ```
    """

    def __init__(
        self,
        underlying_embeddings: Embeddings,
        document_embedding_store: BaseStore[str, list[float]],
        *,
        batch_size: int | None = None,
        query_embedding_store: BaseStore[str, list[float]] | None = None,
    ) -> None:
        """Initialize the embedder.

        Args:
            underlying_embeddings: The embedder to use for computing embeddings.
            document_embedding_store: The store to use for caching document
                embeddings.
            batch_size: The number of documents to embed between store updates.
            query_embedding_store: The store to use for caching query embeddings.
                If `None`, query embeddings are not cached.
        """
        super().__init__()
        self.document_embedding_store = document_embedding_store
        self.query_embedding_store = query_embedding_store
        self.underlying_embeddings = underlying_embeddings
        self.batch_size = batch_size

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        """Embed a list of texts.

        The method first checks the cache for the embeddings.
        If the embeddings are not found, the method uses the underlying embedder
        to embed the documents and stores the results in the cache.

        Args:
            texts: A list of texts to embed.

        Returns:
            A list of embeddings for the given texts.
        """
        vectors: list[list[float] | None] = self.document_embedding_store.mget(
            texts,
        )
        all_missing_indices: list[int] = [
            i for i, vector in enumerate(vectors) if vector is None
        ]
        for missing_indices in batch_iterate(self.batch_size, all_missing_indices):
            missing_texts = [texts[i] for i in missing_indices]
            missing_vectors = self.underlying_embeddings.embed_documents(missing_texts)
            self.document_embedding_store.mset(
                list(zip(missing_texts, missing_vectors))
            )
            for index, updated_vector in zip(missing_indices, missing_vectors):
                vectors[index] = updated_vector

        return cast(list[list[float]], vectors)
Frequently Asked Questions
What is the CacheBackedEmbeddings class?
CacheBackedEmbeddings is a class in the langchain codebase that wraps an embedding model with a key-value cache, defined in libs/langchain/langchain_classic/embeddings/cache.py.
Where is CacheBackedEmbeddings defined?
CacheBackedEmbeddings is defined in libs/langchain/langchain_classic/embeddings/cache.py at line 108.
What does CacheBackedEmbeddings extend?
CacheBackedEmbeddings extends Embeddings.
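The `from_bytes_store` constructor accepts a `namespace` (the example above passes the model name) so that different models never share cache entries. A plausible key-encoding and serialization scheme for a byte store looks like the following; this is a hedged sketch, and the actual encoder and serializer in cache.py may use a different hash or wire format:

```python
import hashlib
import json


def make_key(namespace: str, text: str) -> str:
    """Namespace-prefixed content hash.

    Illustrative only: prefixing the namespace keeps embeddings from
    different models in disjoint key spaces.
    """
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


def serialize(vector: list[float]) -> bytes:
    """Encode a vector as bytes for storage in a byte store."""
    return json.dumps(vector).encode("utf-8")


def deserialize(data: bytes) -> list[float]:
    """Decode a stored byte value back into a vector."""
    return json.loads(data.decode("utf-8"))


key_a = make_key("text-embedding-3-small", "hello")
key_b = make_key("text-embedding-3-large", "hello")
roundtrip = deserialize(serialize([0.1, 0.2]))
```

Because the key is derived from the text content, re-embedding an unchanged document is always a cache hit, while any change to the text or the namespace produces a fresh key.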