cache.py — langchain Source File
Architecture documentation for cache.py, a Python file in the langchain codebase. 10 imports, 0 dependents.
Dependency Diagram
graph LR
    3202fcbc_ed12_ea87_2046_22982e5a006c["cache.py"]
    ca3eea8c_ddf5_4ba7_a40c_5ed2287c91fa["hashlib"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> ca3eea8c_ddf5_4ba7_a40c_5ed2287c91fa
    7025b240_fdc3_cf68_b72f_f41dac94566b["json"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> 7025b240_fdc3_cf68_b72f_f41dac94566b
    8dfa0cac_d802_3ccd_f710_43a5e70da3a5["uuid"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> 8dfa0cac_d802_3ccd_f710_43a5e70da3a5
    0c635125_6987_b8b3_7ff7_d60249aecde7["warnings"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> 0c635125_6987_b8b3_7ff7_d60249aecde7
    cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
    8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
    bc46b61d_cfdf_3f6b_a9dd_ac2a328d84b3["langchain_core.embeddings"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> bc46b61d_cfdf_3f6b_a9dd_ac2a328d84b3
    cf23aed0_f3dd_3cba_61aa_c00a3e5a1b92["langchain_core.stores"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> cf23aed0_f3dd_3cba_61aa_c00a3e5a1b92
    fb97a5dd_8baa_cbb1_1219_066aff1f076c["langchain_core.utils.iter"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> fb97a5dd_8baa_cbb1_1219_066aff1f076c
    11a9f766_2db8_7f17_8105_8f15ab8ad708["langchain_classic.storage.encoder_backed"]
    3202fcbc_ed12_ea87_2046_22982e5a006c --> 11a9f766_2db8_7f17_8105_8f15ab8ad708
    style 3202fcbc_ed12_ea87_2046_22982e5a006c fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Module contains code for a cache backed embedder.

The cache backed embedder is a wrapper around an embedder that caches
embeddings in a key-value store. The cache is used to avoid recomputing
embeddings for the same text.

The text is hashed and the hash is used as the key in the cache.
"""

from __future__ import annotations

import hashlib
import json
import uuid
import warnings
from collections.abc import Callable, Sequence
from typing import Literal, cast

from langchain_core.embeddings import Embeddings
from langchain_core.stores import BaseStore, ByteStore
from langchain_core.utils.iter import batch_iterate

from langchain_classic.storage.encoder_backed import EncoderBackedStore

NAMESPACE_UUID = uuid.UUID(int=1985)


def _sha1_hash_to_uuid(text: str) -> uuid.UUID:
    """Return a UUID derived from *text* using SHA-1 (deterministic).

    Deterministic and fast, **but not collision-resistant**.

    A malicious attacker could try to create two different texts that hash to the same
    UUID. This may not necessarily be an issue in the context of caching embeddings,
    but new applications should swap this out for a stronger hash function like
    xxHash, BLAKE2 or SHA-256, which are collision-resistant.
    """
    sha1_hex = hashlib.sha1(text.encode("utf-8"), usedforsecurity=False).hexdigest()
    # Embed the hex string in `uuid5` to obtain a valid UUID.
    return uuid.uuid5(NAMESPACE_UUID, sha1_hex)


def _make_default_key_encoder(namespace: str, algorithm: str) -> Callable[[str], str]:
    """Create a default key encoder function.

    Args:
        namespace: Prefix that segregates keys from different embedding models.
        algorithm:
            * `'sha1'` - fast but not collision-resistant
            * `'blake2b'` - cryptographically strong, faster than SHA-1
            * `'sha256'` - cryptographically strong, slower than SHA-1
            * `'sha512'` - cryptographically strong, slower than SHA-1

    Returns:
        A function that encodes a key using the specified algorithm.
    """
    if algorithm == "sha1":
        _warn_about_sha1_encoder()

    def _key_encoder(key: str) -> str:
        // ... (311 more lines)
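The listing stops before the body of `_key_encoder`, so the actual implementation is not shown here. To illustrate the idea the docstring describes (a namespace prefix combined with a hash of the text), here is a standalone sketch that uses only `hashlib`. The name `make_key_encoder`, its default algorithm, and its error handling are assumptions made for illustration and are not taken from the library.

```python
import hashlib
from collections.abc import Callable


def make_key_encoder(namespace: str, algorithm: str = "sha256") -> Callable[[str], str]:
    """Hypothetical helper: build a function that maps text to a namespaced hash key."""
    # Assumption: only accept the stronger algorithms named in the docstring above.
    if algorithm not in {"blake2b", "sha256", "sha512"}:
        raise ValueError(f"Unsupported algorithm: {algorithm}")

    def encode(key: str) -> str:
        # Hash the UTF-8 bytes of the text and prefix the hex digest with the namespace.
        digest = hashlib.new(algorithm, key.encode("utf-8")).hexdigest()
        return f"{namespace}{digest}"

    return encode


# Usage: the same text under different namespaces yields different cache keys.
encode_a = make_key_encoder("model-a:")
encode_b = make_key_encoder("model-b:")
key_a = encode_a("hello world")
key_b = encode_b("hello world")
```

Prefixing the digest with a namespace keeps embeddings from different models apart in a shared store, which matches the purpose the `namespace` argument is documented to serve.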
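Domain
- CoreAbstractions
Subdomains
- Serialization
Functions
- _make_default_key_encoder
- _sha1_hash_to_uuid
- _value_deserializer
- _value_serializer
- _warn_about_sha1_encoder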
Classes
Dependencies
- collections.abc
- hashlib
- json
- langchain_classic.storage.encoder_backed
- langchain_core.embeddings
- langchain_core.stores
- langchain_core.utils.iter
- typing
- uuid
- warnings
Frequently Asked Questions
What does cache.py do?
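cache.py is a Python source file in the langchain codebase. It implements a cache-backed embedder: a wrapper around an embedder that caches embeddings in a key-value store, keyed by a hash of the text, so that embeddings for the same text are not recomputed. It belongs to the CoreAbstractions domain, Serialization subdomain.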
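The module docstring describes the core idea: hash each text, use the hash as the cache key, and skip recomputation for texts already seen. A minimal sketch of that pattern follows, using a plain `dict` as the key-value store. The function `embed_with_cache` and the toy `fake_embed` are hypothetical and do not reflect the class or method names in cache.py.

```python
import hashlib


def embed_with_cache(texts, embed_fn, store):
    """Return one embedding per text, calling embed_fn only for cache misses."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

    # Collect each uncached text once, preserving first-seen order.
    missing = {}
    for text, key in zip(texts, keys):
        if key not in store and key not in missing:
            missing[key] = text

    if missing:
        vectors = embed_fn(list(missing.values()))
        for key, vector in zip(missing, vectors):
            store[key] = vector

    return [store[key] for key in keys]


# Usage with a toy embedder (an assumption) that records how it was called.
calls = []


def fake_embed(batch):
    calls.append(list(batch))
    return [[float(len(text))] for text in batch]


store = {}
first = embed_with_cache(["a", "bb", "a"], fake_embed, store)
second = embed_with_cache(["a"], fake_embed, store)
```

In this sketch the duplicate `"a"` is embedded once, and the second call is served entirely from the store.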
What functions are defined in cache.py?
cache.py defines 5 functions: _make_default_key_encoder, _sha1_hash_to_uuid, _value_deserializer, _value_serializer, _warn_about_sha1_encoder.
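The bodies of `_value_serializer` and `_value_deserializer` are not part of the excerpt. Since the module imports `json`, one plausible shape is a JSON round trip between a sequence of floats and bytes. The functions below are hypothetical stand-ins under that assumption, not the library's code.

```python
import json
from collections.abc import Sequence


def serialize_embedding(value: Sequence[float]) -> bytes:
    """Hypothetical: encode an embedding as UTF-8 JSON bytes for a byte store."""
    return json.dumps(list(value)).encode("utf-8")


def deserialize_embedding(data: bytes) -> list[float]:
    """Hypothetical: decode UTF-8 JSON bytes back into a list of floats."""
    return json.loads(data.decode("utf-8"))


# Usage: values survive a round trip through bytes.
payload = serialize_embedding([0.25, -1.5, 3.0])
restored = deserialize_embedding(payload)
```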
What does cache.py depend on?
cache.py imports 10 modules: collections.abc, hashlib, json, langchain_classic.storage.encoder_backed, langchain_core.embeddings, langchain_core.stores, langchain_core.utils.iter, typing, uuid, and warnings.
Where is cache.py in the architecture?
cache.py is located at libs/langchain/langchain_classic/embeddings/cache.py (domain: CoreAbstractions, subdomain: Serialization, directory: libs/langchain/langchain_classic/embeddings).