cache.py — langchain Source File

Architecture documentation for cache.py, a Python file in the langchain codebase. 10 imports, 0 dependents.

File: Python. Domain: CoreAbstractions. Subdomain: Serialization. 10 imports, 5 functions, 1 class.

Dependency Diagram

graph LR
  3202fcbc_ed12_ea87_2046_22982e5a006c["cache.py"]
  ca3eea8c_ddf5_4ba7_a40c_5ed2287c91fa["hashlib"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> ca3eea8c_ddf5_4ba7_a40c_5ed2287c91fa
  7025b240_fdc3_cf68_b72f_f41dac94566b["json"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> 7025b240_fdc3_cf68_b72f_f41dac94566b
  8dfa0cac_d802_3ccd_f710_43a5e70da3a5["uuid"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> 8dfa0cac_d802_3ccd_f710_43a5e70da3a5
  0c635125_6987_b8b3_7ff7_d60249aecde7["warnings"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> 0c635125_6987_b8b3_7ff7_d60249aecde7
  cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  bc46b61d_cfdf_3f6b_a9dd_ac2a328d84b3["langchain_core.embeddings"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> bc46b61d_cfdf_3f6b_a9dd_ac2a328d84b3
  cf23aed0_f3dd_3cba_61aa_c00a3e5a1b92["langchain_core.stores"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> cf23aed0_f3dd_3cba_61aa_c00a3e5a1b92
  fb97a5dd_8baa_cbb1_1219_066aff1f076c["langchain_core.utils.iter"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> fb97a5dd_8baa_cbb1_1219_066aff1f076c
  11a9f766_2db8_7f17_8105_8f15ab8ad708["langchain_classic.storage.encoder_backed"]
  3202fcbc_ed12_ea87_2046_22982e5a006c --> 11a9f766_2db8_7f17_8105_8f15ab8ad708
  style 3202fcbc_ed12_ea87_2046_22982e5a006c fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""Module contains code for a cache backed embedder.

The cache backed embedder is a wrapper around an embedder that caches
embeddings in a key-value store. The cache is used to avoid recomputing
embeddings for the same text.

The text is hashed and the hash is used as the key in the cache.
"""

from __future__ import annotations

import hashlib
import json
import uuid
import warnings
from collections.abc import Callable, Sequence
from typing import Literal, cast

from langchain_core.embeddings import Embeddings
from langchain_core.stores import BaseStore, ByteStore
from langchain_core.utils.iter import batch_iterate

from langchain_classic.storage.encoder_backed import EncoderBackedStore

NAMESPACE_UUID = uuid.UUID(int=1985)


def _sha1_hash_to_uuid(text: str) -> uuid.UUID:
    """Return a UUID derived from *text* using SHA-1 (deterministic).

    Deterministic and fast, **but not collision-resistant**.

    A malicious attacker could try to create two different texts that hash to the same
    UUID. This may not necessarily be an issue in the context of caching embeddings,
    but new applications should swap this out for a stronger hash function like
    xxHash, BLAKE2 or SHA-256, which are collision-resistant.
    """
    sha1_hex = hashlib.sha1(text.encode("utf-8"), usedforsecurity=False).hexdigest()
    # Embed the hex string in `uuid5` to obtain a valid UUID.
    return uuid.uuid5(NAMESPACE_UUID, sha1_hex)


def _make_default_key_encoder(namespace: str, algorithm: str) -> Callable[[str], str]:
    """Create a default key encoder function.

    Args:
        namespace: Prefix that segregates keys from different embedding models.
        algorithm:
           * `'sha1'` - fast but not collision-resistant
           * `'blake2b'` - cryptographically strong, faster than SHA-1
           * `'sha256'` - cryptographically strong, slower than SHA-1
           * `'sha512'` - cryptographically strong, slower than SHA-1

    Returns:
        A function that encodes a key using the specified algorithm.
    """
    if algorithm == "sha1":
        _warn_about_sha1_encoder()

    def _key_encoder(key: str) -> str:
# ... (311 more lines)
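
The listing stops before the body of `_key_encoder`, so the following is only a sketch of how a namespaced, hash-based key encoder could work, not the file's actual implementation. `sha1_hash_to_uuid` mirrors the helper shown in the listing; `make_key_encoder`, its use of `hashlib.new`, and the `"my-model:"` namespace are assumptions made for illustration.

```python
import hashlib
import uuid
from collections.abc import Callable

NAMESPACE_UUID = uuid.UUID(int=1985)


def sha1_hash_to_uuid(text: str) -> uuid.UUID:
    """Mirror of the listed helper: SHA-1 hex digest wrapped in a uuid5."""
    sha1_hex = hashlib.sha1(text.encode("utf-8"), usedforsecurity=False).hexdigest()
    return uuid.uuid5(NAMESPACE_UUID, sha1_hex)


def make_key_encoder(namespace: str, algorithm: str) -> Callable[[str], str]:
    """Hypothetical encoder: namespace prefix followed by a digest of the text."""

    def encode(key: str) -> str:
        if algorithm == "sha1":
            # Assumption: the SHA-1 path reuses the UUID helper above.
            return f"{namespace}{sha1_hash_to_uuid(key)}"
        # Any other algorithm name is passed to hashlib (e.g. "sha256", "blake2b").
        digest = hashlib.new(algorithm, key.encode("utf-8")).hexdigest()
        return f"{namespace}{digest}"

    return encode


encode = make_key_encoder("my-model:", "sha256")
key_a = encode("hello world")
key_b = encode("hello world")
key_c = encode("goodbye")
```

The same text always maps to the same key, different texts map to different keys, and the namespace prefix keeps keys from different embedding models apart in a shared store.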

Domain

CoreAbstractions

Subdomains

Serialization

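Functions

_make_default_key_encoder
_sha1_hash_to_uuid
_value_deserializer
_value_serializer
_warn_about_sha1_encoder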

Classes

CacheBackedEmbeddings

Dependencies

collections.abc
hashlib
json
langchain_classic.storage.encoder_backed
langchain_core.embeddings
langchain_core.stores
langchain_core.utils.iter
typing
uuid
warnings

Frequently Asked Questions

What does cache.py do?

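cache.py implements a cache-backed embedder: a wrapper around an embedder that stores embeddings in a key-value store so that the same text is not embedded twice. Each text is hashed, and the hash is used as the cache key. The file is written in Python and belongs to the CoreAbstractions domain, Serialization subdomain.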
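
To make the caching idea concrete, here is a minimal, self-contained sketch. It does not use langchain: `ToyCachedEmbedder`, `fake_embed`, and the in-memory `dict` store are hypothetical stand-ins, and SHA-256 is chosen here only for illustration.

```python
import hashlib
from collections.abc import Callable


class ToyCachedEmbedder:
    """Hypothetical stand-in for a cache-backed embedder (not the langchain class)."""

    def __init__(self, embed_fn: Callable[[list[str]], list[list[float]]]) -> None:
        self._embed_fn = embed_fn
        # A plain dict plays the role of the key-value store.
        self._store: dict[str, list[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        # The text is hashed and the hash is used as the cache key.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        keys = [self._key(t) for t in texts]
        # Embed only texts whose hash is not in the store yet (deduplicated).
        missing = {k: t for k, t in zip(keys, texts) if k not in self._store}
        if missing:
            vectors = self._embed_fn(list(missing.values()))
            self._store.update(zip(missing.keys(), vectors))
        return [self._store[k] for k in keys]


calls: list[list[str]] = []


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Toy embedder that records each batch it is asked to embed."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = ToyCachedEmbedder(fake_embed)
first = embedder.embed_documents(["a", "bb"])
second = embedder.embed_documents(["bb", "ccc"])
```

On the second call only `"ccc"` reaches the underlying embedder, because `"bb"` is already in the store.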

What functions are defined in cache.py?

cache.py defines 5 functions: _make_default_key_encoder, _sha1_hash_to_uuid, _value_deserializer, _value_serializer, and _warn_about_sha1_encoder.
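
The bodies of `_value_serializer` and `_value_deserializer` are not shown in the listing. Since the file imports `json` and `ByteStore`, one plausible reading is that embeddings are round-tripped through JSON-encoded bytes. The sketch below illustrates that assumption only; the function names without the leading underscore are hypothetical.

```python
import json


def value_serializer(value: list[float]) -> bytes:
    """Hypothetical: encode an embedding vector as UTF-8 JSON bytes."""
    return json.dumps(value).encode("utf-8")


def value_deserializer(data: bytes) -> list[float]:
    """Hypothetical: decode UTF-8 JSON bytes back into an embedding vector."""
    return json.loads(data.decode("utf-8"))


vector = [0.25, -1.5, 3.0]
payload = value_serializer(vector)
restored = value_deserializer(payload)
```

A byte-oriented store can then hold `payload` under the hashed key and return the original vector on a cache hit.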

What does cache.py depend on?

cache.py imports 10 modules: collections.abc, hashlib, json, langchain_classic.storage.encoder_backed, langchain_core.embeddings, langchain_core.stores, langchain_core.utils.iter, typing, uuid, and warnings.

Where is cache.py in the architecture?

cache.py is located at libs/langchain/langchain_classic/embeddings/cache.py (domain: CoreAbstractions, subdomain: Serialization, directory: libs/langchain/langchain_classic/embeddings).
