from_bytes_store() — langchain Function Reference

Architecture documentation for the from_bytes_store() function in cache.py from the langchain codebase.
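
The source listed on this page rejects a non-empty `namespace` when `key_encoder` is a callable, and asks that any prefixing happen inside the encoder itself. A minimal sketch of such an encoder, using only the standard library, might look like the following. The helper name `make_namespaced_sha256_encoder` and the prefix string are hypothetical and chosen for illustration; they are not part of langchain.

```python
import hashlib
from typing import Callable


def make_namespaced_sha256_encoder(prefix: str) -> Callable[[str], str]:
    """Hypothetical helper: build a key encoder that bakes `prefix` into every key."""

    def encode(text: str) -> str:
        # Hash the text with SHA-256 and prepend the prefix, so the
        # namespacing lives inside the encoder rather than in `namespace`.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{prefix}{digest}"

    return encode


# Assumed prefix; the docstring suggests naming it after the embedding model.
key_encoder = make_namespaced_sha256_encoder("my-embedding-model:")
key = key_encoder("hello world")

# Intended use (not executed here): pass `key_encoder=key_encoder` to
# from_bytes_store() and leave `namespace` at its default "", because the
# callable branch raises ValueError when a namespace is also supplied.
```

SHA-256 is used here because the docstring advises new applications to avoid the SHA-1 default.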

Dependency Diagram

graph TD
  fc5a90e3_3529_5688_86a8_34cee618454e["from_bytes_store()"]
  b3be4e54_ae5f_c527_4e99_0843e3d30f72["CacheBackedEmbeddings"]
  fc5a90e3_3529_5688_86a8_34cee618454e -->|defined in| b3be4e54_ae5f_c527_4e99_0843e3d30f72
  4b1f75e8_3a36_4d2e_f5d9_04dbd0255a60["_make_default_key_encoder()"]
  fc5a90e3_3529_5688_86a8_34cee618454e -->|calls| 4b1f75e8_3a36_4d2e_f5d9_04dbd0255a60
  style fc5a90e3_3529_5688_86a8_34cee618454e fill:#6366f1,stroke:#818cf8,color:#fff
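
The `calls` edge in the diagram corresponds to one branch of the `key_encoder` handling: a string is turned into an encoder by `_make_default_key_encoder()`, while a callable is used as-is, provided no namespace is given. The sketch below mirrors that branching in isolation. `resolve_key_encoder` and `_stand_in_default_encoder` are hypothetical names, and the stand-in only approximates the idea of a namespaced hash; the real `_make_default_key_encoder()` is not shown on this page and may derive keys differently.

```python
import hashlib
from typing import Callable, Union

KeyEncoder = Callable[[str], str]


def _stand_in_default_encoder(namespace: str, algorithm: str) -> KeyEncoder:
    # Hypothetical stand-in for _make_default_key_encoder(); the real
    # function's key format is an unknown here and is NOT reproduced.
    def encode(text: str) -> str:
        return namespace + hashlib.new(algorithm, text.encode("utf-8")).hexdigest()

    return encode


def resolve_key_encoder(
    key_encoder: Union[str, KeyEncoder], namespace: str = ""
) -> KeyEncoder:
    """Mirror the three branches: algorithm name, callable, or anything else."""
    if isinstance(key_encoder, str):
        # String branch: build a default encoder that includes the namespace.
        return _stand_in_default_encoder(namespace, key_encoder)
    if callable(key_encoder):
        # Callable branch: a namespace is rejected, as in the source.
        if namespace:
            raise ValueError(
                "Do not supply `namespace` when using a custom key_encoder; "
                "add any prefixing inside the encoder itself."
            )
        return key_encoder
    # Fallback branch: neither a string nor a callable.
    raise ValueError("key_encoder must be an algorithm name or a callable.")


named = resolve_key_encoder("sha256", namespace="docs:")
custom = resolve_key_encoder(str.upper)
```

Here `named("a")` yields a key beginning with `docs:`, while `custom` is simply `str.upper` returned unchanged.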

Source Code

libs/langchain/langchain_classic/embeddings/cache.py lines 288–370

    def from_bytes_store(
        cls,
        underlying_embeddings: Embeddings,
        document_embedding_cache: ByteStore,
        *,
        namespace: str = "",
        batch_size: int | None = None,
        query_embedding_cache: bool | ByteStore = False,
        key_encoder: Callable[[str], str]
        | Literal["sha1", "blake2b", "sha256", "sha512"] = "sha1",
    ) -> CacheBackedEmbeddings:
        """On-ramp that adds the necessary serialization and encoding to the store.

        Args:
            underlying_embeddings: The embedder to use for embedding.
            document_embedding_cache: The cache to use for storing document embeddings.
            *,
            namespace: The namespace to use for document cache.
                This namespace is used to avoid collisions with other caches.
                For example, set it to the name of the embedding model used.
            batch_size: The number of documents to embed between store updates.
            query_embedding_cache: The cache to use for storing query embeddings.
                True to use the same cache as document embeddings.
                False to not cache query embeddings.
            key_encoder: Optional callable to encode keys. If not provided,
                a default encoder using SHA-1 will be used. SHA-1 is not
                collision-resistant, and a motivated attacker could craft two
                different texts that hash to the same cache key.

                New applications should use one of the alternative encoders
                or provide a custom and strong key encoder function to avoid this risk.

                If you change a key encoder in an existing cache, consider
                just creating a new cache, to avoid (the potential for)
                collisions with existing keys or having duplicate keys
                for the same text in the cache.

        Returns:
            An instance of CacheBackedEmbeddings that uses the provided cache.
        """
        if isinstance(key_encoder, str):
            key_encoder = _make_default_key_encoder(namespace, key_encoder)
        elif callable(key_encoder):
            # If a custom key encoder is provided, it should not be used with a
            # namespace.
            # A user can handle namespacing directly in their custom key encoder.
            if namespace:
                msg = (
                    "Do not supply `namespace` when using a custom key_encoder; "
                    "add any prefixing inside the encoder itself."
                )
                raise ValueError(msg)
        else:
            msg = (  # type: ignore[unreachable]
                "key_encoder must be either 'blake2b', 'sha1', 'sha256', 'sha512' "
                "or a callable that encodes keys."
            )
            raise ValueError(msg)  # noqa: TRY004

        document_embedding_store = EncoderBackedStore[str, list[float]](
            document_embedding_cache,
            key_encoder,
            _value_serializer,
            _value_deserializer,
        )
        if query_embedding_cache is True:
            query_embedding_store = document_embedding_store
        elif query_embedding_cache is False:
            query_embedding_store = None
        else:
            query_embedding_store = EncoderBackedStore[str, list[float]](
                query_embedding_cache,
                key_encoder,
                _value_serializer,
                _value_deserializer,
            )

        return cls(
            underlying_embeddings,
            document_embedding_store,
            batch_size=batch_size,