EmbeddingsFilter Class — langchain Architecture
Architecture documentation for the EmbeddingsFilter class in embeddings_filter.py from the langchain codebase.
Dependency Diagram
graph TD
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6["EmbeddingsFilter"]
    56ee7e00_cbf2_37e6_b294_468dfe7f2941["BaseDocumentCompressor"]
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6 -->|extends| 56ee7e00_cbf2_37e6_b294_468dfe7f2941
    4e2e47d6_33a4_62d4_a5cc_1a0067c152ce["embeddings_filter.py"]
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6 -->|defined in| 4e2e47d6_33a4_62d4_a5cc_1a0067c152ce
    92f00db6_7c28_e109_4a04_00bd43a27aa4["validate_params()"]
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6 -->|method| 92f00db6_7c28_e109_4a04_00bd43a27aa4
    6176051d_2fe3_0d59_5cde_d2d5237e9e00["compress_documents()"]
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6 -->|method| 6176051d_2fe3_0d59_5cde_d2d5237e9e00
    35e9f310_53c6_44f0_0e88_a84cd20d72b6["acompress_documents()"]
    7a0a1fba_1d80_811c_4eea_2a0bc8f55ee6 -->|method| 35e9f310_53c6_44f0_0e88_a84cd20d72b6
Source Code
libs/langchain/langchain_classic/retrievers/document_compressors/embeddings_filter.py lines 23–141
class EmbeddingsFilter(BaseDocumentCompressor):
    """Embeddings Filter.

    Document compressor that uses embeddings to drop documents unrelated to the query.
    """

    embeddings: Embeddings
    """Embeddings to use for embedding document contents and queries."""
    similarity_fn: Callable = Field(default_factory=_get_similarity_function)
    """Similarity function for comparing documents. Function expected to take as input
    two matrices (List[List[float]]) and return a matrix of scores where higher values
    indicate greater similarity."""
    k: int | None = 20
    """The number of relevant documents to return. Can be set to `None`, in which case
    `similarity_threshold` must be specified."""
    similarity_threshold: float | None = None
    """Threshold for determining when two documents are similar enough
    to be considered redundant. Defaults to `None`, must be specified if `k` is set
    to None."""

    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )

    @pre_init
    def validate_params(cls, values: dict) -> dict:
        """Validate similarity parameters."""
        if values["k"] is None and values["similarity_threshold"] is None:
            msg = "Must specify one of `k` or `similarity_threshold`."
            raise ValueError(msg)
        return values

    @override
    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Callbacks | None = None,
    ) -> Sequence[Document]:
        """Filter documents based on similarity of their embeddings to the query."""
        try:
            from langchain_community.document_transformers.embeddings_redundant_filter import (  # noqa: E501
                _get_embeddings_from_stateful_docs,
                get_stateful_documents,
            )
        except ImportError as e:
            msg = (
                "To use please install langchain-community "
                "with `pip install langchain-community`."
            )
            raise ImportError(msg) from e

        try:
            import numpy as np
        except ImportError as e:
            msg = "Could not import numpy, please install with `pip install numpy`."
            raise ImportError(msg) from e
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = _get_embeddings_from_stateful_docs(
            self.embeddings,
            stateful_documents,
        )
        embedded_query = self.embeddings.embed_query(query)
        similarity = self.similarity_fn([embedded_query], embedded_documents)[0]
        included_idxs: np.ndarray = np.arange(len(embedded_documents))
        if self.k is not None:
            included_idxs = np.argsort(similarity)[::-1][: self.k]
        if self.similarity_threshold is not None:
            similar_enough = np.where(
                similarity[included_idxs] > self.similarity_threshold,
            )
            included_idxs = included_idxs[similar_enough]
        for i in included_idxs:
            stateful_documents[i].state["query_similarity_score"] = similarity[i]
        return [stateful_documents[i] for i in included_idxs]

    @override
    async def acompress_documents(
        self,
        documents: Sequence[Document],
        query: str,
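The selection step in compress_documents (rank by similarity, keep the top k, then drop anything at or below similarity_threshold) can be illustrated in isolation. The following is a minimal sketch in pure Python, not the library's implementation: the helper names cosine_similarity and select_indices are hypothetical, cosine similarity is assumed here only as one function that satisfies the similarity_fn contract (two matrices in, a score matrix out, higher meaning more similar), and the real class uses numpy arrays rather than lists.

```python
import math
from typing import List, Optional


def cosine_similarity(X: List[List[float]], Y: List[List[float]]) -> List[List[float]]:
    """Hypothetical stand-in for a similarity_fn.

    Takes two matrices and returns a len(X) x len(Y) matrix of scores,
    where higher values indicate greater similarity.
    """
    def norm(v: List[float]) -> float:
        return math.sqrt(sum(a * a for a in v))

    scores = []
    for x in X:
        row = []
        for y in Y:
            denom = norm(x) * norm(y)
            dot = sum(a * b for a, b in zip(x, y))
            # Treat zero-length vectors as having no similarity.
            row.append(dot / denom if denom else 0.0)
        scores.append(row)
    return scores


def select_indices(
    similarity: List[float],
    k: Optional[int] = 20,
    similarity_threshold: Optional[float] = None,
) -> List[int]:
    """Mirror the index selection in compress_documents (illustrative only)."""
    # Same constraint that validate_params enforces on the class.
    if k is None and similarity_threshold is None:
        raise ValueError("Must specify one of `k` or `similarity_threshold`.")
    idxs = list(range(len(similarity)))
    if k is not None:
        # Highest scores first, truncated to the top k.
        idxs = sorted(idxs, key=lambda i: similarity[i], reverse=True)[:k]
    if similarity_threshold is not None:
        # Strictly greater than the threshold, as in the source listing.
        idxs = [i for i in idxs if similarity[i] > similarity_threshold]
    return idxs


# Toy 2-D "embeddings": one query and four documents.
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

scores = cosine_similarity([query_vec], doc_vecs)[0]
top2 = select_indices(scores, k=2)
above_half = select_indices(scores, k=None, similarity_threshold=0.5)
both = select_indices(scores, k=3, similarity_threshold=0.5)
```

When both k and similarity_threshold are set, the threshold is applied only to the documents that already survived the top-k cut, so the result can contain fewer than k documents.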
Frequently Asked Questions
What is the EmbeddingsFilter class?
EmbeddingsFilter is a document compressor in the langchain codebase that uses embeddings to drop documents unrelated to the query. It is defined in libs/langchain/langchain_classic/retrievers/document_compressors/embeddings_filter.py.
Where is EmbeddingsFilter defined?
EmbeddingsFilter is defined in libs/langchain/langchain_classic/retrievers/document_compressors/embeddings_filter.py at line 23.
What does EmbeddingsFilter extend?
EmbeddingsFilter extends BaseDocumentCompressor.