MultiVectorRetriever Class — langchain Architecture

Architecture documentation for the MultiVectorRetriever class in multi_vector.py from the langchain codebase.

Dependency Diagram

graph TD
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a["MultiVectorRetriever"]
  3a20478a_3692_141f_433b_a32429b00020["BaseRetriever"]
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a -->|extends| 3a20478a_3692_141f_433b_a32429b00020
  3cc61483_6a5a_7183_a480_7020803f3c23["multi_vector.py"]
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a -->|defined in| 3cc61483_6a5a_7183_a480_7020803f3c23
  5e31b49e_54a9_5e91_f1a5_323af704f180["_shim_docstore()"]
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a -->|method| 5e31b49e_54a9_5e91_f1a5_323af704f180
  797e6097_1f8a_2d77_ccb0_62a0ea59a9f3["_get_relevant_documents()"]
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a -->|method| 797e6097_1f8a_2d77_ccb0_62a0ea59a9f3
  57699508_922d_f9de_b836_1f93a0bf0af3["_aget_relevant_documents()"]
  c7ceffe2_56b9_b96d_5de5_8e679b778e9a -->|method| 57699508_922d_f9de_b836_1f93a0bf0af3

Source Code

libs/langchain/langchain_classic/retrievers/multi_vector.py lines 29–142

class MultiVectorRetriever(BaseRetriever):
    """Retrieve from a set of multiple embeddings for the same document."""

    vectorstore: VectorStore
    """The underlying `VectorStore` to use to store small chunks
    and their embedding vectors"""

    byte_store: ByteStore | None = None
    """The lower-level backing storage layer for the parent documents"""

    docstore: BaseStore[str, Document]
    """The storage interface for the parent documents"""

    id_key: str = "doc_id"

    search_kwargs: dict = Field(default_factory=dict)
    """Keyword arguments to pass to the search function."""

    search_type: SearchType = SearchType.similarity
    """Type of search to perform (similarity / mmr)"""

    @model_validator(mode="before")
    @classmethod
    def _shim_docstore(cls, values: dict) -> Any:
        byte_store = values.get("byte_store")
        docstore = values.get("docstore")
        if byte_store is not None:
            docstore = create_kv_docstore(byte_store)
        elif docstore is None:
            msg = "You must pass a `byte_store` parameter."
            raise ValueError(msg)
        values["docstore"] = docstore
        return values

    @override
    def _get_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> list[Document]:
        """Get documents relevant to a query.

        Args:
            query: String to find relevant documents for
            run_manager: The callbacks handler to use
        Returns:
            List of relevant documents.
        """
        if self.search_type == SearchType.mmr:
            sub_docs = self.vectorstore.max_marginal_relevance_search(
                query,
                **self.search_kwargs,
            )
        elif self.search_type == SearchType.similarity_score_threshold:
            sub_docs_and_similarities = (
                self.vectorstore.similarity_search_with_relevance_scores(
                    query,
                    **self.search_kwargs,
                )
            )
            sub_docs = [sub_doc for sub_doc, _ in sub_docs_and_similarities]
        else:
            sub_docs = self.vectorstore.similarity_search(query, **self.search_kwargs)

        # We do this to maintain the order of the IDs that are returned
        ids = []
        for d in sub_docs:
            if self.id_key in d.metadata and d.metadata[self.id_key] not in ids:
                ids.append(d.metadata[self.id_key])
        docs = self.docstore.mget(ids)
        return [d for d in docs if d is not None]

    @override
    async def _aget_relevant_documents(
        self,
        query: str,
        *,
        run_manager: AsyncCallbackManagerForRetrieverRun,
    ) -> list[Document]:
        """Asynchronously get documents relevant to a query.
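
The last step of `_get_relevant_documents` in the listing above collects parent IDs in first-seen order, drops duplicates, and discards parents that the docstore cannot find. The following is a minimal standalone sketch of that step, not the library's code: `Doc`, `DictStore`, and `resolve_parents` are hypothetical stand-ins introduced for illustration, and only the `mget` behavior visible in the listing (a missing key yields `None`) is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain `Document`."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class DictStore:
    """Hypothetical in-memory store exposing only an `mget`-style lookup."""

    def __init__(self, data):
        self._data = dict(data)

    def mget(self, keys):
        # Missing keys come back as None, in the same order as `keys`.
        return [self._data.get(k) for k in keys]


def resolve_parents(sub_docs, docstore, id_key="doc_id"):
    """Map retrieved chunks to parent documents, keeping first-seen order."""
    ids = []
    for d in sub_docs:
        # Skip chunks without the id key and parents already collected.
        if id_key in d.metadata and d.metadata[id_key] not in ids:
            ids.append(d.metadata[id_key])
    # Drop parents the store could not find.
    return [d for d in docstore.mget(ids) if d is not None]


parents = DictStore({"a": Doc("parent A"), "b": Doc("parent B")})
chunks = [
    Doc("chunk 1 of B", {"doc_id": "b"}),
    Doc("chunk 1 of A", {"doc_id": "a"}),
    Doc("chunk 2 of B", {"doc_id": "b"}),    # duplicate parent, ignored
    Doc("orphan chunk", {"doc_id": "zzz"}),  # parent missing from the store
    Doc("untagged chunk"),                   # no doc_id in metadata
]
result = [d.page_content for d in resolve_parents(chunks, parents)]
print(result)
```

Because the order of `ids` follows the order of the search hits, the parent of the best-matching chunk comes first in the result.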

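The `_shim_docstore` validator in the listing above applies a precedence rule: a `byte_store`, when given, replaces any `docstore` that was passed, and passing neither raises a `ValueError`. The sketch below reproduces that rule as a plain function so it can be run without langchain installed. `resolve_docstore` and `wrap` are hypothetical names; `wrap` only stands in for `create_kv_docstore` and does not reproduce its behavior.

```python
def resolve_docstore(values, wrap_byte_store):
    """Apply the same precedence rule as `_shim_docstore`.

    `wrap_byte_store` is a hypothetical stand-in for `create_kv_docstore`.
    """
    byte_store = values.get("byte_store")
    docstore = values.get("docstore")
    if byte_store is not None:
        # A byte store wins, even if a docstore was also supplied.
        docstore = wrap_byte_store(byte_store)
    elif docstore is None:
        raise ValueError("You must pass a `byte_store` parameter.")
    return {**values, "docstore": docstore}


def wrap(byte_store):
    # Placeholder wrapper so the example stays self-contained.
    return ("wrapped", byte_store)


both = resolve_docstore({"byte_store": "BYTES", "docstore": "DOCS"}, wrap)
only_docstore = resolve_docstore({"docstore": "DOCS"}, wrap)

try:
    resolve_docstore({}, wrap)
    neither_error = None
except ValueError as exc:
    neither_error = str(exc)
```

Note that the error message names only `byte_store`, although the listing shows that passing a `docstore` on its own is also accepted.
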
Extends

BaseRetriever

Frequently Asked Questions

What is the MultiVectorRetriever class?
MultiVectorRetriever is a retriever class in the langchain codebase that retrieves from a set of multiple embeddings for the same document. It is defined in libs/langchain/langchain_classic/retrievers/multi_vector.py.

Where is MultiVectorRetriever defined?
MultiVectorRetriever is defined in libs/langchain/langchain_classic/retrievers/multi_vector.py at line 29.

What does MultiVectorRetriever extend?
MultiVectorRetriever extends BaseRetriever.
