BaseRetriever Class — langchain Architecture

Architecture documentation for the BaseRetriever class in retrievers.py from the langchain codebase.

Dependency Diagram

graph TD
  2a401977_bd56_ea94_9c8f_d0b77072baae["BaseRetriever"]
  b0b3cb2f_fcef_0784_b2ba_ee476260390d["retrievers.py"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|defined in| b0b3cb2f_fcef_0784_b2ba_ee476260390d
  7bd8fc55_79b7_1216_7d63_38ad5e811319["__init_subclass__()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| 7bd8fc55_79b7_1216_7d63_38ad5e811319
  a0405867_c11c_f2da_fe57_9354b5df3dae["_get_ls_params()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| a0405867_c11c_f2da_fe57_9354b5df3dae
  3c2460f7_e200_ef96_b678_5bff6596750c["invoke()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| 3c2460f7_e200_ef96_b678_5bff6596750c
  b95b5f52_34c2_ef9f_6464_67011efbcfeb["ainvoke()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| b95b5f52_34c2_ef9f_6464_67011efbcfeb
  a59a35e6_2ae6_62d7_e2ca_10c889294d10["_get_relevant_documents()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| a59a35e6_2ae6_62d7_e2ca_10c889294d10
  d293118c_0838_825b_13f2_7fc42815a34e["_aget_relevant_documents()"]
  2a401977_bd56_ea94_9c8f_d0b77072baae -->|method| d293118c_0838_825b_13f2_7fc42815a34e

Source Code

libs/core/langchain_core/retrievers.py lines 55–328

class BaseRetriever(RunnableSerializable[RetrieverInput, RetrieverOutput], ABC):
    """Abstract base class for a document retrieval system.

    A retrieval system is defined as something that can take string queries and return
    the most 'relevant' documents from some source.

    Usage:

    A retriever follows the standard `Runnable` interface, and should be used via the
    standard `Runnable` methods of `invoke`, `ainvoke`, `batch`, `abatch`.

    Implementation:

    When implementing a custom retriever, the class should implement the
    `_get_relevant_documents` method to define the logic for retrieving documents.

    Optionally, an async-native implementation can be provided by overriding the
    `_aget_relevant_documents` method.

    !!! example "Retriever that returns the first 5 documents from a list of documents"

        ```python
        from langchain_core.documents import Document
        from langchain_core.retrievers import BaseRetriever

        class SimpleRetriever(BaseRetriever):
            docs: list[Document]
            k: int = 5

            def _get_relevant_documents(self, query: str) -> list[Document]:
                \"\"\"Return the first k documents from the list of documents\"\"\"
                return self.docs[:self.k]

            async def _aget_relevant_documents(self, query: str) -> list[Document]:
                \"\"\"(Optional) async native implementation.\"\"\"
                return self.docs[:self.k]
        ```

    !!! example "Simple retriever based on a scikit-learn vectorizer"

        ```python
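        from typing import Any

        from langchain_core.documents import Document
        from langchain_core.retrievers import BaseRetriever
        from pydantic import BaseModel
        from sklearn.metrics.pairwise import cosine_similarity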


        class TFIDFRetriever(BaseRetriever, BaseModel):
            vectorizer: Any
            docs: list[Document]
            tfidf_array: Any
            k: int = 4

            class Config:
                arbitrary_types_allowed = True

            def _get_relevant_documents(self, query: str) -> list[Document]:
                # Vectorize the query: output shape is (1, n_features)
                query_vec = self.vectorizer.transform([query])
                # Cosine similarity of each document with the query:
                # shape (n_docs, 1), flattened to (n_docs,)
                results = cosine_similarity(self.tfidf_array, query_vec).reshape((-1,))
                return [self.docs[i] for i in results.argsort()[-self.k :][::-1]]
        ```
    """

    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )

    _new_arg_supported: bool = False

    _expects_other_args: bool = False

    tags: list[str] | None = None
    """Optional list of tags associated with the retriever.

    These tags will be associated with each call to this retriever,
    and passed as arguments to the handlers defined in `callbacks`.

    You can use these to, e.g., identify a specific instance of a retriever with its
    use case.
    """

    metadata: dict[str, Any] | None = None
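
The ranking step in the TF-IDF docstring example, `results.argsort()[-self.k :][::-1]`, is compact and easy to misread. The snippet below is a minimal, dependency-free sketch of the same idea: score every document against the query, sort the indices by ascending score, keep the last k, and reverse them so the best match comes first. The names `cosine` and `top_k_indices` and the toy vectors are hypothetical and exist only for illustration; they are not part of langchain or scikit-learn.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_indices(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest scores, best first.

    Mirrors `results.argsort()[-k:][::-1]`: sort indices by ascending score,
    keep the last k, then reverse the slice.
    """
    ascending = sorted(range(len(scores)), key=scores.__getitem__)
    return ascending[-k:][::-1]


# Hypothetical toy vectors standing in for the rows of `tfidf_array`.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.2, 0.0]

scores = [cosine(vec, query_vector) for vec in doc_vectors]
ranked = top_k_indices(scores, k=2)
print(ranked)  # [0, 2]
```

One property of this slicing idiom is worth keeping in mind: with `k = 0`, the slice `[-0:]` selects the whole list rather than nothing, so `k` should stay positive.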

Frequently Asked Questions

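What is the BaseRetriever class?
BaseRetriever is the abstract base class for document retrieval systems in the langchain codebase: it takes a string query and returns the most relevant documents from some source. It is defined in libs/core/langchain_core/retrievers.py.

Where is BaseRetriever defined?
BaseRetriever is defined in libs/core/langchain_core/retrievers.py, starting at line 55.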
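
The contract described in the class docstring is that callers use `invoke` / `ainvoke`, while subclasses implement `_get_relevant_documents` and optionally `_aget_relevant_documents`. The standard-library sketch below illustrates that split. `MiniRetriever`, `FirstKRetriever`, and `Doc` are hypothetical stand-ins, not the real langchain classes, and the thread-based async fallback is an assumption of this sketch rather than something shown in the excerpt. The real class also layers tags, metadata, and callback handling on top, which is omitted here.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str


class MiniRetriever(ABC):
    """Hypothetical, simplified analogue of BaseRetriever (not the real class)."""

    def invoke(self, query: str) -> list[Doc]:
        # Public entry point: delegates to the subclass hook.
        return self._get_relevant_documents(query)

    async def ainvoke(self, query: str) -> list[Doc]:
        # Async entry point: delegates to the async hook.
        return await self._aget_relevant_documents(query)

    @abstractmethod
    def _get_relevant_documents(self, query: str) -> list[Doc]:
        """Required hook: subclasses put their retrieval logic here."""

    async def _aget_relevant_documents(self, query: str) -> list[Doc]:
        # Assumed default for this sketch: run the sync hook in a worker thread.
        return await asyncio.to_thread(self._get_relevant_documents, query)


class FirstKRetriever(MiniRetriever):
    """Returns the first k documents, like SimpleRetriever in the docstring."""

    def __init__(self, docs: list[Doc], k: int = 5) -> None:
        self.docs = docs
        self.k = k

    def _get_relevant_documents(self, query: str) -> list[Doc]:
        return self.docs[: self.k]


docs = [Doc(f"doc {i}") for i in range(10)]
retriever = FirstKRetriever(docs, k=3)

sync_result = retriever.invoke("any query")
async_result = asyncio.run(retriever.ainvoke("any query"))
print([d.page_content for d in sync_result])  # ['doc 0', 'doc 1', 'doc 2']
```

Keeping the retrieval logic in an underscore-prefixed hook lets the public methods stay uniform across retrievers, which is what allows a retriever to be used through the standard `Runnable` interface.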
