retrievers.py — langchain Source File

Architecture documentation for retrievers.py, a Python file in the langchain codebase. 8 imports, 0 dependents.

File: Python. Domain: LangChainCore, subdomain: ApiManagement. 8 imports, 1 function, 1 class.

Entity Profile


Dependency Diagram

graph LR
  adcb3d5b_1090_8f27_8ae7_b26d830eb673["retrievers.py"]
  feec1ec4_6917_867b_d228_b134d0ff8099["typing"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> feec1ec4_6917_867b_d228_b134d0ff8099
  0344251d_a425_177d_d810_f45aa8de9600["exa_py"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> 0344251d_a425_177d_d810_f45aa8de9600
  5d33df87_33f9_172a_2864_dd2e31881c5b["exa_py.api"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> 5d33df87_33f9_172a_2864_dd2e31881c5b
  17a62cb3_fefd_6320_b757_b53bb4a1c661["langchain_core.callbacks"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> 17a62cb3_fefd_6320_b757_b53bb4a1c661
  6a98b0a5_5607_0043_2e22_a46a464c2d62["langchain_core.documents"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> 6a98b0a5_5607_0043_2e22_a46a464c2d62
  2b1aa4a8_5352_1757_010a_46ac9ef4b0b0["langchain_core.retrievers"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> 2b1aa4a8_5352_1757_010a_46ac9ef4b0b0
  dd5e7909_a646_84f1_497b_cae69735550e["pydantic"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> dd5e7909_a646_84f1_497b_cae69735550e
  ad09f074_c715_9e1c_1a2c_aaa919862b80["langchain_exa._utilities"]
  adcb3d5b_1090_8f27_8ae7_b26d830eb673 --> ad09f074_c715_9e1c_1a2c_aaa919862b80
  style adcb3d5b_1090_8f27_8ae7_b26d830eb673 fill:#6366f1,stroke:#818cf8,color:#fff


Source Code

"""Retriever using Exa Search API."""

from __future__ import annotations

from typing import Any, Literal

from exa_py import Exa  # type: ignore[untyped-import]
from exa_py.api import (
    HighlightsContentsOptions,  # type: ignore[untyped-import]
    TextContentsOptions,  # type: ignore[untyped-import]
)
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from pydantic import Field, SecretStr, model_validator

from langchain_exa._utilities import initialize_client


def _get_metadata(result: Any) -> dict[str, Any]:
    """Get the metadata from a result object."""
    metadata = {
        "title": result.title,
        "url": result.url,
        "id": result.id,
        "score": result.score,
        "published_date": result.published_date,
        "author": result.author,
    }
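    # Optional fields are added only when the attribute exists and is non-empty;
    # the None default keeps getattr from raising AttributeError.
    if getattr(result, "highlights", None):
        metadata["highlights"] = result.highlights
    if getattr(result, "highlight_scores", None):
        metadata["highlight_scores"] = result.highlight_scores
    if getattr(result, "summary", None):
        metadata["summary"] = result.summary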
    return metadata


class ExaSearchRetriever(BaseRetriever):
    """Exa Search retriever."""

    k: int = 10  # num_results
    """The number of search results to return (1 to 100)."""
    include_domains: list[str] | None = None
    """A list of domains to include in the search."""
    exclude_domains: list[str] | None = None
    """A list of domains to exclude from the search."""
    start_crawl_date: str | None = None
    """The start date for the crawl (in YYYY-MM-DD format)."""
    end_crawl_date: str | None = None
    """The end date for the crawl (in YYYY-MM-DD format)."""
    start_published_date: str | None = None
    """The start date for when the document was published (in YYYY-MM-DD format)."""
    end_published_date: str | None = None
    """The end date for when the document was published (in YYYY-MM-DD format)."""
    use_autoprompt: bool | None = None
    """Whether to use autoprompt for the search."""
    type: str = "neural"
    """The type of search: 'keyword', 'neural', or 'auto'. Defaults to 'neural'."""
    highlights: HighlightsContentsOptions | bool | None = None
    """Whether to request highlights for each result. Highlights are added to the
    Document metadata; the page content is always the result text."""
    text_contents_options: TextContentsOptions | dict[str, Any] | Literal[True] = True
    """How to set the page content of the results. Can be True or a dict with options
    like max_characters."""
    livecrawl: Literal["always", "fallback", "never"] | None = None
    """Option to crawl live webpages if content is not in the index. Options: "always",
    "fallback", "never"."""
    summary: bool | dict[str, str] | None = None
    """Whether to include a summary of the content. Can be a boolean or a dict with a
    custom query."""

    client: Exa = Field(default=None)  # type: ignore[assignment]
    exa_api_key: SecretStr = Field(default=SecretStr(""))
    exa_base_url: str | None = None

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: dict) -> Any:
        """Validate the environment."""
        return initialize_client(values)

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> list[Document]:
        response = self.client.search_and_contents(  # type: ignore[call-overload]
            query,
            num_results=self.k,
            text=self.text_contents_options,
            highlights=self.highlights,
            include_domains=self.include_domains,
            exclude_domains=self.exclude_domains,
            start_crawl_date=self.start_crawl_date,
            end_crawl_date=self.end_crawl_date,
            start_published_date=self.start_published_date,
            end_published_date=self.end_published_date,
            use_autoprompt=self.use_autoprompt,
            livecrawl=self.livecrawl,
            summary=self.summary,
            type=self.type,
        )  # type: ignore[call-overload, misc]

        results = response.results

        return [
            Document(
                page_content=result.text,
                metadata=_get_metadata(result),
            )
            for result in results
        ]
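The date fields on ExaSearchRetriever (start_crawl_date, end_crawl_date, start_published_date, end_published_date) are plain strings documented as YYYY-MM-DD, and this source does no date validation of its own. A minimal sketch of building such a window with the standard library, assuming you want "the last N days"; published_window is a hypothetical helper, not part of langchain_exa.

```python
from __future__ import annotations

from datetime import date, timedelta


def published_window(days: int, today: date | None = None) -> tuple[str, str]:
    """Return (start, end) strings in YYYY-MM-DD form covering the last `days` days.

    Hypothetical helper: the strings are intended for fields such as
    start_published_date / end_published_date.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days)
    # date.isoformat() produces exactly the YYYY-MM-DD format
    return start.isoformat(), end.isoformat()


start, end = published_window(7, today=date(2024, 3, 5))
print(start, end)  # 2024-02-27 2024-03-05
```

The resulting strings could then be passed as, for example, ExaSearchRetriever(start_published_date=start, end_published_date=end); passing an explicit today keeps the window deterministic in tests.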

Domain

LangChainCore

Subdomains

ApiManagement

Functions

_get_metadata()

Classes

ExaSearchRetriever

Dependencies

exa_py
exa_py.api
langchain_core.callbacks
langchain_core.documents
langchain_core.retrievers
langchain_exa._utilities
pydantic
typing


Frequently Asked Questions

What does retrievers.py do?

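retrievers.py defines ExaSearchRetriever, a LangChain BaseRetriever backed by the Exa Search API. For each query it calls the exa_py client's search_and_contents method with the retriever's settings (number of results, domain filters, crawl and publication date ranges, search type, and content options) and returns one Document per result: page_content holds the result text, and metadata holds the title, URL, id, score, published date, author, and, when present, highlights, highlight scores, and summary. It is a Python source file in the langchain codebase, in the LangChainCore domain and ApiManagement subdomain.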
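The retrieval flow in _get_relevant_documents (call the client, then wrap each result) can be exercised without network access by substituting stand-ins. The sketch below is self-contained and assumes nothing about exa_py beyond what the listing shows: StubClient, Doc, and retrieve are hypothetical names, and the metadata is trimmed to two keys for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class StubClient:
    """Hypothetical stand-in for exa_py.Exa that records the kwargs it receives."""

    calls: list[dict[str, Any]] = field(default_factory=list)

    def search_and_contents(self, query: str, **kwargs: Any) -> SimpleNamespace:
        self.calls.append({"query": query, **kwargs})
        hit = SimpleNamespace(title="Example", url="https://example.com", text="Body text")
        return SimpleNamespace(results=[hit])


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict[str, Any]


def retrieve(client: StubClient, query: str, k: int = 10, search_type: str = "neural") -> list[Doc]:
    # Mirrors the listing's renaming: field `k` is sent as `num_results`,
    # and text_contents_options (default True) is sent as `text`.
    response = client.search_and_contents(query, num_results=k, text=True, type=search_type)
    return [
        # page_content comes from result.text; metadata trimmed to title and url
        Doc(page_content=r.text, metadata={"title": r.title, "url": r.url})
        for r in response.results
    ]


client = StubClient()
docs = retrieve(client, "vector databases", k=3)
print(docs[0].page_content)  # Body text
print(client.calls[0]["num_results"])  # 3
```

Recording the outgoing kwargs is a simple way to check the field-to-parameter mapping in isolation; whether a stub can be injected into the real ExaSearchRetriever.client field depends on initialize_client, whose body is not shown here.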

What functions are defined in retrievers.py?

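retrievers.py defines one module-level function, _get_metadata, a private helper that builds the metadata dict for each search result. Its only class, ExaSearchRetriever, defines the methods validate_environment and _get_relevant_documents.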
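The behavior of _get_metadata (six keys always copied, three optional keys added only when truthy) can be checked with a standalone re-implementation. This is a sketch, not the library function: get_metadata here is a local copy, and SimpleNamespace objects stand in for exa_py result objects. It uses getattr(result, key, None) so that a result lacking an optional attribute does not raise AttributeError.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any

REQUIRED_KEYS = ("title", "url", "id", "score", "published_date", "author")
OPTIONAL_KEYS = ("highlights", "highlight_scores", "summary")


def get_metadata(result: Any) -> dict[str, Any]:
    """Sketch of _get_metadata: required keys always, optional keys only when truthy."""
    metadata = {key: getattr(result, key) for key in REQUIRED_KEYS}
    for key in OPTIONAL_KEYS:
        # Missing or empty optional values (None, "", []) are skipped
        value = getattr(result, key, None)
        if value:
            metadata[key] = value
    return metadata


base = dict(title="T", url="https://example.com", id="abc", score=0.9,
            published_date="2024-01-01", author=None)
plain = SimpleNamespace(**base)  # no optional attributes at all
rich = SimpleNamespace(**base, highlights=["key sentence"], highlight_scores=[0.8], summary="")

print(get_metadata(rich)["highlights"])  # ['key sentence']
```

Note that author is kept even when it is None, because the required keys are copied unconditionally, while an empty summary string is dropped.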

What does retrievers.py depend on?

retrievers.py imports 8 modules: exa_py, exa_py.api, langchain_core.callbacks, langchain_core.documents, langchain_core.retrievers, langchain_exa._utilities, pydantic, typing.

Where is retrievers.py in the architecture?

retrievers.py is located at libs/partners/exa/langchain_exa/retrievers.py (domain: LangChainCore, subdomain: ApiManagement, directory: libs/partners/exa/langchain_exa).
