weighted_reciprocal_rank() — langchain Function Reference

Architecture documentation for the weighted_reciprocal_rank() function in ensemble.py from the langchain codebase.

Function | python | LangChainCore | ApiManagement | calls 1 | called by 2

Dependency Diagram

graph TD
  e4787291_6959_3384_643b_10f54ba9483a["weighted_reciprocal_rank()"]
  b484cd3a_bbd0_4ff6_dc8c_3fc1ac219bca["EnsembleRetriever"]
  e4787291_6959_3384_643b_10f54ba9483a -->|defined in| b484cd3a_bbd0_4ff6_dc8c_3fc1ac219bca
  846fcd62_7844_5645_0197_3c181518377e["rank_fusion()"]
  846fcd62_7844_5645_0197_3c181518377e -->|calls| e4787291_6959_3384_643b_10f54ba9483a
  f22e315b_c302_5414_8dd3_04bde7d630dd["arank_fusion()"]
  f22e315b_c302_5414_8dd3_04bde7d630dd -->|calls| e4787291_6959_3384_643b_10f54ba9483a
  f67f0af0_2f5f_5c77_60c6_145c0fe80662["unique_by_key()"]
  e4787291_6959_3384_643b_10f54ba9483a -->|calls| f67f0af0_2f5f_5c77_60c6_145c0fe80662
  style e4787291_6959_3384_643b_10f54ba9483a fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/langchain/langchain_classic/retrievers/ensemble.py lines 288–336

    def weighted_reciprocal_rank(
        self,
        doc_lists: list[list[Document]],
    ) -> list[Document]:
        """Perform weighted Reciprocal Rank Fusion on multiple rank lists.

        You can find more details about RRF here:
        https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf.

        Args:
            doc_lists: A list of rank lists, where each rank list contains unique items.

        Returns:
            The final aggregated list of items sorted by their weighted RRF
            scores in descending order.
        """
        if len(doc_lists) != len(self.weights):
            msg = "Number of rank lists must be equal to the number of weights."
            raise ValueError(msg)

        # Associate each doc's content with its RRF score for later sorting by it
        # Duplicated contents across retrievers are collapsed & scored cumulatively
        rrf_score: dict[str, float] = defaultdict(float)
        for doc_list, weight in zip(doc_lists, self.weights, strict=False):
            for rank, doc in enumerate(doc_list, start=1):
                rrf_score[
                    (
                        doc.page_content
                        if self.id_key is None
                        else doc.metadata[self.id_key]
                    )
                ] += weight / (rank + self.c)

        # Docs are deduplicated by their contents then sorted by their scores
        all_docs = chain.from_iterable(doc_lists)
        return sorted(
            unique_by_key(
                all_docs,
                lambda doc: (
                    doc.page_content
                    if self.id_key is None
                    else doc.metadata[self.id_key]
                ),
            ),
            reverse=True,
            key=lambda doc: rrf_score[
                doc.page_content if self.id_key is None else doc.metadata[self.id_key]
            ],
        )
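
Worked example

The listing above gives each document a score of weight / (rank + c), summed over every rank list in which its key appears, and then returns the deduplicated documents in descending score order. The following is a minimal standalone sketch of that logic, not the library code itself. The Doc class, the fuse() and rrf_scores() helpers, the value c=60, and the equal weights are all assumptions made for illustration. Deduplication keeps the first occurrence of each key, which is what the sketch assumes unique_by_key() does.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not langchain's Document)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def doc_key(doc, id_key=None):
    # Same key rule as the listing: page content, or metadata[id_key] if set.
    return doc.page_content if id_key is None else doc.metadata[id_key]


def rrf_scores(doc_lists, weights, c=60, id_key=None):
    """Sum weight / (rank + c) per key; c=60 is an assumed value."""
    if len(doc_lists) != len(weights):
        raise ValueError("Number of rank lists must be equal to the number of weights.")
    scores = defaultdict(float)
    for doc_list, weight in zip(doc_lists, weights):
        for rank, doc in enumerate(doc_list, start=1):
            scores[doc_key(doc, id_key)] += weight / (rank + c)
    return scores


def fuse(doc_lists, weights, c=60, id_key=None):
    """Deduplicate by key (first occurrence wins), then sort by score, highest first."""
    scores = rrf_scores(doc_lists, weights, c, id_key)
    first_seen = {}
    for doc in chain.from_iterable(doc_lists):
        first_seen.setdefault(doc_key(doc, id_key), doc)
    return sorted(
        first_seen.values(),
        key=lambda d: scores[doc_key(d, id_key)],
        reverse=True,
    )


# Two hypothetical retrievers: "b" appears in both lists, so its scores add up.
list_one = [Doc("a"), Doc("b"), Doc("c")]
list_two = [Doc("b"), Doc("d")]
scores = rrf_scores([list_one, list_two], weights=[0.5, 0.5])
fused = fuse([list_one, list_two], weights=[0.5, 0.5])
order = [d.page_content for d in fused]
print(order)
```

With these inputs, "b" scores 0.5/62 + 0.5/61, which exceeds the 0.5/61 earned by "a" even though "a" is ranked first in its own list. A document that several retrievers agree on can therefore outrank one that a single retriever puts at the top.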