_evaluate_string_pairs() — langchain Function Reference

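Architecture documentation for the _evaluate_string_pairs() method of PairwiseStringEvalChain, defined in eval_chain.py in the langchain codebase. The method asks the evaluation chain whether output A is preferred to output B.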

Type: function (Python) | Tags: LangChainCore, ApiManagement | Calls: 2

Dependency Diagram

graph TD
  69c11dee_1bd0_daca_16bd_24a2df0fec66["_evaluate_string_pairs()"]
  6997d03c_6524_f97b_7017_b2f56540bc07["PairwiseStringEvalChain"]
  69c11dee_1bd0_daca_16bd_24a2df0fec66 -->|defined in| 6997d03c_6524_f97b_7017_b2f56540bc07
  ae00f73a_3274_a24c_aed1_e45be85f0fdd["_prepare_input()"]
  69c11dee_1bd0_daca_16bd_24a2df0fec66 -->|calls| ae00f73a_3274_a24c_aed1_e45be85f0fdd
  8d13c038_c9d1_0607_fe88_3a2eaf8195b9["_prepare_output()"]
  69c11dee_1bd0_daca_16bd_24a2df0fec66 -->|calls| 8d13c038_c9d1_0607_fe88_3a2eaf8195b9
  style 69c11dee_1bd0_daca_16bd_24a2df0fec66 fill:#6366f1,stroke:#818cf8,color:#fff
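
The diagram shows that _evaluate_string_pairs() is a thin wrapper: it packs its keyword arguments with _prepare_input(), runs the chain by calling self(...), and unpacks the result with _prepare_output(). The sketch below reproduces that three-step flow with a hypothetical stand-in class, SketchPairwiseChain. The helper bodies, the output_key name, and the requires_reference flag are assumptions for illustration, not LangChain's actual implementation, and the judge is a plain Python function rather than an LLM.

```python
from typing import Any, Callable, Optional


class SketchPairwiseChain:
    """Hypothetical stand-in for PairwiseStringEvalChain (illustration only).

    The "chain" is any callable that maps an input dict to an evaluation dict,
    standing in for the LLM prompt plus output parser of the real class.
    """

    output_key = "results"  # assumed name of the key that holds the evaluation

    def __init__(self, judge: Callable[[dict], dict], requires_reference: bool = False):
        self._judge = judge
        self.requires_reference = requires_reference

    def _prepare_input(
        self,
        prediction: str,
        prediction_b: str,
        input_: Optional[str],
        reference: Optional[str],
    ) -> dict:
        # Pack the keyword arguments into the dict the chain consumes.
        input_dict = {"prediction": prediction, "prediction_b": prediction_b, "input": input_}
        if self.requires_reference:
            input_dict["reference"] = reference
        return input_dict

    def __call__(self, inputs: dict, **_: Any) -> dict:
        # Stand-in for running the chain; callbacks, tags, etc. are ignored here.
        return {**inputs, self.output_key: self._judge(inputs)}

    def _prepare_output(self, result: dict) -> dict:
        # Drop the echoed inputs and return only the evaluation.
        return result[self.output_key]

    def _evaluate_string_pairs(
        self,
        *,
        prediction: str,
        prediction_b: str,
        input: Optional[str] = None,
        reference: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        # Same three steps as the listed method: pack, run, unpack.
        input_ = self._prepare_input(prediction, prediction_b, input, reference)
        result = self(inputs=input_)
        return self._prepare_output(result)


def shorter_wins(inputs: dict) -> dict:
    # Hypothetical judge: prefers the shorter answer, reports a tie on equal length.
    a, b = inputs["prediction"], inputs["prediction_b"]
    if len(a) == len(b):
        return {"reasoning": "Same length.", "value": None, "score": 0.5}
    if len(a) < len(b):
        return {"reasoning": "A is shorter.", "value": "A", "score": 1}
    return {"reasoning": "B is shorter.", "value": "B", "score": 0}


chain = SketchPairwiseChain(shorter_wins)
out = chain._evaluate_string_pairs(prediction="4", prediction_b="four", input="What is 2 + 2?")
print(out["value"], out["score"])  # A 1
```

Keeping the packing and unpacking in separate helpers keeps the keyword-only public signature decoupled from the chain's dict-based interface.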

Source Code

libs/langchain/langchain_classic/evaluation/comparison/eval_chain.py lines 319–362

    def _evaluate_string_pairs(
        self,
        *,
        prediction: str,
        prediction_b: str,
        input: str | None = None,
        reference: str | None = None,
        callbacks: Callbacks = None,
        tags: list[str] | None = None,
        metadata: dict[str, Any] | None = None,
        include_run_info: bool = False,
        **kwargs: Any,
    ) -> dict:
        """Evaluate whether output A is preferred to output B.

        Args:
            prediction: The output string from the first model.
            prediction_b: The output string from the second model.
            input: The input or task string.
            callbacks: The callbacks to use.
            tags: The tags to apply.
            metadata: The metadata to use.
            include_run_info: Whether to include run info in the output.
            reference: The reference string, if any.
            **kwargs: Additional keyword arguments.

        Returns:
            `dict` containing:
                - reasoning: The reasoning for the preference.
                - value: The preference value, which is either 'A', 'B', or None
                    for no preference.
                - score: The preference score, which is 1 for 'A', 0 for 'B',
                    and 0.5 for None.

        """
        input_ = self._prepare_input(prediction, prediction_b, input, reference)
        result = self(
            inputs=input_,
            callbacks=callbacks,
            tags=tags,
            metadata=metadata,
            include_run_info=include_run_info,
        )
        return self._prepare_output(result)
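
The Returns section of the docstring fixes the shape of the result: value is 'A', 'B', or None, and score is 1, 0, or 0.5 respectively. The following is a minimal sketch of an output parser that produces that shape. It assumes the judge's text contains a verdict marker [[A]], [[B]], or [[C]] (C meaning no preference). The marker format and the name parse_pairwise_verdict are assumptions for illustration; LangChain's own parser may differ in its details.

```python
import re
from typing import Optional

# Assumed verdict marker: the judge's text contains [[A]], [[B]], or [[C]] (tie).
_VERDICT = re.compile(r"\[\[(.*?)\]\]")

# Scores as documented in the Returns section: 1 for 'A', 0 for 'B', 0.5 for None.
_SCORES = {"A": 1, "B": 0, None: 0.5}


def parse_pairwise_verdict(text: str) -> dict:
    """Hypothetical parser turning judge text into the documented result dict."""
    match = _VERDICT.search(text)
    if match is None or match.group(1) not in {"A", "B", "C"}:
        raise ValueError(f"No [[A]], [[B]], or [[C]] verdict found in: {text!r}")
    # "C" means no preference, which the docstring represents as None.
    value: Optional[str] = None if match.group(1) == "C" else match.group(1)
    return {"reasoning": text, "value": value, "score": _SCORES[value]}


print(parse_pairwise_verdict("Both are correct, but A is more concise. [[A]]"))
# {'reasoning': 'Both are correct, but A is more concise. [[A]]', 'value': 'A', 'score': 1}
```

Keeping the full judge text under reasoning preserves the explanation alongside the machine-readable value and score.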