
TestResult Class — langchain Architecture

Architecture documentation for the TestResult class in runner_utils.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  2faa7d51_38e2_8d63_1f1e_b482b4401e76["TestResult"]
  8253c602_7d0c_9195_a7e1_3e9b19304131["runner_utils.py"]
  2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|defined in| 8253c602_7d0c_9195_a7e1_3e9b19304131
  e45d7c4e_b434_e04f_5d6a_822b3cb84b5a["get_aggregate_feedback()"]
  2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|method| e45d7c4e_b434_e04f_5d6a_822b3cb84b5a
  c8c431c9_edb6_7a8a_6dd8_508fa675ba6c["to_dataframe()"]
  2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|method| c8c431c9_edb6_7a8a_6dd8_508fa675ba6c

Source Code

libs/langchain/langchain_classic/smith/evaluation/runner_utils.py lines 80–149

class TestResult(dict):
    """A dictionary of the results of a single test run."""

    def get_aggregate_feedback(
        self,
    ) -> pd.DataFrame:
        """Return quantiles for the feedback scores.

        This method calculates the quantiles of the feedback scores
        across all feedback keys.

        Returns:
            A DataFrame containing the quantiles for each feedback key.
        """
        df = self.to_dataframe()
        # Drop all things starting with inputs., outputs., and reference
        to_drop = [
            col
            for col in df.columns
            if col.startswith(("inputs.", "outputs.", "reference"))
            or col in {"input", "output"}
        ]
        return df.describe(include="all").drop(to_drop, axis=1)

    def to_dataframe(self) -> pd.DataFrame:
        """Convert the results to a dataframe."""
        try:
            import pandas as pd
        except ImportError as e:
            msg = (
                "Pandas is required to convert the results to a dataframe."
                " To install pandas, run `pip install pandas`."
            )
            raise ImportError(msg) from e

        indices = []
        records = []
        for example_id, result in self["results"].items():
            feedback = result["feedback"]
            output_ = result.get("output")
            if isinstance(output_, dict):
                output = {f"outputs.{k}": v for k, v in output_.items()}
            elif output_ is None:
                output = {}
            else:
                output = {"output": output_}

            r = {
                **{f"inputs.{k}": v for k, v in result["input"].items()},
                **output,
            }
            if "reference" in result:
                if isinstance(result["reference"], dict):
                    r.update(
                        {f"reference.{k}": v for k, v in result["reference"].items()},
                    )
                else:
                    r["reference"] = result["reference"]
            r.update(
                {
                    **{f"feedback.{f.key}": f.score for f in feedback},
                    "error": result.get("Error"),
                    "execution_time": result["execution_time"],
                    "run_id": result.get("run_id"),
                },
            )
            records.append(r)
            indices.append(example_id)

        return pd.DataFrame(records, index=indices)
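The core of to_dataframe() is the per-example flattening: nested input, output, reference, and feedback values are spread into prefixed columns. The sketch below isolates that logic in a standalone function so the row shape is visible without pandas. The Feedback dataclass is a hypothetical stand-in for the LangSmith feedback objects, which expose `.key` and `.score` attributes; `flatten_result` is an illustrative name, not part of the langchain API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Feedback:
    """Hypothetical stand-in for a LangSmith feedback entry."""

    key: str
    score: float


def flatten_result(result: dict[str, Any]) -> dict[str, Any]:
    """Flatten one per-example result the same way to_dataframe() does."""
    output_ = result.get("output")
    if isinstance(output_, dict):
        # Dict outputs become one "outputs.<key>" column each.
        output = {f"outputs.{k}": v for k, v in output_.items()}
    elif output_ is None:
        output = {}
    else:
        # Scalar outputs land in a single "output" column.
        output = {"output": output_}

    row = {
        **{f"inputs.{k}": v for k, v in result["input"].items()},
        **output,
    }
    if "reference" in result:
        if isinstance(result["reference"], dict):
            row.update({f"reference.{k}": v for k, v in result["reference"].items()})
        else:
            row["reference"] = result["reference"]
    row.update(
        {
            **{f"feedback.{f.key}": f.score for f in result["feedback"]},
            "error": result.get("Error"),
            "execution_time": result["execution_time"],
            "run_id": result.get("run_id"),
        },
    )
    return row


row = flatten_result(
    {
        "input": {"question": "2 + 2?"},
        "output": {"answer": "4"},
        "reference": {"answer": "4"},
        "feedback": [Feedback("correctness", 1.0)],
        "execution_time": 0.42,
        "run_id": "run-1",
    }
)
```

Each example id becomes a DataFrame index entry, and each flattened row becomes one record, so feedback scores for the same key line up in a single column across examples.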

Frequently Asked Questions

What is the TestResult class?
TestResult is a dict subclass in the langchain codebase, defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py. It holds the results of a single test run and provides methods to convert them to a pandas DataFrame and to summarize feedback scores.
Where is TestResult defined?
TestResult is defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py at line 80.
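The aggregation step in get_aggregate_feedback() is plain pandas: describe(include="all") computes count, mean, std, and quantiles per column, and the inputs./outputs./reference columns are dropped so only feedback and timing statistics remain. A minimal sketch of that pattern, assuming pandas is installed and using made-up column values rather than real run results:

```python
import pandas as pd

# Toy frame shaped like to_dataframe() output (values are illustrative).
df = pd.DataFrame(
    {
        "inputs.question": ["a", "b"],
        "outputs.answer": ["x", "y"],
        "feedback.correctness": [1.0, 0.0],
        "execution_time": [0.2, 0.4],
    }
)

# Same column filter as get_aggregate_feedback().
to_drop = [
    col
    for col in df.columns
    if col.startswith(("inputs.", "outputs.", "reference"))
    or col in {"input", "output"}
]
summary = df.describe(include="all").drop(to_drop, axis=1)
```

The resulting frame is indexed by statistic name, so summary.loc["mean", "feedback.correctness"] gives the average score for that feedback key.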
