TestResult Class — langchain Architecture
Architecture documentation for the TestResult class in runner_utils.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD 2faa7d51_38e2_8d63_1f1e_b482b4401e76["TestResult"] 8253c602_7d0c_9195_a7e1_3e9b19304131["runner_utils.py"] 2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|defined in| 8253c602_7d0c_9195_a7e1_3e9b19304131 e45d7c4e_b434_e04f_5d6a_822b3cb84b5a["get_aggregate_feedback()"] 2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|method| e45d7c4e_b434_e04f_5d6a_822b3cb84b5a c8c431c9_edb6_7a8a_6dd8_508fa675ba6c["to_dataframe()"] 2faa7d51_38e2_8d63_1f1e_b482b4401e76 -->|method| c8c431c9_edb6_7a8a_6dd8_508fa675ba6c
Relationship Graph
Source Code
libs/langchain/langchain_classic/smith/evaluation/runner_utils.py lines 80–149
class TestResult(dict):
"""A dictionary of the results of a single test run."""
def get_aggregate_feedback(
self,
) -> pd.DataFrame:
"""Return quantiles for the feedback scores.
This method calculates and prints the quantiles for the feedback scores
across all feedback keys.
Returns:
A DataFrame containing the quantiles for each feedback key.
"""
df = self.to_dataframe()
# Drop all things starting with inputs., outputs., and reference
to_drop = [
col
for col in df.columns
if col.startswith(("inputs.", "outputs.", "reference"))
or col in {"input", "output"}
]
return df.describe(include="all").drop(to_drop, axis=1)
def to_dataframe(self) -> pd.DataFrame:
"""Convert the results to a dataframe."""
try:
import pandas as pd
except ImportError as e:
msg = (
"Pandas is required to convert the results to a dataframe."
" to install pandas, run `pip install pandas`."
)
raise ImportError(msg) from e
indices = []
records = []
for example_id, result in self["results"].items():
feedback = result["feedback"]
output_ = result.get("output")
if isinstance(output_, dict):
output = {f"outputs.{k}": v for k, v in output_.items()}
elif output_ is None:
output = {}
else:
output = {"output": output_}
r = {
**{f"inputs.{k}": v for k, v in result["input"].items()},
**output,
}
if "reference" in result:
if isinstance(result["reference"], dict):
r.update(
{f"reference.{k}": v for k, v in result["reference"].items()},
)
else:
r["reference"] = result["reference"]
r.update(
{
**{f"feedback.{f.key}": f.score for f in feedback},
"error": result.get("Error"),
"execution_time": result["execution_time"],
"run_id": result.get("run_id"),
},
)
records.append(r)
indices.append(example_id)
return pd.DataFrame(records, index=indices)
Source
Frequently Asked Questions
What is the TestResult class?
TestResult is a class in the langchain codebase, defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py.
Where is TestResult defined?
TestResult is defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py at line 80.
Analyze Your Own Codebase
Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.
Try Supermodel Free