_DatasetRunContainer Class — langchain Architecture
Architecture documentation for the _DatasetRunContainer class in runner_utils.py from the langchain codebase.
Dependency Diagram
graph TD
    DatasetRunContainer["_DatasetRunContainer"]
    EvalError["EvalError"]
    DatasetRunContainer -->|uses| EvalError
    EvaluatorCallbackHandler["EvaluatorCallbackHandler"]
    DatasetRunContainer -->|uses| EvaluatorCallbackHandler
    LangChainTracer["LangChainTracer"]
    DatasetRunContainer -->|uses| LangChainTracer
    runner_utils["runner_utils.py"]
    DatasetRunContainer -->|defined in| runner_utils
    merge_test_outputs["_merge_test_outputs()"]
    DatasetRunContainer -->|method| merge_test_outputs
    run_batch_evaluators["_run_batch_evaluators()"]
    DatasetRunContainer -->|method| run_batch_evaluators
    collect_metrics["_collect_metrics()"]
    DatasetRunContainer -->|method| collect_metrics
    collect_test_results["_collect_test_results()"]
    DatasetRunContainer -->|method| collect_test_results
    finish["finish()"]
    DatasetRunContainer -->|method| finish
    prepare["prepare()"]
    DatasetRunContainer -->|method| prepare
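The methods in the diagram suggest the container's lifecycle: prepare() builds the tracer project, loads the examples, and attaches one RunnableConfig per example; the wrapped model is then run over every example; and finish() collects metrics, runs any batch evaluators, and merges everything into per-example results. A minimal sketch of that flow follows, with assumed argument names and a hypothetical _run_model() placeholder standing in for the module's per-example runner:

# A minimal sketch (not verbatim library code) of the container lifecycle.
# prepare() sets up project, examples, and configs; finish() combines
# _collect_metrics(), _run_batch_evaluators(), and _merge_test_outputs().
# The prepare() argument names and _run_model() are illustrative assumptions.
from langsmith import Client

client = Client()
container = _DatasetRunContainer.prepare(
    client,
    dataset_name,            # assumed: dataset whose examples are evaluated
    llm_or_chain_factory,    # assumed: model/chain under test (-> wrapped_model)
    project_name,            # assumed: tracer project for this eval run
    evaluation=eval_config,  # assumed: config carrying row + batch evaluators
)

# Run the model once per example. Each config in container.configs carries
# the LangChainTracer and EvaluatorCallbackHandler that _collect_metrics()
# later reads run metadata and row-level feedback from.
batch_results = [
    _run_model(example, config)  # hypothetical per-example runner
    for example, config in zip(container.examples, container.configs)
]

results = container.finish(batch_results)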
Source Code
libs/langchain/langchain_classic/smith/evaluation/runner_utils.py lines 1094–1293
class _DatasetRunContainer:
    """A container to help manage the state of an eval run."""

    client: Client
    project: TracerSession
    wrapped_model: MCF
    examples: list[Example]
    configs: list[RunnableConfig]
    batch_evaluators: list[smith_eval_config.BATCH_EVALUATOR_LIKE] | None = None

    def _merge_test_outputs(
        self,
        batch_results: list,
        all_eval_results: dict[str, _RowResult],
    ) -> dict:
        results: dict = {}
        for example, output in zip(self.examples, batch_results, strict=False):
            row_result = all_eval_results.get(str(example.id), {})
            results[str(example.id)] = {
                "input": example.inputs,
                "feedback": row_result.get("feedback", []),
                "execution_time": row_result.get("execution_time"),
                "run_id": row_result.get("run_id"),
            }
            if isinstance(output, EvalError):
                results[str(example.id)]["Error"] = output.Error
            else:
                results[str(example.id)]["output"] = output
            if example.outputs:
                results[str(example.id)]["reference"] = example.outputs
        return results

    def _run_batch_evaluators(self, runs: dict[str, Run]) -> list[dict]:
        evaluators = self.batch_evaluators
        if not evaluators:
            return []
        runs_list = [runs[str(example.id)] for example in self.examples]
        aggregate_feedback = []
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for evaluator in evaluators:
                try:
                    result = evaluator(runs_list, self.examples)
                    if isinstance(result, EvaluationResult):
                        result = result.model_dump()
                    aggregate_feedback.append(cast("dict", result))
                    executor.submit(
                        self.client.create_feedback,
                        **result,
                        run_id=None,
                        project_id=self.project.id,
                    )
                except Exception:
                    logger.exception(
                        "Error running batch evaluator %s", repr(evaluator)
                    )
        return aggregate_feedback

    def _collect_metrics(self) -> tuple[dict[str, _RowResult], dict[str, Run]]:
        all_eval_results: dict = {}
        all_runs: dict = {}
        for c in self.configs:
            for callback in cast("list", c["callbacks"]):
                if isinstance(callback, EvaluatorCallbackHandler):
                    eval_results = callback.logged_eval_results
                    for (_, example_id), v in eval_results.items():
                        all_eval_results.setdefault(str(example_id), {}).update(
                            {"feedback": v},
                        )
                elif isinstance(callback, LangChainTracer):
                    run = callback.latest_run
                    execution_time = (
                        (run.end_time - run.start_time).total_seconds()
                        if run and run.end_time
                        else None
                    )
                    run_id = str(run.id) if run else None
                    all_eval_results.setdefault(str(callback.example_id), {}).update(
                        {
                            "execution_time": execution_time,
                            "run_id": run_id,
                            "run": run,
                            # … excerpt truncated here; the rest of
                            # _collect_metrics and the remaining methods
                            # (_collect_test_results(), finish(), prepare(),
                            # through line 1293) are not shown.
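To make the batch-evaluator hook concrete, here is a minimal sketch of a callable that _run_batch_evaluators could consume. It follows the evaluator(runs_list, self.examples) call shape and the create_feedback fan-out shown above; the function name and scoring logic are illustrative assumptions, not library code:

# A minimal sketch of a batch evaluator compatible with
# _run_batch_evaluators above. It is called as evaluator(runs, examples)
# and may return an EvaluationResult or a plain dict; either way the
# result is expanded into client.create_feedback(**result, run_id=None,
# project_id=...). The name and scoring here are illustrative assumptions.
from langsmith.schemas import Example, Run


def exact_match_rate(runs: list[Run], examples: list[Example]) -> dict:
    matches = sum(
        1
        for run, example in zip(runs, examples)
        if run.outputs == example.outputs
    )
    return {
        "key": "exact_match_rate",                   # feedback key
        "score": matches / len(runs) if runs else 0.0,
    }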
Frequently Asked Questions
What is the _DatasetRunContainer class?
_DatasetRunContainer is a container class in the langchain codebase, defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py. Per its docstring, it manages the state of an eval run: it holds the LangSmith Client, the tracer project, the wrapped model under test, the dataset examples with one RunnableConfig each, and any optional batch evaluators, and it merges run outputs, feedback, and timing into per-example results.
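As a rough illustration, one entry of the dict built by _merge_test_outputs (keyed by str(example.id)) looks like this; the keys come from the source above, the values are placeholders:

# One entry of the results dict built by _merge_test_outputs, keyed by
# str(example.id). Keys come from the source; values are placeholders.
{
    "input": {"question": "..."},     # example.inputs
    "feedback": [...],                # row-level eval results, if any
    "execution_time": 1.42,           # seconds, from the traced run
    "run_id": "3f8c...",              # id of the traced run, if found
    "output": {"answer": "..."},      # or "Error": ... when the batch
                                      # result was an EvalError
    "reference": {"answer": "..."},   # only present when example.outputs
}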
Where is _DatasetRunContainer defined?
_DatasetRunContainer is defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py at line 1094.
What does _DatasetRunContainer depend on?
_DatasetRunContainer does not extend any of the classes in the diagram; the class is declared with no base classes. It uses EvalError, EvaluatorCallbackHandler, and LangChainTracer: _merge_test_outputs marks failed rows by checking isinstance(output, EvalError), and _collect_metrics reads row-level feedback from EvaluatorCallbackHandler callbacks and execution times and run ids from LangChainTracer callbacks.