
_DatasetRunContainer Class — langchain Architecture

Architecture documentation for the _DatasetRunContainer class in runner_utils.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  DatasetRunContainer["_DatasetRunContainer"]
  EvalError["EvalError"]
  DatasetRunContainer -->|depends on| EvalError
  EvaluatorCallbackHandler["EvaluatorCallbackHandler"]
  DatasetRunContainer -->|depends on| EvaluatorCallbackHandler
  LangChainTracer["LangChainTracer"]
  DatasetRunContainer -->|depends on| LangChainTracer
  RunnerUtilsPy["runner_utils.py"]
  DatasetRunContainer -->|defined in| RunnerUtilsPy
  MergeTestOutputs["_merge_test_outputs()"]
  DatasetRunContainer -->|method| MergeTestOutputs
  RunBatchEvaluators["_run_batch_evaluators()"]
  DatasetRunContainer -->|method| RunBatchEvaluators
  CollectMetrics["_collect_metrics()"]
  DatasetRunContainer -->|method| CollectMetrics
  CollectTestResults["_collect_test_results()"]
  DatasetRunContainer -->|method| CollectTestResults
  Finish["finish()"]
  DatasetRunContainer -->|method| Finish
  Prepare["prepare()"]
  DatasetRunContainer -->|method| Prepare

Source Code

libs/langchain/langchain_classic/smith/evaluation/runner_utils.py lines 1094–1293

class _DatasetRunContainer:
    """A container to help manage the state of a eval run."""

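    # Shared state for one evaluation run: the LangSmith client, the tracing
    # project, the wrapped model/chain factory, the dataset examples, one
    # RunnableConfig per example, and optional dataset-level (batch) evaluators.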
    client: Client
    project: TracerSession
    wrapped_model: MCF
    examples: list[Example]
    configs: list[RunnableConfig]
    batch_evaluators: list[smith_eval_config.BATCH_EVALUATOR_LIKE] | None = None

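    # Merge per-example model outputs with the collected feedback, run ids,
    # and execution times into a single dict keyed by example id.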
    def _merge_test_outputs(
        self,
        batch_results: list,
        all_eval_results: dict[str, _RowResult],
    ) -> dict:
        results: dict = {}
        for example, output in zip(self.examples, batch_results, strict=False):
            row_result = all_eval_results.get(str(example.id), {})
            results[str(example.id)] = {
                "input": example.inputs,
                "feedback": row_result.get("feedback", []),
                "execution_time": row_result.get("execution_time"),
                "run_id": row_result.get("run_id"),
            }
            if isinstance(output, EvalError):
                results[str(example.id)]["Error"] = output.Error
            else:
                results[str(example.id)]["output"] = output
            if example.outputs:
                results[str(example.id)]["reference"] = example.outputs
        return results

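    # Run dataset-level (batch) evaluators over all runs and examples, and log
    # each aggregate result as project-level feedback from a worker thread.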
    def _run_batch_evaluators(self, runs: dict[str, Run]) -> list[dict]:
        evaluators = self.batch_evaluators
        if not evaluators:
            return []
        runs_list = [runs[str(example.id)] for example in self.examples]
        aggregate_feedback = []
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for evaluator in evaluators:
                try:
                    result = evaluator(runs_list, self.examples)
                    if isinstance(result, EvaluationResult):
                        result = result.model_dump()
                    aggregate_feedback.append(cast("dict", result))
                    executor.submit(
                        self.client.create_feedback,
                        **result,
                        run_id=None,
                        project_id=self.project.id,
                    )
                except Exception:
                    logger.exception(
                        "Error running batch evaluator %s", repr(evaluator)
                    )
        return aggregate_feedback

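    # Walk each example's callbacks to gather logged evaluator feedback
    # (EvaluatorCallbackHandler) plus the run id, execution time, and run
    # object (LangChainTracer).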
    def _collect_metrics(self) -> tuple[dict[str, _RowResult], dict[str, Run]]:
        all_eval_results: dict = {}
        all_runs: dict = {}
        for c in self.configs:
            for callback in cast("list", c["callbacks"]):
                if isinstance(callback, EvaluatorCallbackHandler):
                    eval_results = callback.logged_eval_results
                    for (_, example_id), v in eval_results.items():
                        all_eval_results.setdefault(str(example_id), {}).update(
                            {"feedback": v},
                        )
                elif isinstance(callback, LangChainTracer):
                    run = callback.latest_run
                    execution_time = (
                        (run.end_time - run.start_time).total_seconds()
                        if run and run.end_time
                        else None
                    )
                    run_id = str(run.id) if run else None
                    all_eval_results.setdefault(str(callback.example_id), {}).update(
                        {
                            "execution_time": execution_time,
                            "run_id": run_id,
                            "run": run,
                        },
                    )
                    all_runs[str(callback.example_id)] = run
        return cast("dict[str, _RowResult]", all_eval_results), all_runs

    # Excerpt truncated here; _collect_test_results(), finish(), and prepare()
    # continue through line 1293 of the source file.

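The container is not used directly by library consumers; it backs the run_on_dataset helpers in the same module. Below is a minimal sketch of the lifecycle implied by the methods above, assuming a prepare/execute/finish flow: prepare() builds the container (project, examples, and one RunnableConfig per example), the wrapped model is run once per example, and finish() collects callback metrics, runs any batch evaluators, and merges everything into per-example results. The prepare() keyword arguments and the run_model_on_example() helper are illustrative assumptions, not the module's exact API.

# Illustrative sketch only: the prepare() keyword arguments and the
# run_model_on_example() helper are assumptions, not the module's exact API.
def evaluate_dataset(client, dataset_name, llm_or_chain_factory, project_name, eval_config):
    container = _DatasetRunContainer.prepare(
        client,                   # langsmith.Client
        dataset_name,             # dataset whose examples will be evaluated
        llm_or_chain_factory,     # becomes wrapped_model (MCF)
        project_name,             # tracing project stored as self.project
        evaluation=eval_config,   # per-run and batch evaluators (assumed kwarg)
    )

    # Each config carries a LangChainTracer and an EvaluatorCallbackHandler,
    # so running the model with a config produces the traces and feedback
    # that _collect_metrics() later reads back.  run_model_on_example() is a
    # hypothetical stand-in for the module's own execution helpers.
    batch_results = [
        run_model_on_example(container.wrapped_model, example, config)
        for example, config in zip(container.examples, container.configs)
    ]

    # finish() gathers metrics from the callbacks, runs any batch evaluators,
    # and merges model outputs with feedback via _merge_test_outputs().
    return container.finish(batch_results, verbose=True)
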
Frequently Asked Questions

What is the _DatasetRunContainer class?
_DatasetRunContainer is a container class in the langchain codebase that manages the state of a single dataset evaluation run. It is defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py.
Where is _DatasetRunContainer defined?
_DatasetRunContainer is defined in libs/langchain/langchain_classic/smith/evaluation/runner_utils.py at line 1094.
What does _DatasetRunContainer extend?
_DatasetRunContainer does not extend any base class; as the source above shows, it is declared as a plain container class. It depends on EvalError, EvaluatorCallbackHandler, and LangChainTracer, which its methods reference when merging outputs and collecting metrics.
