Topic

ClinDet-Bench

A collection of 1 issue

New Benchmarks Emerge for Evaluating AI Agents in Real-World Scenarios

New benchmarks, including MobilityBench, AMA-Bench, and ClinDet-Bench, have emerged to address gaps in evaluating AI agents in real-world scenarios. They focus on route planning, long-horizon memory, and clinical decision-making, respectively. They aim to improve the robustness and