New Benchmarks Emerge for Evaluating AI Agents in Real-World Scenarios
New benchmarks, including MobilityBench, AMA-Bench, and ClinDet-Bench, have emerged to address gaps in evaluating AI agents in real-world scenarios. These benchmarks focus on route-planning, long-horizon memory, and clinical decision-making, respectively. They aim to improve the robustness and