MobilityBench Sets New Standard for Evaluating Route-Planning Agents
MobilityBench is a new benchmark for evaluating route-planning agents powered by large language models (LLMs). It combines real-world data from Amap with a deterministic testing environment. The benchmark shows that while current models excel at basic tasks, they struggle with preference-constrained route planning, highl