New Benchmark AMA-Bench Evaluates Long-Horizon Memory in AI Agents
AMA-Bench, a new benchmark, evaluates long-horizon memory in Large Language Model (LLM) agents by assessing continuous agent-environment interactions. The study finds that existing memory systems underperform because they lack causality and objective information. AMA-Agent, a proposed memory s