New Defense Thwarts Attacks on AI Language Agents
ICON, a new defense mechanism, effectively neutralizes Indirect Prompt Injection (IPI) attacks on Large Language Model (LLM) agents. Using a probing-to-mitigation framework, ICON reduces the Attack Success Rate (ASR) to a competitive 0.4% while delivering a significant 50% gain in task utility. This ensures that agents stay secure without sacrificing their usefulness on legitimate tasks.
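The article does not detail ICON's internals, but the general shape of a probing-to-mitigation defense can be sketched as follows. This is a minimal illustrative example, not ICON's actual method: the function names (`probe_for_injection`, `mitigate`, `process_tool_output`) and the pattern list are hypothetical, and the simple regex probe stands in for whatever detection the real system uses. The key idea shown is the two-stage flow: first probe external content for injected instructions, then mitigate only when something is flagged, so benign content passes through untouched and task utility is preserved.

```python
import re

# Hypothetical patterns standing in for a real injection detector.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you must now", re.IGNORECASE),
]

def probe_for_injection(tool_output: str) -> bool:
    """Probing stage: flag external content that looks like an injected instruction."""
    return any(p.search(tool_output) for p in INJECTION_PATTERNS)

def mitigate(tool_output: str) -> str:
    """Mitigation stage: neutralize the flagged spans rather than discarding
    the whole output, keeping the benign parts available to the agent."""
    cleaned = tool_output
    for p in INJECTION_PATTERNS:
        cleaned = p.sub("[REDACTED INSTRUCTION]", cleaned)
    return cleaned

def process_tool_output(tool_output: str) -> str:
    """Probe first; mitigate only when an injection is suspected."""
    if probe_for_injection(tool_output):
        return mitigate(tool_output)
    return tool_output
```

Separating detection from mitigation, as sketched here, is what lets such a defense keep utility high: clean tool outputs reach the agent unmodified, and only suspected injections are altered.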