Topic

AI Safety

A collection of 4 issues

AI Safety Frameworks Adapt to Evolving Regulations

CourtGuard introduces a model-agnostic framework for zero-shot policy adaptation in LLM safety, addressing the rigidity of static safety mechanisms. The framework reimagines safety evaluation as Evidentiary Debate, orchestrating an adversarial debate grounded in external policy documents.

CourtGuard Framework Enhances LLM Safety Through Evidentiary Debate

The CourtGuard framework enhances the safety of Large Language Models (LLMs) by reimagining safety evaluation as Evidentiary Debate. The model-agnostic system enforces AI governance rules without retraining, achieving state-of-the-art performance across multiple safety benchmarks.

Semantic Sensitive Information Poses New Threat in LLMs

Large Language Models (LLMs) are vulnerable to leaking Semantic Sensitive Information (SemSI), where sensitive data can be inferred from model output. New research introduces SemSIEdit, a framework that mitigates this by iteratively rewriting sensitive spans, achieving a 34.6% reduction in leakage at the cost of a 9.8% utility loss.

LLMs Face Limitations in Critical Applications, Research Shows

Recent research highlights limitations of Large Language Models (LLMs) in critical applications, including hallucinations and brittleness. Studies propose mitigation strategies like neuro-symbolic inference and temporal alignment to enhance model reliability. Researchers are exploring the impli