Jailbreaking Large Language Models: Disguise and Reconstruction Attack (DRA)

Explores how DRA exploits biases introduced during LLM fine-tuning to bypass safety measures with only a few queries, achieving state-of-the-art jailbreak success rates.

February 5, 2025 · 4 min · Chengyu Zhang

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

This paper presents a formal framework and a benchmarking methodology for prompt injection attacks and their defenses in large language models.

January 31, 2025 · 1 min · Chengyu Zhang