Jailbreaking Large Language Models: Disguise and Reconstruction Attack (DRA)
Explores how DRA exploits biases in LLM fine-tuning to bypass safety measures with minimal queries, achieving state-of-the-art jailbreak success.
This paper investigates remote code execution (RCE) vulnerabilities in applications that integrate large language models.