Large Language Model

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Analyzes prompt injection attacks in LLMs, evaluates their impact on different models, and benchmarks defenses like Known-Answer Detection.

Jailbreaking Large Language Models: Disguise and Reconstruction Attack (DRA)

Explores how DRA exploits biases in LLM fine-tuning to bypass safety measures with minimal queries, achieving state-of-the-art jailbreak success.

Using LLMs to Uncover Memorization in Instruction-Tuned Models

A study introducing a black-box prompt optimization approach to uncover higher levels of memorization in instruction-tuned LLMs.

Do Membership Inference Attacks Work on Large Language Models?

This paper evaluates the effectiveness of membership inference attacks on large language models, revealing that such attacks often perform no better than random guessing.

Membership Inference Attacks Against Fine-tuned Large Language Models via Self-prompt Calibration

A novel study introducing self-prompt calibration for membership inference attacks (MIAs) against fine-tuned large language models, improving reliability and practicality in privacy assessments.