Reading

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Analyzes prompt injection attacks in LLMs, evaluates their impact on different models, and benchmarks defenses like Known-Answer Detection.

Jailbreaking Large Language Models: Disguise and Reconstruction Attack (DRA)

Explores how DRA exploits biases in LLM fine-tuning to bypass safety measures with minimal queries, achieving state-of-the-art jailbreak success.

Style Transfer in Text - Exploration and Evaluation

This paper explores text style transfer (TST) using non-parallel data, introducing new evaluation metrics for transfer strength and content preservation.

Text Style Transfer - A Review and Experimental Evaluation

A comprehensive review of text style transfer (TST) techniques, their evaluation, and benchmarking results across various datasets.

Does Label Differential Privacy Prevent Label Inference Attacks?

Analyzes the effectiveness of label-DP in mitigating label inference attacks and provides insights on privacy settings and attack bounds.

Using LLMs to Uncover Memorization in Instruction-Tuned Models

A study introducing a black-box prompt optimization approach to uncover higher levels of memorization in instruction-tuned LLMs.

Defending Batch-Level Label Inference and Replacement Attacks in Vertical Federated Learning

Explores vulnerabilities in VFL models to label inference and backdoor attacks and proposes effective defenses like CAE and DCAE.

Label Inference Attacks Against Vertical Federated Learning

Evaluates privacy risks of vertical federated learning (VFL) and proposes label inference attacks with outstanding performance, highlighting vulnerabilities and defense limitations.

Do Membership Inference Attacks Work on Large Language Models?

This paper evaluates the effectiveness of membership inference attacks on large language models, revealing that such attacks often perform no better than random guessing.

Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models - A Pilot Study

An investigation of membership inference attacks (MIAs) targeting large-scale multi-modal models like CLIP, introducing practical attack strategies without shadow training.