Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Analyzes prompt injection attacks in LLMs, evaluates their impact on different models, and benchmarks defenses like Known-Answer Detection.

February 7, 2025 · 4 min · Chengyu Zhang

Jailbreaking Large Language Models: Disguise and Reconstruction Attack (DRA)

Explores how DRA exploits biases in LLM fine-tuning to bypass safety measures with minimal queries, achieving state-of-the-art jailbreak success rates.

February 5, 2025 · 4 min · Chengyu Zhang

Style Transfer in Text - Exploration and Evaluation

This paper explores text style transfer (TST) using non-parallel data, introducing new evaluation metrics for transfer strength and content preservation.

January 22, 2025 · 3 min · Chengyu Zhang

Text Style Transfer - A Review and Experimental Evaluation

A comprehensive review of text style transfer (TST) techniques, their evaluation, and benchmarking results across various datasets.

January 21, 2025 · 16 min · Chengyu Zhang

Does Label Differential Privacy Prevent Label Inference Attacks?

Analyzes the effectiveness of label-DP in mitigating label inference attacks and provides insights into privacy settings and attack bounds.

October 11, 2024 · 2 min · Chengyu Zhang

Using LLMs to Uncover Memorization in Instruction-Tuned Models

A study introducing a black-box prompt optimization approach to uncover higher levels of memorization in instruction-tuned LLMs.

October 11, 2024 · 2 min · Chengyu Zhang

Defending Batch-Level Label Inference and Replacement Attacks in Vertical Federated Learning

Explores the vulnerability of VFL models to label inference and backdoor attacks and proposes effective defenses such as CAE and DCAE.

October 7, 2024 · 2 min · Chengyu Zhang

Label Inference Attacks Against Vertical Federated Learning

Evaluates the privacy risks of vertical federated learning (VFL) and proposes highly effective label inference attacks, highlighting vulnerabilities and the limitations of existing defenses.

September 16, 2024 · 2 min · Chengyu Zhang

Do Membership Inference Attacks Work on Large Language Models?

This paper evaluates the effectiveness of membership inference attacks on large language models, revealing that such attacks often perform no better than random guessing.

June 14, 2024 · 2 min · Chengyu Zhang

Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models - A Pilot Study

An investigation of membership inference attacks (MIAs) targeting large-scale multi-modal models like CLIP, introducing practical attack strategies without shadow training.

January 24, 2024 · 2 min · Chengyu Zhang