hi@zhangchen

Hello, this is

Zhangchen Xu (徐张晨).

Bio

I am a third-year PhD student in the Network Security Lab at the University of Washington, advised by Prof. Radha Poovendran. I am also a part-time research intern at Microsoft GenAI. Prior to UW, I graduated from the University of Electronic Science and Technology of China (UESTC) and the University of Glasgow (UofG) with a B.E. in Communication Engineering. During my undergraduate studies, I was advised by Prof. Lei Zhang.

My email -> zxu9 [at] uw [dot] edu

Research Interests

My primary interests lie broadly in the fields of machine learning, networking, and security, with a current focus on the safety and alignment of large language models (LLMs). My current research directions include:

  • LLM Safety. I investigate emerging security threats in LLMs and explore defense mechanisms. I’m particularly interested in inference-time defenses, including SafeDecoding (against jailbreak attacks), CleanGen (against backdoor attacks), and Shield (for LLM-integrated apps).
  • LLM Alignment. I use synthetic data to train LLMs to be more helpful and better aligned with human values. I developed the Magpie datasets (SOTA synthetic datasets for LLM alignment!) and the MagpieLM models (SOTA small language models!).
  • Federated Learning. I study security, privacy, and fairness in large-scale Federated Learning systems. Work includes ACE (a model poisoning attack on contribution evaluation methods) and Brave.

During my undergraduate studies, my research focused on the theory and algorithms of distributed consensus. Work includes Voting Validity, Wireless Distributed Consensus, and Distributed Consensus Network.

Recent Papers (see here for full publication list)

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

arXiv | Paper / Website / Hugging Face / Code / Demo / 新智元

ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran

USENIX Security 2024 | Paper / Slides

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran

ACL 2024 (Main, Oral) | Paper / Code / Poster / Slides

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Fengqing Jiang*, Zhangchen Xu*, Luyao Niu*, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

ACL 2024 (Main) | Paper / Code / Poster

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

Yuetai Li*, Zhangchen Xu*, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

EMNLP 2024 (Main) | Paper / Code