
Hello, this is Zhangchen Xu (徐张晨).

Bio

I am a third-year PhD student in the Network Security Lab at the University of Washington, advised by Prof. Radha Poovendran. I’m also a part-time research intern at Microsoft GenAI, working with Dr. Yang Liu. Prior to UW, I graduated from the University of Electronic Science and Technology of China (UESTC) and the University of Glasgow (UofG) with a B.E. in Communication Engineering. During my undergrad, I was advised by Prof. Lei Zhang.

My email -> zxu9 [a-t] uw [d-o-t] edu

I am open to collaboration! Feel free to reach out if you would like to discuss safety, synthetic data, and post-training of LLMs, SLMs, and VLMs.

Research Interests

My primary interests lie broadly in machine learning, networking, and security, with a current focus on the safety and alignment of large language models (LLMs). My research directions include:

LLM Safety

I investigate emerging security threats to LLMs and explore defense mechanisms, with a particular interest in inference-time defenses.

  • SafeDecoding is an inference-time defense against jailbreak attacks.
  • CleanGen mitigates backdoor attacks on generation tasks at inference time.
  • Shield defends against prompt-injection attacks in LLM-integrated applications using cryptography.

LLM Alignment

I use synthetic data to train language models that are more helpful and better aligned with human values. My current research focuses on distilling capabilities from powerful LLMs into smaller ones.

  • Magpie is a framework that creates SOTA synthetic datasets from open-source LLMs.
  • MagpieLM models are SOTA small language models for chat.
  • Larger Models’ Paradox investigates the impact of response generators on synthetic dataset generation.

Distributed Algorithms

Federated Learning. Work includes ACE (a model poisoning attack on contribution evaluation methods) and Brave.

Distributed Consensus. Work includes Voting Validity, Wireless Distributed Consensus, and Distributed Consensus Network.

Selected Work (see here for the full publication list; * denotes equal contribution)

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

NAACL 2025 (Main) | Paper

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

ICLR 2025 | Paper / Website / Hugging Face / Code / Demo / 新智元

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Fengqing Jiang*, Zhangchen Xu*, Luyao Niu*, Bill Yuchen Lin, Radha Poovendran

AAAI 2025 | Paper / Code

ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran

USENIX Security 2024 | Paper / Slides

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

Yuetai Li*, Zhangchen Xu*, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

EMNLP 2024 (Main) | Paper / Code

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran

ACL 2024 (Main, Oral) | Paper / Code / Poster / Slides

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Fengqing Jiang*, Zhangchen Xu*, Luyao Niu*, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

ACL 2024 (Main) | Paper / Code / Poster