

Hello, this is
Zhangchen Xu (徐张晨).
Bio
I am a third-year PhD student in the Network Security Lab at the University of Washington, advised by Prof. Radha Poovendran. I’m also a part-time research intern at Microsoft GenAI, working with Dr. Yang Liu. Prior to joining UW, I completed a joint B.E. in Communication Engineering from the University of Electronic Science and Technology of China (UESTC) and the University of Glasgow (UofG). During my undergrad, I was advised by Prof. Lei Zhang.
My email: zxu9 [a-t] uw [d-o-t] edu. Feel free to reach out if you would like to discuss Synthetic Data, Safety, or the Post-training of LLMs, SLMs, and VLMs.
Research Interests
I work on Generative AI, with a current focus on the post-training of large language models (LLMs). My research directions include:
Synthetic Data
I conduct data-centric research focused on enhancing LLMs through post-training with synthetic data.
- 🦅 Magpie is a family of SOTA synthetic datasets for LLM alignment.
- 🐦 MagpieLM models are SOTA small language models for chat.
- 🐱 KodCode is the largest fully-synthetic open-source dataset providing verifiable solutions and tests for LLM coding (a loading sketch follows this list).
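For anyone who wants to try these datasets, here is a minimal loading sketch. The Hugging Face repo id and field names below are my best guesses; see each project page for the exact names.

```python
# Minimal sketch: loading one of the synthetic datasets from the Hugging Face Hub.
# The repo id below is an assumption; check the project pages for the exact names.
from datasets import load_dataset

magpie = load_dataset("Magpie-Align/Magpie-Pro-300K-Filtered", split="train")
example = magpie[0]
print(example.keys())  # inspect the schema before relying on specific fields
```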
In addition, I am interested in distilling capabilities from powerful LLMs to more efficient smaller models. My analysis papers in this area include:
- Larger Models’ Paradox examines the impact of response generators on LLM alignment.
- Small Model Learnability Gap investigates how small models (≤3B parameters) can benefit from long chain-of-thought (CoT) reasoning via distillation.
LLM Safety
I investigate emerging security threats in LLMs and explore defense mechanisms. I’m particularly interested in inference-time defenses.
- SafeDecoding is an inference-time defense against jailbreak attacks.
- CleanGen mitigates backdoor attacks in generation tasks at inference time.
- Shield uses cryptography to defend against prompt-injection attacks in LLM-integrated applications.
I have also been working on distributed algorithms:
- Federated Learning: work includes ACE (a contribution evaluation attack) and Brave.
- Distributed Consensus: work includes Voting Validity, Wireless Distributed Consensus, and Distributed Consensus Network.
(see here for the full publication list)
Selected Work
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran
arXiv / Website / Hugging Face / Code
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
NAACL 2025 (Main) | Paper