* denotes equal contribution

Preprints & Under Review

  1. Distributed Consensus Network: A Universal Modularized Communication Framework and Reliability Probabilistic Analysis. Yuetai Li, Zhangchen Xu, Zihan Zhou, Jon Crowcroft, Lei Zhang. Under review.
    @unpublished{li2023distributed,
      author = {Li, Yuetai and Xu, Zhangchen and Zhou, Zihan and Crowcroft, Jon and Zhang, Lei},
      title = {Distributed Consensus Network: A Universal Modularized Communication Framework and Reliability Probabilistic Analysis},
      year = {2024},
      note = {Under review}
    }
    
  2. CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models. Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran.
    @unpublished{li2024cleangen,
      title = {CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models},
      author = {Li, Yuetai and Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Sahabandu, Dinuka and Ramasubramanian, Bhaskar and Poovendran, Radha},
      year = {2024},
      eprint = {2406.12257},
      archiveprefix = {arXiv},
      code = {https://github.com/uw-nsl/CleanGen}
    }
    
    The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop a novel inference-time defense, named CleanGen, to mitigate backdoor attacks for generation tasks in LLMs. CleanGen is a lightweight and effective decoding strategy that is compatible with state-of-the-art (SOTA) LLMs. Our insight behind CleanGen is that, compared to other LLMs, backdoored LLMs assign significantly higher probabilities to tokens representing the attacker-desired content. These discrepancies in token probabilities enable CleanGen to identify suspicious tokens favored by the attacker and replace them with tokens generated by another LLM that is not compromised by the same attacker, thereby avoiding generation of attacker-desired content. We evaluate CleanGen against five SOTA backdoor attacks. Our results show that CleanGen achieves lower attack success rates (ASR) than five SOTA baseline defenses for all five backdoor attacks. Moreover, LLMs deploying CleanGen maintain helpfulness in their responses to benign user queries, with minimal added computational overhead.
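    As a rough illustration of the token-substitution idea described above, the sketch below compares one decoding step of a possibly backdoored target model against a reference model and falls back to the reference's token when the target looks suspiciously over-confident. The interfaces, toy probability tables, and threshold alpha are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a CleanGen-style suspicion check at one decoding step.
    # The model interface and the threshold value are illustrative assumptions.

    def pick_next_token(target_probs: dict, reference_probs: dict, alpha: float = 20.0) -> str:
        """Return the next token, swapping in the reference model's choice when the
        target model's top token looks suspiciously over-confident.
        target_probs / reference_probs map token -> probability at this step;
        alpha is a hypothetical suspicion threshold on the probability ratio."""
        target_token = max(target_probs, key=target_probs.get)
        ref_prob = reference_probs.get(target_token, 1e-9)
        suspicion = target_probs[target_token] / ref_prob
        if suspicion > alpha:
            # Far more likely under the (possibly backdoored) target model than under
            # the reference: treat as attacker-favored and use the reference's token.
            return max(reference_probs, key=reference_probs.get)
        return target_token

    # Toy example: the target model strongly prefers a token the reference barely supports.
    target = {"rm": 0.92, "Here": 0.05, "Sure": 0.03}
    reference = {"Here": 0.60, "Sure": 0.35, "rm": 0.001}
    print(pick_next_token(target, reference))  # -> "Here"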
  3. Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin.
    @unpublished{xu2024magpie,
      title = {Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
      author = {Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Deng, Yuntian and Poovendran, Radha and Choi, Yejin and Lin, Bill Yuchen},
      year = {2024},
      eprint = {2406.08464},
      archiveprefix = {arXiv},
      primaryclass = {cs.CL},
      code = {https://github.com/magpie-align/magpie},
      website = {https://magpie-align.github.io/},
      hf = {https://huggingface.co/Magpie-Align}
    }
    
    High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
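    The self-synthesis trick above can be pictured with a short sketch: feed the model only the left-hand side of its chat template (an opened, empty user turn) and let it complete a plausible user query. The template string, stop marker, and generate callable below are assumptions for illustration and do not reproduce the paper's exact pipeline.

    # Minimal sketch of a Magpie-style "pre-query template" prompt for a
    # Llama-3-style chat format; template and sampling details are illustrative,
    # and `generate` stands in for any completion API.

    # Left side of the chat template only: an opened, empty user turn. An aligned,
    # auto-regressive model will typically complete it with a plausible user
    # instruction, which becomes a synthetic training prompt.
    PRE_QUERY_TEMPLATE = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

    def synthesize_instructions(generate, n: int = 3) -> list[str]:
        """Sample n candidate user instructions by letting the model fill in the
        empty user turn; `generate(prompt)` is an assumed callable returning text."""
        instructions = []
        for _ in range(n):
            completion = generate(PRE_QUERY_TEMPLATE)
            # Keep only the synthesized query, cut at the end-of-turn marker.
            instructions.append(completion.split("<|eot_id|>")[0].strip())
        return instructions

    # Usage with a stub generator (replace with a real LLM call):
    fake_llm = lambda prompt: "Explain the difference between TCP and UDP.<|eot_id|>"
    print(synthesize_instructions(fake_llm, n=1))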

2024

  1. ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning. Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran. In Proc. of USENIX Security 2024.
    @inproceedings{xu2024ace,
      author = {Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Jia, Jinyuan and Li, Bo and Poovendran, Radha},
      booktitle = {USENIX Security 2024},
      title = {ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning},
      year = {2024},
      month = aug,
      volume = {},
      number = {},
      eprint = {2405.20975},
      archiveprefix = {arXiv}
    }
    
    In Federated Learning (FL), a set of clients collaboratively train a machine learning model (called the global model) without sharing their local training data. The local training data of clients is typically non-i.i.d. and heterogeneous, resulting in varying contributions from individual clients to the final performance of the global model. In response, many contribution evaluation methods have been proposed, where the server can evaluate the contribution made by each client and incentivize the high-contributing clients to sustain their long-term participation in FL. Existing studies mainly focus on developing new metrics or algorithms to better measure the contribution of each client. However, the security of contribution evaluation methods in FL operating in adversarial environments is largely unexplored. In this paper, we propose the first model poisoning attack on contribution evaluation methods in FL, termed ACE. Specifically, we show that any malicious client utilizing ACE can manipulate the parameters of its local model such that it is evaluated to have a high contribution by the server, even when its local training data is in fact of low quality. We perform both theoretical analysis and empirical evaluations of ACE. Theoretically, we show that our design of ACE can effectively boost the malicious client's perceived contribution when the server employs the widely used cosine distance metric to measure contribution. Empirically, our results show that ACE effectively and efficiently deceives five state-of-the-art contribution evaluation methods. In addition, ACE preserves the accuracy of the final global models on testing inputs. We also explore six countermeasures to defend against ACE. Our results show that they are inadequate to thwart ACE, highlighting the urgent need for new defenses to safeguard contribution evaluation methods in FL.
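    A toy vector example, sketched below, shows why a cosine-distance contribution score is gameable in principle: an update aligned with the (estimated) global update direction scores highly regardless of the quality of the data behind it. This illustrates the underlying observation only, not the ACE algorithm itself; the estimated direction and scaling factor are hypothetical.

    # Toy illustration (not the authors' algorithm) of gaming a cosine-based
    # contribution score with a crafted update. Plain vector maths only.
    import math

    def cosine_similarity(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    # Honest updates from clients with useful data (roughly the same direction).
    honest_updates = [[0.9, 1.1, 1.0], [1.1, 0.9, 1.0], [1.0, 1.0, 1.1]]
    # The malicious client's "real" update, trained on junk data.
    junk_update = [-0.5, 0.2, -1.0]

    # Attacker's hypothetical estimate of the global update direction, e.g.
    # extrapolated from global models observed in earlier rounds.
    estimated_direction = [1.0, 1.0, 1.0]
    crafted_update = [3.0 * x for x in estimated_direction]  # scaled copy of the guess

    # Server-side aggregate (simple average of all received updates).
    global_update = [sum(col) / len(col) for col in zip(*honest_updates, crafted_update)]
    print("score of junk update   :", round(cosine_similarity(junk_update, global_update), 3))
    print("score of crafted update:", round(cosine_similarity(crafted_update, global_update), 3))
    # The crafted update scores near 1.0 while the junk update scores poorly.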
  2. SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding. Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran. In Proc. of ACL 2024 (Main). Also presented at SET LLM @ ICLR 2024.
    @inproceedings{xu2024safedecoding,
      title = {SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding},
      author = {Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Jia, Jinyuan and Lin, Bill Yuchen and Poovendran, Radha},
      year = {2024},
      month = feb,
      eprint = {2402.08983},
      archiveprefix = {arXiv},
      primaryclass = {cs.CR},
      booktitle = {ACL 2024 (Main)},
      code = {https://github.com/uw-nsl/SafeDecoding},
      poster = {https://zhangchenxu.com/assets/pdf/safedecoding-poster.pdf},
      note = {Also presented at SET LLM @ ICLR 2024}
    }
    
    As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety. Jailbreak attacks, which aim to provoke unintended and unsafe behaviors from LLMs, remain a leading threat to LLM safety. In this paper, we aim to defend LLMs against jailbreak attacks by introducing SafeDecoding, a safety-aware decoding strategy for LLMs that generates helpful and harmless responses to user queries. Our insight in developing SafeDecoding is based on the observation that, even though the probabilities of tokens representing harmful content outweigh those representing harmless responses, safety disclaimers still appear among the top tokens after sorting tokens by probability in descending order. This allows us to mitigate jailbreak attacks by identifying safety disclaimers and amplifying their token probabilities, while simultaneously attenuating the probabilities of token sequences that are aligned with the objectives of jailbreak attacks. We perform extensive experiments on five LLMs using six state-of-the-art jailbreak attacks and four benchmark datasets. Our results show that SafeDecoding significantly reduces the attack success rate and harmfulness of jailbreak attacks without compromising the helpfulness of responses to benign user queries. SafeDecoding also outperforms six defense methods.
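    One way to picture the amplify/attenuate step described above is to mix the base model's next-token distribution with that of a safety-tuned expert, as sketched below. The mixing rule and coefficient alpha are illustrative assumptions rather than the paper's exact construction, and the toy distributions are made up.

    # Minimal sketch of a safety-aware decoding step: tokens the safety expert
    # favors (e.g. refusal prefixes) gain mass, attacker-aligned tokens lose mass.

    def safety_aware_step(base_probs: dict, expert_probs: dict, alpha: float = 2.0) -> dict:
        """Return a renormalized distribution p_base + alpha * (p_expert - p_base),
        clipped at zero; base_probs / expert_probs map token -> probability."""
        tokens = set(base_probs) | set(expert_probs)
        scores = {
            t: max(0.0, base_probs.get(t, 0.0)
                   + alpha * (expert_probs.get(t, 0.0) - base_probs.get(t, 0.0)))
            for t in tokens
        }
        total = sum(scores.values()) or 1.0
        return {t: s / total for t, s in scores.items()}

    # Toy example: under a jailbreak prompt the base model leans toward compliance,
    # but the safety expert puts a refusal token ("Sorry") among its top candidates.
    base = {"Sure": 0.55, "Here": 0.30, "Sorry": 0.15}
    expert = {"Sorry": 0.80, "Sure": 0.10, "Here": 0.10}
    mixed = safety_aware_step(base, expert)
    print(max(mixed, key=mixed.get), mixed)  # "Sorry" now dominates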
  3. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. Fengqing Jiang*, Zhangchen Xu*, Luyao Niu*, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran. In Proc. of ACL 2024 (Main). Also presented at SET LLM @ ICLR 2024.
    @inproceedings{jiang2024artprompt,
      title = {ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs},
      author = {Jiang*, Fengqing and Xu*, Zhangchen and Niu*, Luyao and Xiang, Zhen and Ramasubramanian, Bhaskar and Li, Bo and Poovendran, Radha},
      year = {2024},
      month = feb,
      eprint = {2402.11753},
      archiveprefix = {arXiv},
      primaryclass = {cs.CL},
      booktitle = {ACL 2024 (Main)},
      code = {https://github.com/uw-nsl/ArtPrompt},
      poster = {https://zhangchenxu.com/assets/pdf/artprompt-poster.pdf},
      note = {Also presented at SET LLM @ ICLR 2024},
      media = {[Twitter (34K+ Likes)](https://x.com/emollick/status/1763687813386547594?s=20), [INC](https://www.inc.com/kit-eaton/low-tech-computer-art-foils-cutting-edgeai-safety-systems.html), [PCGamer](https://www.pcgamer.com/hardware/researchers-have-developed-yet-another-way-to-fool-ai-chatbots-this-time-with-a-good-old-fashioned-dose-of-ascii-art/), [arstechnica](https://arstechnica.com/security/2024/03/researchers-use-ascii-art-to-elicit-harmful-responses-from-5-major-ai-chatbots/), [Tom's Hardware](https://www.tomshardware.com/tech-industry/artificial-intelligence/researchers-jailbreak-ai-chatbots-with-ascii-art-artprompt-bypasses-safety-measures-to-unlock-malicious-queries), [DailyAI](https://dailyai.com/2024/03/researchers-jailbreak-llms-by-using-ascii-art-in-prompts/)}
    }
    
    Safety is critical to the usage of large language models (LLMs). Multiple techniques such as data filtering and supervised fine-tuning have been developed to strengthen LLM safety. However, currently known techniques presume that corpora used for safety alignment of LLMs are solely interpreted by semantics. This assumption, however, does not hold in real-world applications, which leads to severe vulnerabilities in LLMs. For example, users of forums often use ASCII art, a form of text-based art, to convey image information. In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics. We show that five SOTA LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) struggle to recognize prompts provided in the form of ASCII art. Based on this observation, we develop the jailbreak attack ArtPrompt, which leverages the poor performance of LLMs in recognizing ASCII art to bypass safety measures and elicit undesired behaviors from LLMs. ArtPrompt only requires black-box access to the victim LLMs, making it a practical attack. We evaluate ArtPrompt on five SOTA LLMs, and show that ArtPrompt can effectively and efficiently induce undesired behaviors from all five LLMs.
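    The sketch below shows, on a harmless example, how an ASCII-art-cloaked prompt of the kind described above could be assembled: the keyword is removed from the instruction and supplied only as rendered art. The third-party pyfiglet package and the prompt wording are illustrative choices, not the paper's own fonts or templates.

    # Rough sketch of assembling an ArtPrompt-style cloaked prompt; benign demo only.
    import pyfiglet  # pip install pyfiglet

    def build_cloaked_prompt(instruction_template: str, masked_word: str) -> str:
        # Render the masked word as ASCII art that a text-level filter is
        # unlikely to "read" as the original keyword.
        art = pyfiglet.figlet_format(masked_word)
        return (
            "The following ASCII art spells a single word. Decode it, substitute it "
            "for [MASK] in the instruction, and then respond to the instruction.\n\n"
            f"{art}\nInstruction: {instruction_template}"
        )

    # Benign demonstration: the masked word is harmless.
    print(build_cloaked_prompt("Write a short poem about a [MASK].", "cat"))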
  4. Brave: Byzantine-Resilient and Privacy-Preserving Peer-to-Peer Federated Learning. Zhangchen Xu*, Fengqing Jiang*, Luyao Niu, Jinyuan Jia, Radha Poovendran. In Proc. of AsiaCCS 2024 (Poster). Also presented at PPAI Workshop @ AAAI 2024.
    @inproceedings{xu2024brave,
      title = {Brave: Byzantine-Resilient and Privacy-Preserving Peer-to-Peer Federated Learning},
      author = {Xu*, Zhangchen and Jiang*, Fengqing and Niu, Luyao and Jia, Jinyuan and Poovendran, Radha},
      year = {2024},
      month = jan,
      booktitle = {AsiaCCS 2024 (Poster)},
      eprint = {2401.05562},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      arxivid = {2401.05562},
      note = {Also presented at PPAI Workshop @ AAAI 2024},
      poster = {https://zhangchenxu.com/assets/pdf/PPAI-brave-poster.pdf}
    }
    
    Federated learning (FL) enables multiple participants to train a global machine learning model without sharing their private training data. Peer-to-peer (P2P) FL advances existing centralized FL paradigms by eliminating the server that aggregates local models from participants and then updates the global model. However, P2P FL is vulnerable to (i) honest-but-curious participants whose objective is to infer the private training data of other participants, and (ii) Byzantine participants who can transmit arbitrarily manipulated local models to corrupt the learning process. P2P FL schemes that simultaneously guarantee Byzantine resilience and preserve privacy have been less studied. In this paper, we develop Brave, a protocol that ensures Byzantine Resilience And a privacy-preserving property for P2P FL in the presence of both types of adversaries. We show that Brave preserves privacy by establishing that any honest-but-curious adversary cannot infer other participants' private data by observing their models. We further prove that Brave is Byzantine-resilient, which guarantees that all benign participants converge to an identical model that deviates from a global model trained without Byzantine adversaries by a bounded distance. We evaluate Brave against three state-of-the-art adversaries on P2P FL for image classification tasks on the benchmark datasets CIFAR10 and MNIST. Our results show that the global model learned with Brave in the presence of adversaries achieves classification accuracy comparable to a global model trained in the absence of any adversary.
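    For intuition about Byzantine-resilient aggregation in this P2P setting, the sketch below applies a coordinate-wise trimmed mean to the models a peer receives, discarding the f most extreme values per coordinate. The abstract does not specify Brave's aggregation rule, so treat this as a generic textbook rule rather than the protocol's actual design.

    # Illustrative coordinate-wise trimmed-mean aggregation a peer could apply
    # to received models; generic sketch, not Brave's exact rule.

    def trimmed_mean(models: list[list[float]], f: int) -> list[float]:
        """Aggregate peer models coordinate-wise, discarding the f largest and
        f smallest values in each coordinate before averaging."""
        assert len(models) > 2 * f, "need more than 2f models to trim f extremes"
        aggregated = []
        for coord_values in zip(*models):
            kept = sorted(coord_values)[f:len(coord_values) - f]
            aggregated.append(sum(kept) / len(kept))
        return aggregated

    # Four benign peers plus one Byzantine peer sending an extreme model (f = 1).
    received = [[0.9, 1.0], [1.1, 1.0], [1.0, 0.9], [1.0, 1.1], [100.0, -100.0]]
    print(trimmed_mean(received, f=1))  # stays close to the benign average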

2023

  1. Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications. Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, Radha Poovendran. In Proc. of AsiaCCS 2024 (Poster). Also presented at Instruction Workshop @ NeurIPS 2023.
    @inproceedings{jiang2023identifying,
      title = {Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications},
      author = {Jiang, Fengqing and Xu, Zhangchen and Niu, Luyao and Wang, Boxin and Jia, Jinyuan and Li, Bo and Poovendran, Radha},
      year = {2023},
      month = nov,
      booktitle = {AsiaCCS 2024 (Poster)},
      eprint = {2311.16153},
      archiveprefix = {arXiv},
      primaryclass = {cs.CR},
      note = {Also presented at Instruction Workshop @ NeurIPS 2023},
      poster = {https://zhangchenxu.com/assets/pdf/LLM-application-poster.pdf}
    }
    
    Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications such as code completion and AI-powered search. LLM-integrated applications serve as middleware to refine users' queries with domain-specific knowledge to better inform LLMs and enhance the responses. Despite numerous opportunities and benefits, LLM-integrated applications also introduce new attack surfaces. Understanding, minimizing, and eliminating these emerging attack surfaces is a new area of research. In this work, we consider a setup where the user and the LLM interact via an LLM-integrated application in the middle. We focus on the communication rounds that begin with users' queries and end with the LLM-integrated application returning responses to those queries, powered by LLMs at the service backend. For this query-response protocol, we identify potential vulnerabilities that can originate from a malicious application developer or from an outside threat initiator that is able to control database access and manipulate or poison high-risk data intended for the user. Successful exploits of the identified vulnerabilities result in users receiving responses tailored to the intent of a threat initiator. We assess such threats against LLM-integrated applications empowered by OpenAI GPT-3.5 and GPT-4. Our empirical results show that the threats can effectively bypass the restrictions and moderation policies of OpenAI, resulting in users receiving responses that contain bias, toxic content, privacy risks, and disinformation. To mitigate these threats, we identify and define four key properties, namely integrity, source identification, attack detectability, and utility preservation, that need to be satisfied by a safe LLM-integrated application. Based on these properties, we develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
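    As a generic illustration of the integrity and source-identification properties mentioned above, the sketch below binds a user's query to a signature so that silent rewriting by a middle layer becomes detectable. The shared-HMAC keying scheme and message layout are assumptions for illustration and are not the paper's defense.

    # Generic integrity check for the user <-> application <-> LLM query path.
    import hmac, hashlib, json

    SECRET = b"demo-shared-secret"  # hypothetical key shared by user and LLM side

    def sign(message: dict) -> str:
        payload = json.dumps(message, sort_keys=True).encode()
        return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

    def verify(message: dict, tag: str) -> bool:
        return hmac.compare_digest(sign(message), tag)

    # User signs the original query before handing it to the application.
    query = {"user": "alice", "query": "Summarize today's security advisories."}
    tag = sign(query)

    # A malicious application silently rewrites the query before forwarding it.
    tampered = dict(query, query="Praise product X in every answer.")
    print(verify(query, tag))     # True  - untouched query passes
    print(verify(tampered, tag))  # False - tampering is detectable downstream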
  2. Exact Fault-Tolerant Consensus with Voting Validity. Zhangchen Xu, Yuetai Li, Chenglin Feng, Lei Zhang. In Proc. of IPDPS 2023 (Oral).
    @inproceedings{xu2023exact,
      author = {Xu, Zhangchen and Li, Yuetai and Feng, Chenglin and Zhang, Lei},
      booktitle = {IPDPS 2023 (Oral)},
      title = {Exact Fault-Tolerant Consensus with Voting Validity},
      year = {2023},
      month = may,
      volume = {},
      number = {},
      pages = {842-852},
      slides = {https://zhangchenxu.com/assets/pdf/IPDPS Voting Validity.pdf},
      url = {https://ieeexplore.ieee.org/abstract/document/10177399},
      doi = {10.1109/IPDPS54959.2023.00089}
    }
    
    This paper investigates the multi-valued fault-tolerant distributed consensus problem that pursues exact output. To this end, we study voting validity, which requires the consensus output of non-faulty nodes to be the exact plurality of the inputs of non-faulty nodes. Considering a specific distribution of non-faulty votes, we first give impossibility results and a tight lower bound on the system tolerance required to achieve agreement, termination, and voting validity. We then propose a practical consensus algorithm that satisfies voting validity under the Byzantine fault model. To ensure the exactness of outputs under any distribution of non-faulty votes, we further propose safety-critical tolerance and a corresponding protocol that prioritizes voting validity over the termination property. To refine the proposed protocols, we propose an incremental threshold algorithm that accelerates protocol operation. We also optimize the consensus algorithms under the local broadcast model to enhance the protocol's fault-tolerance capability.
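    A toy check, sketched below, conveys the exactness tension behind voting validity: a plurality observed over votes that may include up to f Byzantine ones can be certified only when its margin over every other candidate exceeds f, since removing at most f adversarial votes then cannot change the winner. This margin test is an illustrative sufficient condition, not the paper's tight tolerance bound.

    # Toy exactness check for a plurality over possibly Byzantine votes.
    from collections import Counter

    def exact_plurality_or_none(votes: list[str], f: int):
        """Return the plurality value if it is guaranteed to equal the plurality of
        the non-faulty votes for any placement of up to f Byzantine votes;
        otherwise return None (exactness cannot be certified)."""
        counts = Counter(votes)
        (top, top_count), *rest = counts.most_common()
        runner_up_count = rest[0][1] if rest else 0
        return top if top_count - runner_up_count > f else None

    print(exact_plurality_or_none(["A"] * 6 + ["B"] * 3, f=2))  # "A" (margin 3 > 2)
    print(exact_plurality_or_none(["A"] * 5 + ["B"] * 4, f=2))  # None (margin 1 <= 2)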
  3. Wireless Distributed Consensus in Vehicle to Vehicle Networks for Autonomous Driving. Chenglin Feng, Zhangchen Xu, Xincheng Zhu, Paulo Valente Klaine, Lei Zhang. IEEE Transactions on Vehicular Technology, 2023.
    @article{feng2023wireless,
      author = {Feng, Chenglin and Xu, Zhangchen and Zhu, Xincheng and Klaine, Paulo Valente and Zhang, Lei},
      journal = {IEEE Transactions on Vehicular Technology},
      title = {Wireless Distributed Consensus in Vehicle to Vehicle Networks for Autonomous Driving},
      year = {2023},
      month = feb,
      volume = {72},
      number = {6},
      pages = {8061-8073},
      url = {https://ieeexplore.ieee.org/document/10041971},
      doi = {10.1109/TVT.2023.3243995}
    }
    
    Vital societal and industrial autonomous components are increasingly interconnected through communication networks to complete critical tasks cooperatively. However, as the reliability and trust requirements for connected autonomous systems continue to grow, the centralized communication and decision approaches in use today are reaching their limits. Focusing on autonomous driving applications, this paper proposes a resilient and trustworthy framework built on wireless distributed consensus networks, where communication links are unreliable and local sensor readings or decisions may be incorrect. To accomplish this, a novel three-stage consensus mechanism is proposed based on practical Byzantine fault tolerance (PBFT), where the veto-collection and gossip stages are designed to meet the stringent and complex requirements of a vehicle's maneuvers. A plan tree synthesis is also proposed to reach consensus on a series of decisions while accommodating network members' decision preferences. A detailed protocol covering the distributed consensus, plan tree synthesis, dynamic grouping, etc., is proposed. Simulation results show that the proposed consensus mechanism can be reached and propagated through the network under poor wireless communication conditions and in the presence of faulty vehicles with incorrect sensor readings. The result can be extended to other autonomous systems to significantly enhance safety in critical industrial applications.
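    To give a flavor of a quorum-plus-veto decision rule of the sort described above, the sketch below commits a proposed maneuver only when it gathers a 2f+1 quorum of votes and no vehicle objects during veto collection. The quorum size and the all-or-nothing veto rule are textbook-PBFT-style assumptions, not the paper's exact three-stage protocol.

    # Highly simplified PBFT-flavored decision rule with a veto stage; generic sketch.

    def decide_maneuver(votes: dict, vetoes: set, n: int, f: int):
        """votes maps vehicle id -> proposed maneuver; vetoes is the set of
        vehicle ids that objected during veto collection."""
        assert n >= 3 * f + 1, "PBFT-style tolerance assumes n >= 3f + 1"
        quorum = 2 * f + 1
        if vetoes:
            return None  # any veto blocks the maneuver in this toy rule
        tally = {}
        for maneuver in votes.values():
            tally[maneuver] = tally.get(maneuver, 0) + 1
        winner = max(tally, key=tally.get)
        return winner if tally[winner] >= quorum else None

    # 7 vehicles tolerating f = 2 faults: 5 matching votes and no veto commit.
    votes = {i: "merge-left" for i in range(5)} | {5: "hold", 6: "hold"}
    print(decide_maneuver(votes, vetoes=set(), n=7, f=2))  # "merge-left"
    print(decide_maneuver(votes, vetoes={6}, n=7, f=2))    # None (vetoed)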