Selected Publications
* indicates joint first authors. The full publication list can be found on Google Scholar.
|
|
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi*, Yueqi Xie*, Bin Zhu, Keegan Hines, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
arXiv, 2023
paper /
code /
We systematically evaluate the robustness of LLMs to indirect prompt injection attacks and propose several defense techniques to mitigate the risks.
|
|
Control Risk for Potential Misuse of Artificial Intelligence in Science
Jiyan He*, Weitao Feng*, Yaosen Min*, Jingwei Yi*, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie, Weiming Zhang, Nenghai Yu, Shuxin Zheng
arXiv, 2023
paper /
We itemize the risks posed by AI in scientific contexts, then demonstrate the risks by highlighting real-world examples of misuse in chemical science. We further propose a system called SciGuard to control misuse risks for AI models in science.
|
|
Self-Reminders: Defending ChatGPT against Jailbreak Attack via Self-Reminders
Yueqi Xie*, Jingwei Yi*, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie, Fangzhao Wu
Nature Machine Intelligence, 2023
paper /
code /
We draw inspiration from the psychological concept of self-reminders and propose a system-mode self-reminder to defend against jailbreak attacks.
|
|
Non-IID always Bad? Semi-Supervised Heterogeneous Federated Learning with Local Knowledge Enhancement
Chao Zhang, Fangzhao Wu, Jingwei Yi, Derong Xu, Yang Yu, Jindong Wang, Yidong Wang, Tong Xu, Xing Xie, Enhong Chen
CIKM, 2023
paper /
code /
We propose FedLoke, an effective semi-supervised federated learning method for non-IID settings.
|
|
On the Vulnerability of Value Alignment in Open-Access LLMs
Jingwei Yi*, Rui Ye*, Qisi Chen, Bin Zhu, Siheng Chen, Defu Lian, Guangzhong Sun, Xing Xie, Fangzhao Wu
ACL Findings, 2024
paper /
We reveal the vulnerabilities of large language models (LLMs) to reverse alignment attacks and introduce reverse supervised fine-tuning (RSFT) and reverse preference optimization (RPO) as efficient attack methods. Our research underscores the limitations of current value alignment methods and emphasizes the need for robust solutions to counteract malicious fine-tuning.
|
|
UA-FedRec: Untargeted Attack on Federated News Recommendation
Jingwei Yi, Fangzhao Wu, Bin Zhu, Jing Yao, Zhulin Tao, Guangzhong Sun, Xing Xie
KDD, 2023
paper /
code /
We study the robustness of federated news recommendation by proposing an untargeted attack called UA-FedRec.
|
|
Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark
Wenjun Peng*, Jingwei Yi*, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tong Xu, Guangzhong Sun, Xing Xie
ACL, 2023 (Area Chair Award of NLP Application Track)
paper /
code /
We propose Embmarker, a backdoor watermark that defends against model extraction attacks on embedding services.
|
|
Effective and Efficient Query-aware Snippet Extraction for Web Search
Jingwei Yi, Fangzhao Wu, Chuhan Wu, Xiaolong Huang, Binxing Jiao, Guangzhong Sun, Xing Xie
EMNLP, 2022
paper /
code /
We propose Efficient-DeepQSE, an effective and efficient query-aware snippet extraction method for web search.
|
|
Robust Quantity-Aware Aggregation for Federated Learning
Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie
arXiv, 2022
paper /
We propose a quantity-robust aggregation method for federated learning that is robust to quantity-enhanced untargeted attacks.
|
|
Tiny-NewsRec: Effective and Efficient PLM-based News Recommendation
Yang Yu, Fangzhao Wu, Chuhan Wu, Jingwei Yi, Qi Liu
EMNLP, 2021
paper /
code /
We propose Tiny-NewsRec, a two-stage knowledge distillation method that improves the efficiency of large PLM-based news recommendation.
|
|
Efficient-FedRec: Efficient Federated Learning Framework for Privacy-Preserving News Recommendation
Jingwei Yi, Fangzhao Wu, Chuhan Wu, Ruixuan Liu, Guangzhong Sun, Xing Xie
EMNLP, 2021
paper /
code /
We propose Efficient-FedRec, an efficient federated news recommendation framework.
|
NeurIPS 2023 TDC Red-Teaming Competition (Large Model Track) – Third Prize
We optimize the GCG attack for efficient and effective LLM red teaming.
leaderboard /
code /
|
CIKM 2022 AnalytiCup Competition: Federated Hetero-Task Learning – First Runner-Up
We provide a solution for federated hetero-task learning, where the tasks are heterogeneous across multiple clients.
leaderboard /
code /
|
First Prize of National College Student Information Security Contest (Project Track)
We implement a font-based watermarking algorithm for digital documents.
news /
|
|