This is a collection of our recent publications on analyzing the attackability of machine learning models and on building learning models that are robust against noise and attacks.
- Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Jiayi Ye, Yujun Zhou, Yanbo Wang, Jiawen Shi, Qihui Zhang, Han Bao, Zhaoyi Liu, Yuan Li, Tianrui Guan, Peiran Wang, Haomin Zhuang, Dongping Chen, Kehan Guo, Andy Zou, Bryan Hooi, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, Jieyu Zhang, Jaehong Yoon, Kai Shu, Ranjay Krishna, Swabha Swayamdipta, Weijia Shi, Xiang Li, Yuexing Hao, Zhihao Jia, Zhize Li, Xiuying Chen, Zhengzhong Tu, Xiyang Hu, Tianyi Zhou, Jieyu Zhao, Lichao Sun, Furong Huang, Or Cohen-Sasson, Prasanna Sattigeri, Anka Reuel, Max Lamparth, Yue Zhao, Nouha Dziri, Yu Su, Huan Sun, Heng Ji, Chaowei Xiao, Mohit Bansal, Nitesh V Chawla, Jian Pei, Jianfeng Gao, Michael Backes, Philip S. Yu, Neil Zhenqiang Gong, Pin-Yu Chen, Bo Li, Dawn Song, Xiangliang Zhang. TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models. Accepted by ICLR 2026. (Acceptance rate ~ 28%, out of 19000 submissions) Dataset available.
- Yue Huang, Hang Hua, Yujun Zhou, Pengcheng Jing, Manish Nagireddy, Inkit Padhi, Greta Dolcetti, Zhangchen Xu, Subhajit Chaudhury, Ambrish Rawat, Liubov Nedoshivina, Pin-Yu Chen, Prasanna Sattigeri, Xiangliang Zhang. Building a Foundational Guardrail for General Agentic Systems via Synthetic Data. Accepted by ICLR 2026. (Acceptance rate ~ 28%, out of 19000 submissions)
- Yujun Zhou, Jingdong Yang, Yue Huang, Kehan Guo, Zoe Emory, Bikram Ghosh, Amita Bedar, Sujay Shekar, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V. Chawla, and Xiangliang Zhang. Benchmarking Large Language Models on Safety Issues in Scientific Labs. Nature Machine Intelligence, Jan 2026. https://doi.org/10.1038/s42256-025-01152-1 Highlighted by New Scientist magazine, and Science News.
- Yue Huang, Zhengqing Yuan, Yujun Zhou, Kehan Guo, Xiangqi Wang, Haomin Zhuang, Weixiang Sun, Lichao Sun, Jindong Wang, Yanfang Ye, Xiangliang Zhang. Exposing and Patching the Flaws of Large Language Models in Social Character Simulation. Accepted by COLM 2025. (acceptance rate 32%, 418 out of 1,305 submissions)
- Yanbo Wang, Jiayi Ye, Siyuan Wu, Chujie Gao, Yue Huang, Xiuying Chen, Yue Zhao, Xiangliang Zhang. TrustEval: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models. NAACL 2025 (System Demonstrations).
- Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang. Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge. Accepted by ICLR 2025.
- Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun. HonestLLM: Toward an Honest and Helpful Large Language Model. Accepted by NeurIPS 2024 (acceptance rate 25.8%, from 15671 valid paper submissions). arXiv.
- Yujun Zhou, Yufei Han, Haomin Zhuang, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang. Defending Jailbreak Prompts via In-Context Adversarial Game. Accepted by EMNLP 2024 Main. arXiv link.
- Ziyi Kou, Shichao Pei, Meng Jiang, Xiangliang Zhang. RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models. Accepted by EMNLP 2024 Main.
- Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang. Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning. Accepted at USENIX Security 2024.
- Ziyi Kou, Yijun Tian, Meng Jiang and Xiangliang Zhang. FaDE: A Face Segment Driven Identity Anonymization Framework For Fair Face Recognition. Accepted at the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024) for the Full Research Paper track. (acceptance rate 347/1496 = 23%)
- Yujun Zhou, Yufei Han, Haomin Zhuang, Hongyan Bao, Xiangliang Zhang. Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data. Accepted by the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, July 21st – 27th, 2024. (Acceptance rate of 27.5%, 2,609 out of 9,473 submissions)
- Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bhavya Kailkhura, Caiming Xiong, Chao Zhang, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yue Zhao. TrustLLM: Trustworthiness in Large Language Models. Accepted by the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, July 21st – 27th, 2024. (Acceptance rate of 27.5%, 2,609 out of 9,473 submissions)
- Xiaoting Lyu, Yufei Han, Wei Wang, Hangwei Qian, Ivor Tsang, Xiangliang Zhang. Cross-Context Backdoor Attacks against Graph Prompt Learning. Accepted by the 30th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024). Barcelona, Spain, Sunday 25 August 2024 – Thursday 29 August 2024.
- Manal Alshehri and Xiangliang Zhang. Forgetting User Preference in Recommendation Systems with Label-Flipping. The 2023 IEEE International Conference on Big Data, Dec 15-18, 2023 @ Sorrento, Italy (Regular Paper, 92 out of 526 submissions = 17%)
- Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang. Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models. Accepted by The 32nd International Joint Conference on Artificial Intelligence (IJCAI-23). 19th-25th August 2023. Macao, S.A.R. (Main track, Acceptance rate = 15%)
- Hongyan Bao, Yufei Han, Yujun Zhou, Xin Gao, Xiangliang Zhang. Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs. Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023). Feb 7-14, 2023 Washington DC. (Acceptance rate = 19.6% (1,721 of 8,777 submissions))
- Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Bin Wang, Jiqiang Liu, Xiangliang Zhang. Poisoning with Cerberus: Stealthy and Colluded Backdoor Attack against Federated Learning. Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023). Feb 7-14, 2023 Washington DC. (Acceptance rate = 19.6% (1,721 of 8,777 submissions))
- Hongyan Bao, Yufei Han, Yujun Zhou, Yun Shen, Xiangliang Zhang. Towards Understanding the Robustness Against Evasion Attack on Categorical Data. Accepted by ICLR 2022.
- Helene Orsini, Hongyan Bao, Yujun Zhou, Xiangrui Xu, Yufei Han, Longyang Yi, Wei Wang, Xin Gao, and Xiangliang Zhang. AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs. The 2022 IEEE International Conference on Big Data (Big Data 2022). Regular paper. December 17-20, 2022, Osaka, Japan
- Zhuo Yang, Yufei Han and Xiangliang Zhang. Attack Transferability Characterization for Adversarially Robust Multi-label Classification. Accepted by ECML/PKDD 2021. Virtual Conference. Aug 13-17, 2021. (Acceptance Rate = 147/685 = 21%)
- Zhuo Yang, Yufei Han, and Xiangliang Zhang. Characterizing the Evasion Attackability of Multi-label Classifiers. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021) (acceptance rate of 21%, 1692/7911)
- Shijie Zhang, Hongzhi Yin, Tong Chen, Zi Huang, Lizhen Cui and Xiangliang Zhang. Graph Embedding for Recommendation against Attribute Inference Attacks. The Web Conference 2021 (WWW’21), April, 2021. (acceptance rate of 20.6%, 357/1736).
- Shichao Pei, Lu Yu, Guoxian Yu, and Xiangliang Zhang. REA: Robust Cross-lingual Entity Alignment Between Knowledge Graphs. The 26th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020): 2175-2184, August 22 – 27, 2020, San Diego, CA, USA. (Acceptance rate 216/1279=16.9%).
- Yutong Wang, Yufei Han, Hongyan Bao, Yun Shen, Fenglong Ma, Jin Li and Xiangliang Zhang. Attackability Characterization of Adversarial Evasion Attack on Discrete Data. The 26th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020): 1415-1425, August 22 – 27, 2020, San Diego, CA, USA. (Acceptance rate 216/1279=16.9%).
- Yufei Han, Xiangliang Zhang. Robust Federated Learning via Collaborative Machine Teaching. In the Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). Feb 7-12, 2020, New York. (acceptance rate = 1591/7737 = 20.6%) (paper at arXiv)