Robust and Trustworthy Machine Learning

This is a collection of our recent publications on analyzing the attackability of machine learning models and on building learning models that are robust to noise and attacks.

  • Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. Accepted by NeurIPS 2024 (acceptance rate 25.8%, from 15,671 valid paper submissions). arXiv.
  • Yujun Zhou, Yufei Han, Haomin Zhuang, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang. Defending Jailbreak Prompts via In-Context Adversarial Game. Accepted by EMNLP 2024 Main. arXiv link.
  • Ziyi Kou, Shichao Pei, Meng Jiang, Xiangliang Zhang. RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models. Accepted by EMNLP 2024 Main.
  • Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bhavya Kailkhura, Caiming Xiong, Chao Zhang, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yue Zhao. TrustLLM: Trustworthiness in Large Language Models. Accepted by the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, July 21st–27th, 2024. (Acceptance rate of 27.5%, 2,609 out of 9,473 submissions)
  • Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang. Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models. Accepted by the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23). 19th–25th August 2023, Macao, S.A.R. (Main track, acceptance rate = 15%)
  • Hongyan Bao, Yufei Han, Yujun Zhou, Xin Gao, Xiangliang Zhang. Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs. Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023). Feb 7-14, 2023, Washington DC. (Acceptance rate = 19.6%, 1,721 of 8,777 submissions)
  • Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Bin Wang, Jiqiang Liu, Xiangliang Zhang. Poisoning with Cerberus: Stealthy and Colluded Backdoor Attack against Federated Learning. Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023). Feb 7-14, 2023, Washington DC. (Acceptance rate = 19.6%, 1,721 of 8,777 submissions)
  • Hongyan Bao, Yufei Han, Yujun Zhou, Yun Shen, Xiangliang Zhang. Towards Understanding the Robustness Against Evasion Attack on Categorical Data. Accepted by ICLR 2022.
  • Helene Orsini, Hongyan Bao, Yujun Zhou, Xiangrui Xu, Yufei Han, Longyang Yi, Wei Wang, Xin Gao, Xiangliang Zhang. AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs. Accepted by the 2022 IEEE International Conference on Big Data (Big Data 2022) as a regular paper. December 17-20, 2022, Osaka, Japan.
  • Zhuo Yang, Yufei Han, Xiangliang Zhang. Attack Transferability Characterization for Adversarially Robust Multi-label Classification. Accepted by ECML/PKDD 2021. Virtual conference, Aug 13-17, 2021. (Acceptance rate = 21%, 147 of 685 submissions)
  • Shijie Zhang, Hongzhi Yin, Tong Chen, Zi Huang, Lizhen Cui, Xiangliang Zhang. Graph Embedding for Recommendation against Attribute Inference Attacks. Accepted by The Web Conference 2021 (WWW'21), April 2021. (Acceptance rate = 20.6%, 357 of 1,736 submissions)