Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives
Liu, Haoyang, Chaudhary, Maheep, Wang, Haohan
arXiv.org Artificial Intelligence
The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, even though they were developed independently across trustworthy machine learning subfields. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by the causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques like fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and standard ERM. This connection allows us to build upon the principled understanding of trustworthy methods and extend it to these new techniques in large pretrained models, paving the way for future methods. We also review existing methods under this perspective. Lastly, we offer a brief summary of the applications of these methods and discuss potential future directions related to our survey. For more information, please visit http://trustai.one.
Jul-31-2023
- Country:
- Asia (1.00)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.13)
- North America > United States
- California (0.27)
- New York > New York County
- New York City (0.14)
- Genre:
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Education (1.00)
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Therapeutic Area (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (0.92)
- Technology:
- Information Technology > Artificial Intelligence
- Issues > Social & Ethical Issues (0.67)
- Machine Learning
- Learning Graphical Models > Directed Networks
- Bayesian Learning (0.45)
- Neural Networks > Deep Learning (1.00)
- Reinforcement Learning (0.87)
- Statistical Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Text Processing (0.67)
- Representation & Reasoning
- Expert Systems (0.92)
- Uncertainty (0.92)
- Vision > Face Recognition (0.67)