Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives
Liu, Haoyang; Chaudhary, Maheep; Wang, Haohan
arXiv.org Artificial Intelligence
The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, even though they were developed independently across different subfields of trustworthy machine learning. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by the causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques such as fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and standard ERM. This connection allows us to build on the principled understanding of trustworthy methods and extend it to these new techniques for large pretrained models, paving the way for future methods. Existing methods under this perspective are also reviewed. Lastly, we offer a brief summary of the applications of these methods and discuss potential future directions related to our survey. For more information, please visit http://trustai.one.
Jul-31-2023
- Country:
  - Oceania > Australia
  - North America
    - Dominican Republic (0.04)
    - Greenland (0.04)
    - United States
      - Washington > King County
        - Seattle (0.04)
      - Texas > Travis County
        - Austin (0.04)
      - New York
        - New York County > New York City (0.14)
        - Richmond County > New York City (0.04)
        - Queens County > New York City (0.04)
        - Kings County > New York City (0.04)
        - Bronx County > New York City (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Illinois > Champaign County
        - Urbana (0.04)
      - California
        - Santa Clara County > Palo Alto (0.04)
        - Los Angeles County > Long Beach (0.04)
    - Canada > Quebec
      - Montreal (0.04)
  - Europe
    - Czechia > Prague (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.13)
    - Romania > Sud - Muntenia Development Region
      - Giurgiu County > Giurgiu (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Germany > Bavaria
      - Upper Bavaria > Munich (0.04)
    - France > Auvergne-Rhône-Alpes
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Middle East
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
- Genre:
  - Overview (1.00)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Information Technology > Security & Privacy (1.00)
  - Education (1.00)
  - Law (0.92)
  - Health & Medicine
    - Therapeutic Area (1.00)
    - Diagnostic Medicine > Imaging (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Issues > Social & Ethical Issues (0.67)
    - Vision > Face Recognition (0.67)
    - Representation & Reasoning
      - Uncertainty (0.92)
      - Expert Systems (0.92)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
      - Text Processing (0.67)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)
      - Reinforcement Learning (0.87)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (0.45)