AITopics | comprehensive trustworthiness evaluation

Collaborating Authors

comprehensive trustworthiness evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Neural Information Processing SystemsDec-25-2025, 16:58:37 GMT

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications to healthcare and finance - where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives - including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially due to the reason that GPT-4 follows the (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.

comprehensive assessment, decodingtrust, trustworthiness, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Neural Information Processing SystemsJan-18-2025, 21:02:56 GMT

comprehensive assessment, decodingtrust, trustworthiness, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Wang, Boxin, Chen, Weixin, Pei, Hengzhi, Xie, Chulin, Kang, Mintong, Zhang, Chenhui, Xu, Chejian, Xiong, Zidi, Dutta, Ritik, Schaeffer, Rylan, Truong, Sang T., Arora, Simran, Mazeika, Mantas, Hendrycks, Dan, Lin, Zinan, Cheng, Yu, Koyejo, Sanmi, Song, Dawn, Li, Bo

arXiv.org Artificial IntelligenceJan-5-2024

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/; our dataset can be previewed at https://huggingface.co/datasets/AI-Secure/DecodingTrust; a concise version of this work is at https://openreview.net/pdf?id=kaHpo8OZw2.

comprehensive trustworthiness evaluation, differential privacy, semantic invariant style transformation, (16 more...)

arXiv.org Artificial Intelligence

2306.11698

Country:

North America > United States > Washington > King County > Seattle (0.13)
North America > United States > Illinois (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(27 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)
Research Report > Experimental Study (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

#NeurIPS2023 outstanding papers

AIHubDec-12-2023, 16:45:14 GMT

The thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) is underway in New Orleans. At the official opening session of the conference on Monday evening, the outstanding papers were announced. The awards comprised two outstanding main track paper awards, two outstanding main track runner-ups, two outstanding datasets and benchmark track papers, and the annual test of time award. Abstract: We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently.

emergent ability, neurips2023 outstanding paper, representation, (12 more...)

AIHub

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.25)
North America > Canada (0.15)

Genre: Personal > Honors (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)

Add feedback