Superhuman AI Disclosure: Impacts on Toxicity, Fairness, and Trust Vary by Expertise and Persona Attributes

Jaymari Chua, Chen Wang, Lina Yao

arXiv.org Artificial Intelligence 

As artificial intelligence surpasses human performance on real-world tasks, disclosing superhuman capabilities poses challenges for fairness, accountability, and trust. To investigate how transparency shapes attitudes and perceptions, we introduce a grounded and validated set of synthetic personas reflecting diverse fairness concerns and technology acceptance levels. We then evaluate responses in two contrasting domains: (1) a competitive player in StarCraft II, where strategy and high-skill gameplay often elicit toxic interactions, and (2) a cooperative personal assistant providing information. Across numerous interactions spanning persona profiles, we test non-disclosure against explicit superhuman labelling under controlled game outcomes and usage contexts. Our findings reveal sharp domain-specific effects: in StarCraft II, when the AI was explicitly labelled superhuman, novice personas reported lower toxicity and higher perceived fairness, attributing defeat to advanced skill rather than hidden cheating, whereas expert personas found the disclosure statements irksome but still less deceptive than non-disclosure. Conversely, in the LLM-as-personal-assistant setting, disclosure of superhuman capabilities improved perceived trustworthiness, though it risked AI overreliance among certain persona segments. We release Dataset X, which contains persona cards (profile attributes, disclosure prompts, and detailed interaction logs), accompanied by reproducible protocols and disclaimers for adapting them to diverse tasks. Our results demonstrate that transparency is not a cure-all: while it reduces suspicion and enhances trust in cooperative contexts, it may inflame resistance or disappointment in competitive domains.
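To make the described protocol concrete, the Python sketch below shows one plausible way to represent a persona card and enumerate the persona x domain x disclosure-condition cells the abstract describes. The field names (fairness_concern, tech_acceptance), the domain labels, and the prompt wording are illustrative assumptions, not the actual schema of the released Dataset X.

from dataclasses import dataclass
from itertools import product

# Hypothetical persona-card schema; attribute names are assumptions,
# not the released Dataset X format.
@dataclass
class PersonaCard:
    persona_id: str
    expertise: str            # e.g. "novice" or "expert"
    fairness_concern: float   # 0.0 (low) .. 1.0 (high)
    tech_acceptance: float    # 0.0 (skeptical) .. 1.0 (accepting)

# Assumed labels for the two domains and two disclosure conditions.
DOMAINS = ("starcraft2_opponent", "assistant_information")
CONDITIONS = ("non_disclosure", "superhuman_label")

def disclosure_prompt(condition: str) -> str:
    """Return an illustrative system prompt for a disclosure condition."""
    if condition == "superhuman_label":
        return "You are interacting with an AI that has superhuman capabilities."
    return ""  # non-disclosure: no capability statement is shown

def enumerate_cells(personas: list[PersonaCard]) -> list[dict]:
    """Cross every persona with every domain x condition cell; game
    outcomes and usage contexts are held fixed within each cell."""
    logs = []
    for persona, domain, condition in product(personas, DOMAINS, CONDITIONS):
        logs.append({
            "persona_id": persona.persona_id,
            "expertise": persona.expertise,
            "domain": domain,
            "condition": condition,
            "prompt": disclosure_prompt(condition),
        })
    return logs

Crossing every persona with both domains and both conditions, while fixing outcomes within a cell, is what lets the disclosure effect be isolated from skill or outcome effects.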
