aptitude


On Explaining Proxy Discrimination and Unfairness in Individual Decisions Made by AI Systems

Sonna, Belona, Grastien, Alban

arXiv.org Artificial Intelligence

Artificial intelligence (AI) systems in high-stakes domains raise concerns about proxy discrimination, unfairness, and explainability. Existing audits often fail to reveal why unfairness arises, particularly when rooted in structural bias. We propose a novel framework using formal abductive explanations to explain proxy discrimination in individual AI decisions. Leveraging background knowledge, our method identifies which features act as unjustified proxies for protected attributes, revealing hidden structural biases. Central to our approach is the concept of aptitude, a task-relevant property independent of group membership, with a mapping function aligning individuals of equivalent aptitude across groups to assess fairness substantively. As a proof of concept, we showcase the framework with examples taken from the German credit dataset, demonstrating its applicability in real-world cases.
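The core idea of an abductive explanation — a minimal set of feature values that, on their own, entail the model's decision — can be illustrated with a toy brute-force search. This is only a sketch under assumed names (`abductive_explanation`, small finite feature domains); the paper's framework additionally incorporates background knowledge and formal logical encodings rather than enumeration.

```python
from itertools import combinations, product

def abductive_explanation(model, instance, domains):
    """Return a smallest set of features that, fixed to the instance's
    values, forces the model's decision no matter how the remaining
    features vary over their domains (brute-force, toy scale only)."""
    target = model(instance)
    features = list(instance)
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            free = [f for f in features if f not in subset]
            # Check every completion of the free features.
            if all(model({**{f: instance[f] for f in subset},
                          **dict(zip(free, vals))}) == target
                   for vals in product(*(domains[f] for f in free))):
                return set(subset)
```

For a hypothetical credit rule such as `income >= 2 or guarantor == 1`, an applicant with high income gets `{"income"}` as the explanation: the guarantor feature is irrelevant to this particular decision.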


ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

Khalid, Haziq Mohammad, Jeyaganthan, Athikash, Do, Timothy, Fu, Yicheng, O'Brien, Sean, Sharma, Vasu, Zhu, Kevin

arXiv.org Artificial Intelligence

Large Language Models (LLMs) suffer significant performance degradation in multi-turn conversations when information is presented incrementally. Given that multi-turn conversations characterize everyday interactions with LLMs, this degradation poses a severe challenge to real-world usability. We hypothesize that abrupt increases in model uncertainty signal misalignment in multi-turn LLM interactions, and we exploit this insight to dynamically realign conversational context. We introduce ERGO (Entropy-guided Resetting for Generation Optimization), which continuously quantifies internal uncertainty via Shannon entropy over next-token distributions and triggers adaptive prompt consolidation when a sharp spike in entropy is detected. By treating uncertainty as a first-class signal rather than a nuisance to eliminate, ERGO embraces variability in language and modeling, representing and responding to uncertainty. In multi-turn tasks with incrementally revealed instructions, ERGO yields a 56.6% average performance gain over standard baselines, increases aptitude (peak performance capability) by 24.7%, and decreases unreliability (variability in performance) by 35.3%, demonstrating that uncertainty-aware interventions can improve both accuracy and reliability in conversational AI.
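The two ingredients named in the abstract — Shannon entropy over a next-token distribution and spike detection against recent history — can be sketched in a few lines. The function names and the spike rule (current entropy exceeding the running mean by a fixed margin) are illustrative assumptions, not the paper's exact trigger:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_spike(history, current, threshold=1.5):
    """Flag a spike when the current entropy exceeds the running mean
    of past per-step entropies by more than `threshold` bits."""
    if not history:
        return False
    mean = sum(history) / len(history)
    return current - mean > threshold
```

A uniform distribution over two tokens has entropy 1 bit; a near-deterministic one approaches 0 bits, so a sudden jump in this quantity marks the point where the model's next-token predictions become diffuse.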


LLMs Get Lost In Multi-Turn Conversation

Laban, Philippe, Hayashi, Hiroaki, Zhou, Yingbo, Neville, Jennifer

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions, LLM evaluation has predominantly focused on the single-turn, fully-specified instruction setting. In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. Our experiments confirm that all the top open- and closed-weight LLMs we test exhibit significantly lower performance in multi-turn conversations than single-turn, with an average drop of 39% across six generation tasks. Analysis of 200,000+ simulated conversations decomposes the performance degradation into two components: a minor loss in aptitude and a significant increase in unreliability. We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that *when LLMs take a wrong turn in a conversation, they get lost and do not recover*.


Towards Responsible AI in Education: Hybrid Recommendation System for K-12 Students Case Study

Drushchak, Nazarii, Tyshchenko, Vladyslava, Polyakovska, Nataliya

arXiv.org Artificial Intelligence

The growth of Educational Technology (EdTech) has enabled highly personalized learning experiences through Artificial Intelligence (AI)-based recommendation systems tailored to each student's needs. However, these systems can unintentionally introduce biases, potentially limiting fair access to learning resources. This study presents a recommendation system for K-12 students, combining graph-based modeling and matrix factorization to provide personalized suggestions for extracurricular activities, learning resources, and volunteering opportunities. To address fairness concerns, the system includes a framework to detect and reduce biases by analyzing feedback across protected student groups. This work highlights the need for continuous monitoring in educational recommendation systems to support equitable, transparent, and effective learning opportunities for all students.

INTRODUCTION The rapid advancement of Educational Technology (EdTech) has significantly reshaped traditional learning environments, enabling the delivery of personalized educational experiences tailored to individual students' needs. According to the U.S. Department of Education Office of Educational Technology, leveraging AI-based modern educational technologies has been pivotal in providing personalized pathways for learning, supporting adaptive and individualized instruction, and enhancing student engagement through innovative digital solutions. This trend toward personalization in education underscores the importance of leveraging advanced recommendation systems to support student exploration and growth.
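The matrix factorization component mentioned in the abstract can be sketched with plain stochastic gradient descent: each student and each activity gets a small latent vector, and the dot product predicts the feedback score. This is a generic MF sketch under assumed names and hyperparameters, not the paper's specific hybrid model:

```python
import random

def matrix_factorization(ratings, n_users, n_items, k=2,
                         lr=0.05, reg=0.02, epochs=500):
    """Factor a sparse list of (user, item, rating) triples into
    user and item latent vectors via SGD with L2 regularization."""
    random.seed(0)
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                # Gradient step on both factors, shrunk toward zero by reg.
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
    return U, V
```

Unobserved student-activity pairs can then be scored with the same dot product, which is what makes the factorization usable as a recommender.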


Nexus: A Brief History of Information Networks from the Stone Age to AI by Yuval Noah Harari review – rage against the machine

The Guardian

What jumps to mind when you think about the impending AI apocalypse? If you're partial to sci-fi movie cliches, you may envisage killer robots (with or without thick Austrian accents) rising up to terminate their hubristic creators. Or perhaps, a la The Matrix, you'll go for scary machines sucking energy out of our bodies as they distract us with a simulated reality. For Yuval Noah Harari, who has spent a lot of time worrying about AI over the past decade, the threat is less fantastical and more insidious. "In order to manipulate humans, there is no need to physically hook brains to computers," he writes in his engrossing new book Nexus.


The limitations of scaling up AI language models

#artificialintelligence

But the dominant approach to developing these models involves leveraging massive computational resources, which has consequences. Beyond the fact that training and deploying large language models can incur high technical costs, the requirements put the models beyond the reach of many organizations and institutions. Scaling also doesn't resolve the major problem of model bias and toxicity, which often creeps in from the data used to train the models. In a panel during the Conference on Neural Information Processing Systems (NeurIPS) 2021, experts from the field discussed how the research community should adapt as progress in language models continues to be driven by scaled-up algorithms. The panelists explored how to ensure that smaller institutions can meaningfully research and audit large-scale systems, as well as ways that they can help to ensure that the systems behave as intended.


No-code Platforms Set To Accelerate Data, AI Adoption

#artificialintelligence

Talk of low-code, no-code platforms has been making the rounds of late, with Goldman Sachs injecting USD90 million into low-code software maker WSO2, while data automation platform Cascade Labs last week raised USD5.3 million. And as observed in a Washington Post report this month, the rapid rise of low-code has allowed non-computer scientists to create digital applications that were previously the domain of computer science graduates, while simultaneously opening the door to deliver fast and meaningful impact to organizations. But what exactly is low-code, and what implications does it have for building a vibrant data culture or developing data-centric and AI applications? At its heart, low-code is essentially a development environment for creating application software by leveraging scripting and a graphical user interface (GUI). The ability to visually configure applications significantly speeds development over traditional programming languages such as C or Python.


How to Not Lose Your Job to AI

#artificialintelligence

The question is now: are we becoming irrelevant? At first glance, that's what the future might look like when witnessing OpenAI's new platform, Cortex. Cortex essentially allows you to ask it, in "human speak," to code things for you. Take this demo clip for instance. In the full video, you can see Cortex put together a simple website from a couple of sentences.


Best practices to build data literacy into your Gen Z workforce - Data Dreamer

#artificialintelligence

This is a guest post by Kirk Borne, Ph.D., Chief Science Officer at DataPrime.ai. Kirk is also a consultant, astrophysicist, data scientist, blogger, data literacy advocate, and renowned speaker, and is one of the most recognized names in the industry. A survey of 1,100 data practitioners and business leaders reported that 84% of organizations consider data literacy to be a core business skill, agreeing with the statement that the inability of the workforce to use and analyze data effectively can hamper their business success. In addition, 36% said data literacy is crucial to future-proofing their business. Another survey found that 75% of employees are not comfortable using data.


Decoding machine learning benchmarks

Cardoso, Lucas F. F., Santos, Vitor C. A., Francês, Regiane S. K., Prudêncio, Ricardo B. C., Alves, Ronnie C. O.

arXiv.org Machine Learning

Despite the availability of benchmark machine learning (ML) repositories (e.g., UCI, OpenML), there is no standard evaluation strategy yet capable of pointing out which is the best set of datasets to serve as a gold standard to test different ML algorithms. In recent studies, Item Response Theory (IRT) has emerged as a new approach to elucidate what a good ML benchmark should be. This work applied IRT to explore the well-known OpenML-CC18 benchmark to identify how suitable it is for the evaluation of classifiers. Several classifiers, ranging from classical to ensemble ones, were evaluated using IRT models, which could simultaneously estimate dataset difficulty and classifiers' ability. The Glicko-2 rating system was applied on top of IRT to summarize the innate ability and aptitude of classifiers. It was observed that not all datasets from OpenML-CC18 are really useful to evaluate classifiers. Most datasets evaluated in this work (84%) contain easy instances in general (e.g., around 10% of difficult instances only). Also, 80% of the instances in half of this benchmark are very discriminating ones, which can be of great use for pairwise algorithm comparison, but not useful to push classifiers' abilities. This paper presents this new evaluation methodology based on IRT as well as the tool decodIRT, developed to guide IRT estimation over ML benchmarks.
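The IRT model underlying this kind of analysis is compact: in the two-parameter logistic (2PL) form, the probability that a respondent (here, a classifier) with ability theta succeeds on an item (a dataset instance) depends on the item's difficulty b and discrimination a. This is the standard 2PL formula, sketched here for illustration; the paper may use other IRT variants as well:

```python
import math

def irt_2pl(theta, a, b):
    """2PL Item Response Theory: probability that a respondent with
    ability `theta` answers correctly an item with discrimination `a`
    and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty the success probability is exactly 0.5, and a larger discrimination `a` steepens the curve, which is why highly discriminating instances are useful for pairwise comparison of classifiers but say little about pushing peak ability.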