disagreement
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Asia > India (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.64)
- North America > United States (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
The Accidental Winners of the War on Higher Ed
Go to a small liberal-arts college if you can.

In the waning heat of last summer, freshly back in my office at a major research university, I found myself considering the higher-education hellscape that had lately descended upon the nation. I'd spent months reporting on the Trump administration's attacks on universities for, speaking with dozens of administrators, faculty, and students about the billions of dollars in cuts to public funding for research and the resulting collapse of "college life." Initially, I surveyed the situation from the safe distance of a journalist who happens also to be a career professor and university administrator. I saw myself as an envoy between America's college campuses and its citizens, telling the stories of the people whose lives had been shattered by these transformations. By the summer, though, that safe distance had collapsed back on me.
- North America > United States > Texas (0.05)
- North America > United States > Michigan (0.05)
- North America > United States > Massachusetts (0.05)
- Law (1.00)
- Education > Educational Setting > Higher Education (1.00)
- Government > Regional Government > North America Government > United States Government (0.90)
Learning from Synthetic Data: Limitations of ERM
Amin, Kareem, Bie, Alex, Kong, Weiwei, Syed, Umar, Vassilvitskii, Sergei
The first generation of LLMs was largely trained on human-generated data. However, the success of LLMs and their increased adoption have had an unexpected consequence: AI-generated content now appears in places where previously there was none. Machine learning practitioners should therefore be aware that their training data is increasingly likely to be contaminated by LLM-generated content. Previous work has studied the value of synthetic (i.e., AI-generated) data and shown that, while naively adding such data to the training mix may lead to model collapse, greater care about which data is added, how much curation it undergoes, and the specifics of the training process may mitigate that risk or even reverse it, leading to improved performance. These works focus almost exclusively on the LLM setting, aiming to improve state-of-the-art performance on a set of benchmarks. In contrast, in this work we take a traditional learning-theory view of the problem. We begin by formalizing the setting and developing a framework that captures the invariants of having natural training data contaminated by synthetic additions. Specifically, we see three salient points: Ground truth. There exists a (potentially small) set of natural data, coming from the true data-generating distribution.
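To make the contamination setting concrete, here is a toy sketch, assuming a one-dimensional mean-estimation problem with squared loss, unit-variance Gaussian data, and a synthetic-data generator with a fixed systematic bias (`synth_bias`). All names and numbers are hypothetical, and this is not the framework the paper develops; it shows only that plain ERM over the pooled sample targets the mixture mean rather than the ground-truth mean.

```python
import random
import statistics

# Toy setting (hypothetical, not the paper's framework): estimate a scalar mean.
# ERM with squared loss over constant predictors c minimises sum_i (x_i - c)^2,
# whose minimiser is the sample mean.

def erm_mean(samples):
    """Empirical risk minimiser for squared loss over constant predictors."""
    return statistics.fmean(samples)

def draw(n, mean, rng):
    """n i.i.d. unit-variance Gaussian samples with the given mean."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]

rng = random.Random(0)
true_mean = 0.0
synth_bias = 1.0  # hypothetical systematic error of the synthetic generator

natural = draw(2_000, true_mean, rng)                 # small ground-truth set
synthetic = draw(8_000, true_mean + synth_bias, rng)  # larger synthetic addition

natural_only = erm_mean(natural)
pooled = erm_mean(natural + synthetic)
# Pooled ERM targets the mixture mean: 0.0 + 1.0 * 8000 / 10000 = 0.8.
print(f"natural only: {natural_only:.3f}, pooled: {pooled:.3f}")
```

In this toy, adding more synthetic samples pulls the pooled estimate further from the ground truth, which is one simple way uniform-weight ERM can fall short when natural data is scarce.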
How many classifiers do we need?
As the performance gains from scaling data and/or model size show diminishing returns, it is becoming increasingly popular to turn to ensembling, where the predictions of multiple models are combined to improve accuracy. In this paper, we provide a detailed analysis of how the disagreement and the polarization (a notion we introduce and define in this paper) among classifiers relate to the performance gain achieved by aggregating individual classifiers with majority-vote strategies in classification tasks. We address these questions in the following ways.
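As an illustration of the quantities involved, here is a minimal sketch of majority-vote aggregation and one common notion of disagreement: the fraction of points on which two classifiers predict different labels, averaged over all classifier pairs. This pairwise definition is an assumption for illustration, and polarization is not implemented because its definition is introduced in the paper itself.

```python
from collections import Counter
from itertools import combinations

def error_rate(preds, labels):
    """Fraction of points where the prediction differs from the label."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

def avg_pairwise_disagreement(all_preds):
    """Mean, over all classifier pairs, of the fraction of points where they disagree."""
    n = len(all_preds[0])
    pairs = list(combinations(all_preds, 2))
    return sum(sum(a != b for a, b in zip(p, q)) / n for p, q in pairs) / len(pairs)

def majority_vote(all_preds):
    """Per-point plurality label; ties go to the label seen first (classifier order)."""
    n = len(all_preds[0])
    return [Counter(preds[i] for preds in all_preds).most_common(1)[0][0]
            for i in range(n)]

labels = [0, 1, 1, 0, 1]
clf_a = [0, 1, 0, 0, 1]  # one error
clf_b = [0, 0, 1, 0, 1]  # one error
clf_c = [1, 1, 1, 0, 0]  # two errors
ensemble = [clf_a, clf_b, clf_c]

vote = majority_vote(ensemble)
print([error_rate(c, labels) for c in ensemble])  # [0.2, 0.2, 0.4]
print(error_rate(vote, labels))                   # 0.0
print(avg_pairwise_disagreement(ensemble))        # ~0.533
```

In this small example the classifiers err on different points, so the majority vote corrects every individual mistake; when classifiers err on the same points, the vote cannot help, which is why disagreement-type quantities are natural to relate to ensemble gains.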
Correlation Clustering with Adaptive Similarity Queries
In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretic bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.
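To make "disagreements" and "queries" concrete, here is a sketch of the classic pivot algorithm for correlation clustering (KwikCluster), with each similarity lookup counted as one query. This is a well-known baseline, not necessarily the algorithm described in the paper; in particular it places no budget on queries and may query up to all $\binom{n}{2}$ pairs. The `query` oracle and helper names are hypothetical.

```python
import random
from itertools import combinations

def disagreements(clusters, sim):
    """Count pairs whose binary score conflicts with the clustering.

    A similar pair split across clusters, or a dissimilar pair placed
    in the same cluster, each costs one disagreement.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    return sum((label[u] == label[v]) != sim(u, v)
               for u, v in combinations(sorted(label), 2))

def pivot_clustering(nodes, query, seed=0):
    """KwikCluster-style pivoting; returns (clusters, number of queries made)."""
    rng = random.Random(seed)
    remaining = list(nodes)
    clusters, n_queries = [], 0
    while remaining:
        # Pick a random pivot and query its similarity to every remaining node.
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        for v in remaining:
            n_queries += 1
            (cluster if query(pivot, v) else rest).append(v)
        clusters.append(cluster)
        remaining = rest
    return clusters, n_queries

# Hypothetical instance with a perfect clustering {0, 1, 2} and {3, 4}.
group = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
sim = lambda u, v: group[u] == group[v]

clusters, n_queries = pivot_clustering(range(5), sim)
print(clusters, n_queries, disagreements(clusters, sim))
```

On a perfectly clusterable instance like this one the pivot procedure recovers the true clusters with zero disagreements; the active-learning question in the abstract is how few queries suffice when the scores are noisy and queries are limited.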
ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
Sedláček, Šimon, Barahona, Sara, Yusuf, Bolaji, Herrera-Alarcón, Laura, Kesiraju, Santosh, Bolaños, Cecilia, Lozano-Diez, Alicia, Udupa, Sathvik, López, Fernando, Ferner, Allison, Duraiswami, Ramani, Černocký, Jan
Evaluating open-ended responses from large audio language models (LALMs) is challenging because human annotators often genuinely disagree on answer correctness due to multiple valid interpretations, partial correctness, and subjective judgment. Traditional metrics reporting only mean scores fail to capture this uncertainty. We present ORCA (Open-ended Response Correctness Assessment), a framework that models the variability in human judgments using Beta distributions to predict both expected correctness and uncertainty. Our three-stage annotation framework combines human judgment with structured feedback and iterative refinement to simultaneously curate training data and improve benchmark quality. We collected 11,721 annotations across 3,580 question-answer pairs from 15 LALMs on two audio QA benchmarks, achieving inter-annotator agreement of 0.82 (Krippendorff's alpha). ORCA achieves 0.91 Spearman correlation with mean human judgments, matching or outperforming LLM-judge baselines while providing uncertainty estimates and requiring significantly less compute. We release our models, code, and curated dataset.
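The abstract describes summarising human judgments with a Beta distribution that yields both an expected correctness and an uncertainty. As a minimal sketch of that idea, the code below converts Beta parameters into a mean and variance; the `beta_from_votes` helper, which builds parameters from binary annotator votes under a uniform prior, is a hypothetical illustration and not how ORCA predicts its parameters.

```python
import math

def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var

def beta_from_votes(n_correct, n_annotators):
    """Hypothetical: Beta posterior from binary votes under a uniform Beta(1, 1) prior."""
    return 1 + n_correct, 1 + n_annotators - n_correct

# Three annotators, two of whom judge the answer correct -> Beta(3, 2).
alpha, beta = beta_from_votes(2, 3)
mean, var = beta_mean_var(alpha, beta)
print(f"expected correctness {mean:.2f}, std {math.sqrt(var):.2f}")  # 0.60, 0.20
```

The variance shrinks as annotators agree more, so the same mean correctness can come with very different uncertainty, which is the information a mean-only score discards.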
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)