AITopics | representativeness

representativeness

Robust Offline Active Learning on Graphs

Neural Information Processing Systems | Mar-21-2026, 00:59:37 GMT

We consider the problem of active learning on graphs for node-level tasks, which has crucial applications in many real-world networks where labeling node responses is expensive. In this paper, we propose an offline active learning method that selects nodes to query by explicitly incorporating information from both the network structure and node covariates. Building on graph signal recovery theory and the random spectral sparsification technique, the proposed method adopts a two-stage biased sampling strategy that takes both informativeness and representativeness into consideration for node querying. Informativeness refers to the complexity of graph signals that are learnable from the responses of queried nodes, while representativeness refers to the capacity of queried nodes to control generalization errors given noisy node-level information. We establish a theoretical relationship between generalization error and the number of nodes selected by the proposed method. Our theoretical results demonstrate a trade-off between informativeness and representativeness in active learning. Extensive numerical experiments show that the proposed method is competitive with existing graph-based active learning methods, especially when node covariates and responses contain noise. Additionally, the proposed method is applicable to both regression and classification tasks on graphs.

artificial intelligence, machine learning, proceedings robust offline active learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Interactive Deep Clustering via Value Mining

Neural Information Processing Systems | Mar-20-2026, 08:11:28 GMT

In the absence of class priors, recent deep clustering methods resort to data augmentation and pseudo-labeling strategies to generate supervision signals. Though they have achieved remarkable success, existing works struggle to discriminate hard samples at cluster boundaries; mining these samples is particularly challenging because their cluster assignments are unreliable. To break this performance bottleneck, we propose incorporating user interaction to facilitate clustering instead of exhaustively mining semantics from the data itself. Specifically, we present Interactive Deep Clustering (IDC), a plug-and-play method designed to boost the performance of pre-trained clustering models with minimal interaction overhead. IDC first quantitatively evaluates sample values based on hardness, representativeness, and diversity, where representativeness avoids selecting outliers and diversity prevents the selected samples from collapsing into a small number of clusters. IDC then queries the cluster affiliations of high-value samples in a user-friendly manner.
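To make the three criteria named in the abstract concrete, here is a minimal Python sketch of how hardness, representativeness, and diversity could be combined when picking samples to query. The scoring formulas, the function name `select_high_value`, and the parameter `k` are all assumptions made for illustration; the abstract does not give IDC's actual definitions, and this is not the authors' implementation. The sketch assumes at least two samples, at least two clusters, and non-zero feature vectors.

```python
import numpy as np

def select_high_value(features, probs, budget, k=5):
    """Pick `budget` sample indices with an illustrative value score.

    features: (n, d) embeddings; probs: (n, c) soft cluster assignments.
    All formulas below are assumptions, not IDC's definitions.
    """
    n = features.shape[0]
    # Hardness (assumed): small gap between the two largest assignment
    # probabilities, i.e. the sample sits near a cluster boundary.
    top2 = np.sort(probs, axis=1)[:, -2:]
    hardness = 1.0 - (top2[:, 1] - top2[:, 0])
    # Representativeness (assumed): mean cosine similarity to the k nearest
    # neighbours, rescaled to [0, 1], so isolated outliers score low.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    k = min(k, n - 1)
    representativeness = (np.sort(sim, axis=1)[:, -k:].mean(axis=1) + 1.0) / 2.0
    base = hardness * representativeness
    # Diversity (assumed): greedy selection that discounts candidates whose
    # predicted cluster has already been picked.
    clusters = probs.argmax(axis=1)
    counts = np.zeros(probs.shape[1])
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(min(budget, n)):
        value = base / (1.0 + counts[clusters])
        value[~available] = -np.inf
        pick = int(value.argmax())
        chosen.append(pick)
        available[pick] = False
        counts[clusters[pick]] += 1
    return chosen

# Toy data: two tight groups of three points each.
features = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                     [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.8, 0.2],
                  [0.1, 0.9], [0.45, 0.55], [0.2, 0.8]])
chosen = select_high_value(features, probs, budget=2, k=2)
```

On this toy input the two selected samples are the least confidently assigned point of each group, which is the behavior the diversity discount is meant to encourage.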

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)

6c5f877b2d78e093860ce9715e251dec-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 15:56:49 GMT

artificial intelligence, machine learning, node, (18 more...)

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Jersey (0.04)

Genre:

Instructional Material (0.67)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Interactive Deep Clustering via Value Mining

Neural Information Processing Systems | Feb-12-2026, 14:07:09 GMT

In the absence of class priors, recent deep clustering methods resort to data augmentation and pseudo-labeling strategies to generate supervision signals.

artificial intelligence, machine learning, proceedings, (17 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Not All Out-of-Distribution Data Are Harmful to Open-Set Active Learning

Yang

Neural Information Processing Systems | Feb-9-2026, 12:46:47 GMT

Recall (higher is better): the ratio of selected in-distribution (ID) instances to the total number of ID instances.
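The recall definition above is a simple ratio and can be sketched in a few lines of Python. The function name `id_recall` and its arguments are hypothetical, chosen only for illustration; the snippet assumes each selected instance carries a boolean flag saying whether it is ID.

```python
def id_recall(selected_is_id, total_id):
    """Recall = (number of selected ID instances) / (total number of ID instances).

    selected_is_id: iterable of booleans, one per selected instance
                    (True if the instance is ID).
    total_id: total number of ID instances available for selection.
    """
    if total_id <= 0:
        raise ValueError("total_id must be positive")
    selected_id = sum(1 for flag in selected_is_id if flag)
    return selected_id / total_id

# Four instances were selected, three of them ID, out of six ID instances overall.
example_recall = id_recall([True, False, True, True], total_id=6)
```

Here `example_recall` is 3 / 6 = 0.5: half of the available ID instances were selected.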

artificial intelligence, machine learning, proportion, (14 more...)

Country:

North America > United States > Nevada (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.30)

Can AI Truly Represent Your Voice in Deliberations? A Comprehensive Study of Large-Scale Opinion Aggregation with LLMs

Zhu, Shenzhe, Yang, Shu, Bakker, Michiel A., Pentland, Alex, Pei, Jiaxin

arXiv.org Artificial Intelligence | Dec-10-2025

Large-scale public deliberations generate thousands of free-form contributions that must be synthesized into representative and neutral summaries for policy use. While LLMs have shown promise as a tool for summarizing large-scale deliberations, they also risk underrepresenting minority perspectives and exhibiting bias with respect to the input order, raising fairness concerns in high-stakes contexts. Studying and fixing these issues requires a comprehensive evaluation at a large scale, yet current practice often relies on LLMs as judges, which show weak alignment with human judgments. To address this, we present DeliberationBank, a large-scale human-grounded dataset with (1) opinion data spanning ten deliberation questions created by 3,000 participants and (2) summary judgment data annotated by 4,500 participants across four dimensions (representativeness, informativeness, neutrality, policy approval). Using these datasets, we train DeliberationJudge, a fine-tuned DeBERTa model that can rate deliberation summaries from individual perspectives. DeliberationJudge is more efficient and better aligned with human judgments than a wide range of LLM judges. With DeliberationJudge, we evaluate 18 LLMs and reveal persistent weaknesses in deliberation summarization, especially underrepresentation of minority positions. Our framework provides a scalable and reliable way to evaluate deliberation summarization, helping ensure AI systems are more representative and equitable for policymaking.

deliberation, large language model, machine learning, (19 more...)

arXiv:2510.05154

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning

Huseljic, Denis, Herde, Marek, Rauch, Lukas, Hahn, Paul, Sick, Bernhard

arXiv.org Artificial Intelligence | Dec-1-2025

Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even AL cycles. Committing to a single strategy risks suboptimal performance, as no single strategy dominates throughout the entire AL process. We introduce REFINE, an ensemble AL method that combines multiple strategies without knowing in advance which will perform best. In each AL cycle, REFINE operates in two stages: (1) Progressive filtering iteratively refines the unlabeled pool by considering an ensemble of AL strategies, retaining promising candidates capturing different notions of value. (2) Coverage-based selection then chooses a final batch from this refined pool, ensuring all previously identified notions of value are accounted for. Extensive experiments across 6 classification datasets and 3 foundation models show that REFINE consistently outperforms individual strategies and existing ensemble methods. Notably, progressive filtering serves as a powerful preprocessing step that improves the performance of any individual AL strategy applied to the refined pool, which we demonstrate on an audio spectrogram classification use case. Finally, the ensemble of REFINE can be easily extended with upcoming state-of-the-art AL strategies.
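The two-stage idea described in the abstract (filter the pool with an ensemble of strategies, then pick a batch that covers every strategy's notion of value) can be illustrated with a toy Python sketch. This is not the REFINE algorithm: the function names, the fixed per-strategy scores, the `keep_frac` and `rounds` parameters, and the round-robin coverage rule are all assumptions made for illustration.

```python
import numpy as np

def progressive_filter(pool, strategy_scores, keep_frac=0.5, rounds=1):
    """Stage 1 (assumed form): each round, every strategy nominates its
    top-scoring candidates and only the union of nominations survives."""
    pool = np.asarray(pool)
    for _ in range(rounds):
        # Split the keep budget evenly so the union cannot exceed keep_frac.
        n_keep = max(1, int(keep_frac * len(pool) / len(strategy_scores)))
        keep = set()
        for scores in strategy_scores:
            order = np.argsort(-scores[pool])[:n_keep]
            keep.update(pool[order].tolist())
        pool = np.array(sorted(keep))
    return pool

def coverage_select(pool, strategy_scores, batch_size):
    """Stage 2 (assumed form): round-robin over strategies so that every
    notion of value contributes to the final batch."""
    pool = np.asarray(pool)
    rankings = [pool[np.argsort(-s[pool])] for s in strategy_scores]
    target = min(batch_size, len(pool))
    chosen, taken = [], set()
    while len(chosen) < target:
        for ranking in rankings:
            # Best candidate of this strategy that is not selected yet.
            pick = next(int(i) for i in ranking if int(i) not in taken)
            chosen.append(pick)
            taken.add(pick)
            if len(chosen) == target:
                break
    return chosen

# Two hypothetical strategies that disagree about which samples are valuable.
uncertainty = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])
representativeness = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6])
strategies = [uncertainty, representativeness]

refined = progressive_filter(np.arange(8), strategies, keep_frac=0.5, rounds=1)
batch = coverage_select(refined, strategies, batch_size=2)
```

In this toy run the refined pool keeps the two best candidates of each strategy, and the final batch contains one top pick from each. In a real active learning loop the scores would be recomputed from the current model rather than fixed in advance.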

artificial intelligence, machine learning, probability, (16 more...)

arXiv:2511.22344

Country: North America (0.28)

Genre:

Research Report (0.82)
Workflow (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma

Sahoo, Subramanyam, Chadha, Aman, Jain, Vinija, Chaudhary, Divya

arXiv.org Machine Learning | Nov-26-2025

Reinforcement Learning from Human Feedback (RLHF) is widely used for aligning large language models, yet practitioners face a persistent puzzle: improving safety often reduces fairness, scaling to diverse populations becomes computationally intractable, and making systems robust often amplifies majority biases. We formalize this tension as the Alignment Trilemma: no RLHF system can simultaneously achieve (i) epsilon-representativeness across diverse human values, (ii) polynomial tractability in sample and compute complexity, and (iii) delta-robustness against adversarial perturbations and distribution shift. Through a complexity-theoretic analysis integrating statistical learning theory and robust optimization, we prove that achieving both representativeness (epsilon <= 0.01) and robustness (delta <= 0.001) for global-scale populations requires Omega(2^{d_context}) operations, which is super-polynomial in the context dimensionality. We show that current RLHF implementations resolve this trilemma by sacrificing representativeness: they collect only 10^3 to 10^4 samples from homogeneous annotator pools while 10^7 to 10^8 samples are needed for true global representation. Our framework provides a unified explanation for documented RLHF pathologies including preference collapse, sycophancy, and systematic bias amplification. We conclude with concrete directions for navigating these fundamental trade-offs through strategic relaxations of alignment requirements.

guideline, justification, robustness, (15 more...)

arXiv:2511.19504

Country: