AITopics | Europe

Europe

Class conditional conformal prediction for multiple inputs by p-value aggregation

Neural Information Processing Systems | Jun-15-2026, 08:12:27 GMT

Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.

artificial intelligence, machine learning, prediction, (20 more...)

Neural Information Processing Systems

Country: Europe > France (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Tightening Regret Lower and Upper Bounds in Restless Rising Bandits

Neural Information Processing Systems | Jun-15-2026, 08:12:12 GMT

Restless Multi-Armed Bandits (MABs) are a general framework designed to handle real-world decision-making problems where the expected rewards evolve over time, such as in recommender systems and dynamic pricing. In this work, we investigate from a theoretical standpoint two well-known structured subclasses of restless MABs: the rising and the rising concave settings, where the expected reward of each arm evolves over time following an unknown non-decreasing and a non-decreasing concave function, respectively. By providing a novel methodology of independent interest for general restless bandits, we establish new lower bounds on the expected cumulative regret for both settings. In the rising case, we prove a lower bound of order $\Omega(T^{2/3})$, matching known upper bounds for restless bandits; whereas, in the rising concave case, we derive a lower bound of order $\Omega(T^{3/5})$, proving for the first time that this setting is provably more challenging than stationary MABs. Then, we introduce Rising Concave Budgeted Exploration (RC-BE($\alpha$)), a new regret minimization algorithm designed for the rising concave MABs. By devising a novel proof technique, we show that the expected cumulative regret of RC-BE($\alpha$) is in the order of $\widetilde{O}(T^{7/11})$. These results collectively make a step towards closing the gap in rising concave MABs, positioning them between stationary and general restless bandit settings in terms of statistical complexity.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering

Neural Information Processing Systems | Jun-15-2026, 07:57:37 GMT

Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches, frequently missing ...

large language model, machine learning, question answering, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia > Middle East > UAE (0.45)
North America > United States > Minnesota (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (1.00)
Banking & Finance (1.00)
Media > Music (0.92)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models

Neural Information Processing Systems | Jun-15-2026, 07:57:17 GMT

Numerous well-established studies have demonstrated the superhuman capabilities of modern Vision-Language Models (VLMs) across a wide range of tasks. However, there is growing doubt about the continued availability of reliable, high-quality labeling (supervision) from human annotators, which could lead to stagnation of model performance. To address this challenge, "superalignment" employs the so-called weak-to-strong generalization paradigm, where the supervision from a weak model can provide generalizable knowledge for a strong model. While effective in aligning knowledge for clean samples between the strong and weak models, the standard weak-to-strong approach typically fails to capture adversarial robustness, exposing strong VLMs to adversarial attacks. This inability to transfer adversarial robustness arises because adversarial samples are normally missing in the superalignment stage. To this end, we are the first to propose a weak-to-strong (adversarial) robustness generalization method that elicits zero-shot robustness in large-scale models through an unsupervised scheme, mitigating the unreliable information source for alignment from two perspectives: alignment re-weighting and source guidance refinement. We analyze settings under which robustness generalization is possible.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Europe (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

Neural Information Processing Systems | Jun-15-2026, 07:10:17 GMT

Multimodal Large Language Models (MLLMs) have made rapid progress in perception, understanding, and reasoning, yet existing benchmarks fall short in evaluating these abilities under continuous and dynamic real-world video streams. Such settings require models to maintain coherent understanding and reasoning as visual scenes evolve over time. We introduce RTV-Bench, a fine-grained benchmark for real-time video analysis with MLLMs. It is built upon three key principles: multi-timestamp question answering, hierarchical question structures spanning perception and reasoning, and multi-dimensional evaluation of continuous perception, understanding, and reasoning. RTV-Bench comprises 552 diverse videos and 4,608 carefully curated QA pairs covering a wide range of dynamic scenarios. We evaluate a broad range of state-of-the-art MLLMs, including proprietary, open-source offline, and open-source real-time models. Our results show that real-time models generally outperform offline counterparts but still lag behind leading proprietary systems. While scaling model capacity generally yields performance gains, simply increasing the density of sampled input frames does not consistently translate into improved results. These observations suggest inherent limitations in current architectures when handling long-horizon video streams, underscoring the need for models explicitly designed for streaming video processing and analysis.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
Asia (0.68)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models

Neural Information Processing Systems | Jun-15-2026, 06:57:36 GMT

Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these machines. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both of these phenomena hinder Transformers from learning even simple pattern sequences. Isolation expresses that any learnable sequence must be isolated from another learnable sequence, and hence some sequences cannot be learned by a single Transformer at the same time. Continuity entails that an attractor basin forms around a learned sequence, such that any sequence falling in that basin will collapse towards the learned sequence. Here, we mathematically prove that these phenomena emerge in all Transformers that use compact positional encoding, and design rigorous experiments demonstrating that the theoretical limitations we shed light on occur at practical scale.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Neural Information Processing Systems | Jun-15-2026, 06:55:58 GMT

Worldwide image geolocalization--the task of predicting GPS coordinates from images taken anywhere on Earth--poses a fundamental challenge due to the vast diversity in visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, failing to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods. We also release our code, checkpoint, and dataset online for ease of reproduction.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(3 more...)

Norwegian crown princess's son found guilty of two counts of rape

BBC News | Jun-15-2026, 06:55:48 GMT

Marius Borg Høiby, the 29-year-old son of Norway's Crown Princess Mette-Marit, has been found guilty of two counts of rape and given four years in prison. The three judges in courtroom 250 at Oslo District Court cleared him of two other counts of rape, but found him guilty of many of the other offences of which he had been accused. Høiby was not in court for the verdict, but joined the session via video link. Prosecutors had called for Høiby to be given seven years and seven months in prison. His defence lawyers had called for a lesser term of 18 months and can appeal against the verdict.

artificial intelligence, football 2026, home news football 2026, (13 more...)

BBC News

Country:

North America (1.00)
Europe > United Kingdom (0.52)
Europe > Norway > Eastern Norway > Oslo (0.27)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > Europe Government (0.72)

Technology: Information Technology > Artificial Intelligence (0.30)

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Neural Information Processing Systems | Jun-15-2026, 06:40:04 GMT

Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and complex phenomena such as 'safety' and 'robustness' requires strong construct validity, that is, having measures that represent what matters to the phenomenon. With a team of 29 expert reviewers, we conduct a systematic review of 445 LLM benchmarks from leading conferences in natural language processing and machine learning. Across the reviewed articles, we find patterns related to the measured phenomena, tasks, and scoring metrics which undermine the validity of the resulting claims. To address these shortcomings, we provide eight key recommendations and detailed actionable guidance to researchers and practitioners in developing LLM benchmarks.

computational linguistic, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.95)
Europe (0.92)
North America > Mexico > Mexico City (0.14)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.67)

Industry:

Law (1.00)
Education (1.00)
Government (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

19206a6ed5ed0aaeed440448dfc5cf7e-Paper-Conference.pdf

Neural Information Processing Systems | Jun-15-2026, 06:26:03 GMT

LLM-agent systems often decompose high-level objectives into subtask dependency graphs, assuming that each subtask's output is reliable and conditionally independent of others given its parent responses. However, this assumption frequently breaks during execution, as ground-truth responses are inaccessible, leading to inter-agent misalignment--failures caused by inconsistencies and coordination breakdowns among agents [1]. To address this, we propose SEQCV, a dynamic framework for reliable execution under violated conditional independence. SEQCV executes subtasks sequentially, each conditioned on all prior verified responses, and performs consistency checks immediately after agents generate short token sequences. At each checkpoint, a token sequence is accepted only if it represents shared knowledge consistently supported across diverse LLMs; otherwise, it is discarded, triggering recursive subtask decomposition for finer-grained reasoning. Despite its sequential nature, SEQCV avoids repeated corrections on the same misalignment and achieves higher effective throughput than parallel pipelines. Across multiple reasoning and coordination tasks, SEQCV improves accuracy by up to 30% over existing LLM-agent systems.

artificial intelligence, large language model, natural language, (16 more...)

Neural Information Processing Systems

Country: