Adaptive Skills, Adaptive Partitions (ASAP)
Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.
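To give a rough sense of the two learned components, here is a minimal sketch assuming a hypothetical two-skill agent whose partition is a single hyperplane over state features. The class and method names are invented for this example, and the joint policy-gradient update over both parameter sets is left out.

```python
import numpy as np

class ASAPSketch:
    """A minimal two-skill sketch (illustrative, not the paper's API).

    A learned hyperplane decides which skill applies in each state; both
    the hyperplane (the partition) and the skill policies are plain
    parameter arrays that a policy gradient could update jointly.
    """

    def __init__(self, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.partition = rng.normal(size=state_dim)               # hyperplane normal
        self.skills = rng.normal(size=(2, state_dim, n_actions))  # one linear softmax policy per skill

    def select_skill(self, state):
        # The side of the hyperplane the state falls on picks the skill.
        return int(state @ self.partition > 0)

    def act(self, state, rng):
        logits = state @ self.skills[self.select_skill(state)]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)
```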
Matrix Completion has No Spurious Local Minimum
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for matrix completion has no spurious local minima: all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve matrix completion with arbitrary initialization in polynomial time.
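For concreteness, a sketch of the setup in the symmetric rank-r case, assuming the standard squared loss over the set of observed entries Ω (the formal statement also involves incoherence assumptions and a regularizer, omitted here):

```latex
% Commonly used non-convex objective, symmetric case with ground truth M:
\min_{U \in \mathbb{R}^{n \times r}}
  f(U) = \frac{1}{2} \sum_{(i,j) \in \Omega}
         \left( M_{ij} - (UU^{\top})_{ij} \right)^{2}
% Claim: every local minimum U of f is global, i.e., UU^T = M,
% so gradient descent from arbitrary initialization succeeds.
```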
Robust Conformal Prediction Using Privileged Information
We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, they are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.
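To give a flavor of the machinery being generalized, here is a sketch of a weighted split-conformal threshold, assuming the likelihood-ratio weights have already been estimated (in the paper's setting, from the privileged information available at training time). The function name and interface are invented for this example.

```python
import numpy as np

def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha=0.1):
    """Weighted split-conformal score threshold (sketch).

    cal_scores:  nonconformity scores on the calibration set, e.g. |y - f(x)|
    cal_weights: likelihood-ratio weights for the calibration points
    test_weight: weight assigned to the test point
    Returns q such that the set {y : score(x_test, y) <= q} targets
    1 - alpha coverage under weighted exchangeability.
    """
    w = np.append(np.asarray(cal_weights, dtype=float), test_weight)
    p = w / w.sum()                          # normalize weights to a distribution
    order = np.argsort(cal_scores)
    sorted_scores = np.asarray(cal_scores, dtype=float)[order]
    cum = np.cumsum(p[:-1][order])           # weighted CDF over calibration scores
    idx = np.searchsorted(cum, 1 - alpha)    # smallest score reaching 1 - alpha mass
    if idx >= len(sorted_scores):
        return np.inf                        # not enough mass: fall back to the trivial set
    return sorted_scores[idx]
```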
Learning Goal-Conditioned Representations for Language Reward Models
Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx
Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning. Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback on language models. In this work, we propose training reward models (RMs) in a contrastive, goal-conditioned fashion by increasing the representation similarity of future states along sampled preferred trajectories and decreasing the similarity along randomly sampled dispreferred trajectories. This objective significantly improves reward model performance, by up to 0.09 AUROC, across challenging benchmarks such as MATH and GSM8k. These findings extend to general alignment as well: on the Helpful-Harmless dataset, we observe a 2.3% increase in accuracy.
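One plausible instantiation of such an objective is an InfoNCE-style loss, sketched below. This is an illustration, not the authors' code: the function name, the cosine similarity, and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def goal_contrastive_loss(h_states, h_goal, h_negatives, tau=0.1):
    """Contrastive, goal-conditioned objective (sketch).

    h_states:    (T, d) hidden states along a sampled preferred trajectory
    h_goal:      (d,)   representation of a future state on that trajectory
    h_negatives: (K, d) states from randomly sampled dispreferred trajectories
    Increases similarity to the preferred future state and decreases it
    for the dispreferred samples (temperature tau is assumed).
    """
    pos = F.cosine_similarity(h_states, h_goal.unsqueeze(0)) / tau      # (T,)
    neg = F.cosine_similarity(
        h_states.unsqueeze(1), h_negatives.unsqueeze(0), dim=-1) / tau  # (T, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                  # positive in column 0
    labels = torch.zeros(len(h_states), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```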
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
Qin-Wen Luo, Ye-Wen Wang, Sheng-Jun Huang
Offline-to-online (O2O) reinforcement learning (RL) provides an effective means of leveraging an offline pre-trained policy as initialization to improve performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and therefore cannot perform general O2O learning from arbitrary offline methods. To address this problem, we show that there are evaluation and improvement mismatches between the offline dataset and the online environment, which hinder the direct application of pre-trained policies to online fine-tuning. In this paper, we propose to handle these two mismatches simultaneously, aiming to achieve general O2O learning from any offline method to any online method. Before online fine-tuning, we re-evaluate the pessimistic critic trained on the offline dataset in an optimistic way and then calibrate the misaligned critic with the reliable offline actor to avoid erroneous updates. After obtaining an optimistic and aligned critic, we perform constrained fine-tuning to combat distribution shift during online learning. We show empirically that the proposed method achieves stable and efficient performance improvements on multiple simulated tasks when compared to state-of-the-art methods.
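As one plausible form of the constrained fine-tuning step, here is a sketch assuming a log-ratio penalty toward the offline actor; the function name, the penalty form, and the coefficient beta are assumptions for illustration, not the paper's specification.

```python
import torch

def constrained_actor_loss(q_values, logp_online, logp_offline, beta=0.1):
    """Constrained online actor update (sketch; names and beta assumed).

    Maximizes the (re-evaluated, calibrated) critic's value while
    penalizing divergence of the online policy from the offline actor,
    which combats distribution shift during online fine-tuning:
        L = E[ -Q(s, a) + beta * (log pi_online(a|s) - log pi_offline(a|s)) ]
    """
    return (-q_values + beta * (logp_online - logp_offline)).mean()
```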
We first and foremost thank the reviewers for their valuable time and feedback.
Reviewer #1 asks how our results relate to adaptive gradient methods. We are perhaps a bit imprecise in that we use "best linear
Reviewer #1 correctly notes that the quadratic convexity of the constraint set is critical via Proposition 4. In the case
Reviewer #1 asks about the origin and meaning of Corollary 3. It follows from Corollary 2 by lower bounding the
We will include these new results.
Reviewer #1 asks for definitional clarifications for minimax risk and regret.
Reviewer #2 asks for applicability for the non-convex setting.
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
Carlos Riquelme, Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy A. Mann, Andre Barreto, Gergely Neu
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way.
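One plausible instantiation of the per-state switch is sketched below, assuming for illustration a Gaussian confidence interval built from Monte Carlo returns rather than the learned intervals the paper uses; the function name and the z threshold are invented.

```python
import numpy as np

def adaptive_target(td_estimate, mc_returns, z=2.0):
    """Per-state choice between TD and MC targets (sketch).

    Builds a confidence interval from the MC returns observed at a state;
    if the TD estimate falls outside it, TD is flagged as biased and the
    unbiased (but higher-variance) MC mean is used instead.
    Requires at least two MC returns for the sample standard deviation.
    """
    g = np.asarray(mc_returns, dtype=float)
    mean = g.mean()
    half = z * g.std(ddof=1) / np.sqrt(len(g))
    return td_estimate if abs(td_estimate - mean) <= half else mean
```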
Diffusion Improves Graph Learning
Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g., spectral clustering).
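To make the construction concrete, here is a sketch of GDC with the personalized PageRank kernel on a small dense graph. The paper additionally renormalizes the sparsified diffusion matrix, which this sketch omits, and the function name is invented.

```python
import numpy as np

def gdc_ppr(A, alpha=0.15, eps=1e-4):
    """Graph diffusion convolution via personalized PageRank (sketch).

    A:     dense (n, n) adjacency matrix of the input graph
    alpha: teleport probability of personalized PageRank
    eps:   threshold below which diffusion entries are dropped, keeping
           the result sparse and spatially localized
    Returns the sparsified diffusion matrix S, which then replaces A as
    the support for message passing. Closed form:
        S = alpha * (I - (1 - alpha) * T)^(-1)
    with the column-stochastic transition matrix T = A D^(-1).
    """
    n = A.shape[0]
    T = A / A.sum(axis=0, keepdims=True)                   # column-normalize
    S = alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * T)
    S[S < eps] = 0.0                                       # sparsify small entries
    return S
```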
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. We break down the problem into two causes: concept ignorance and concept mismapping. To tackle the two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. Firstly, we introduce a novel image-to-text concept activation module to guide the diffusion model in revisiting ignored concepts. Additionally, an attribute concentration module is proposed to map the text conditions of each entity to its corresponding image area correctly. Extensive experimental evaluations, conducted across three distinct text-to-image alignment benchmarks, demonstrate the superior efficacy of our proposed method, CoMat-SDXL, over the baseline model, SDXL [49]. We also show that our method enhances general condition utilization capability and generalizes to long and complex prompts despite not being specifically trained on them. The code is available at https://github.com/CaraJ7/CoMat.
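A sketch of the image-to-text supervision signal is given below, assuming a frozen captioning model scores how well the generated image explains the prompt tokens; this is an illustration of the idea, not the released code, and the function name is invented.

```python
import torch

def concept_activation_loss(caption_logits, prompt_token_ids):
    """Image-to-text concept activation signal (sketch).

    caption_logits:   (T, V) logits of a frozen image-to-text (captioning)
                      model conditioned on the generated image
    prompt_token_ids: (T,) token ids of the original text prompt
    The diffusion model is fine-tuned to maximize the likelihood that the
    captioner recovers the prompt from the image, so ignored concepts
    (tokens the captioner cannot explain) yield a large loss.
    """
    logp = torch.log_softmax(caption_logits, dim=-1)
    token_logp = logp.gather(1, prompt_token_ids.unsqueeze(1)).squeeze(1)
    return -token_logp.mean()
```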