Learning Versatile Skills with Curriculum Masking (Yao Tang, Zichuan Lin, Deheng Ye)

Neural Information Processing Systems

Masked prediction has emerged as a promising pretraining paradigm in offline reinforcement learning (RL) due to its versatile masking schemes, which enable flexible inference across various downstream tasks with a unified model. Despite this versatility, it remains unclear how to balance the learning of skills at different levels of complexity. To address this, we propose CurrMask, a curriculum masking pretraining paradigm for sequential decision making. Motivated by how humans learn by organizing knowledge in a curriculum, CurrMask adjusts its masking scheme during pretraining to learn versatile skills. Through extensive experiments, we show that CurrMask exhibits superior zero-shot performance on skill prompting and goal-conditioned planning tasks, as well as competitive finetuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme. Code is available here.
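As a rough illustration of curriculum masking (not the authors' implementation), the sketch below samples block-wise masks over trajectory tokens and shifts the schedule from short, sparse masks toward long, dense masks as pretraining progresses; the mask ratio, block size, and schedule are all hypothetical choices made for the example.

```python
import numpy as np

def curriculum_block_mask(seq_len, progress, rng):
    """Sample a block-wise mask over a token sequence.

    progress in [0, 1] is the fraction of pretraining completed.  Early on we
    favour short blocks and low mask ratios (easier, low-level skills); later
    we favour longer blocks and higher ratios (harder, longer-horizon skills).
    The schedule below is purely illustrative.
    """
    mask_ratio = 0.15 + 0.60 * progress          # 15% -> 75% of tokens masked
    block_size = int(round(1 + 9 * progress))    # blocks of 1 -> 10 tokens

    mask = np.zeros(seq_len, dtype=bool)
    target = int(mask_ratio * seq_len)
    while mask.sum() < target:
        start = rng.integers(0, seq_len)
        mask[start:start + block_size] = True
    return mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for progress in (0.0, 0.5, 1.0):
        m = curriculum_block_mask(seq_len=32, progress=progress, rng=rng)
        print(f"progress={progress:.1f}  masked={m.sum():2d}/32  "
              + "".join("#" if b else "." for b in m))
```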


Query-Efficient Correlation Clustering with Noisy Oracle

Neural Information Processing Systems

We study a general clustering setting in which we have n elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
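The overall recipe described above, combining a sampling strategy with a classic approximation algorithm, can be sketched as follows. This is a simplified illustration, not the paper's adaptive fixed-confidence or fixed-budget scheme: it estimates each signed similarity by a uniform number of repeated queries to the noisy oracle and then runs a pivot-based approximation (KwikCluster-style) on the signs of the estimates.

```python
import random

def estimate_weights(n, noisy_oracle, queries_per_pair=20):
    """Estimate signed similarity weights by averaging repeated noisy queries.

    The paper designs adaptive sampling strategies; this sketch simply uses a
    uniform number of queries per pair.
    """
    w = {}
    for i in range(n):
        for j in range(i + 1, n):
            samples = [noisy_oracle(i, j) for _ in range(queries_per_pair)]
            w[(i, j)] = sum(samples) / len(samples)
    return w

def kwik_cluster(n, w, rng=None):
    """Pivot-based approximation for correlation clustering on the sign of w."""
    rng = rng or random.Random(1)
    remaining = set(range(n))
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot}
        for u in remaining - {pivot}:
            key = (min(pivot, u), max(pivot, u))
            if w[key] > 0:          # positive estimated similarity joins the pivot
                cluster.add(u)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

if __name__ == "__main__":
    # Toy ground truth: clusters {0,1,2} and {3,4}; the oracle flips the sign with prob. 0.2.
    rng = random.Random(2)
    truth = lambda i, j: 1.0 if (i < 3) == (j < 3) else -1.0
    noisy = lambda i, j: truth(i, j) * (1 if rng.random() > 0.2 else -1)
    print(kwik_cluster(5, estimate_weights(5, noisy)))
```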


Fairness in Learning: Classic and Contextual Bandits

Neural Information Processing Systems

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on "chained" confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
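One simple way to picture the chained confidence-interval idea (a hypothetical sketch, not the paper's exact algorithm or constants): build a confidence interval per arm, start a chain at the arm with the highest upper bound, add every arm whose interval overlaps the chain, and play uniformly at random inside the chain, so that no arm that is plausibly better is disfavored relative to another.

```python
import math
import random

def chained_fair_choice(means, counts, t, rng):
    """Pick an arm fairly: chain overlapping confidence intervals starting from
    the arm with the largest UCB, then choose uniformly inside the chain."""
    k = len(means)
    # Hoeffding-style intervals; unpulled arms get an infinite-width interval.
    radius = [math.inf if counts[i] == 0
              else math.sqrt(2 * math.log(max(t, 2)) / counts[i]) for i in range(k)]
    lo = [means[i] - radius[i] for i in range(k)]
    hi = [means[i] + radius[i] for i in range(k)]

    chain = {max(range(k), key=lambda i: hi[i])}   # start at the top-UCB arm
    changed = True
    while changed:
        changed = False
        chain_lo = min(lo[i] for i in chain)
        for i in range(k):
            if i not in chain and hi[i] >= chain_lo:
                chain.add(i)                        # its interval overlaps the chain
                changed = True
    return rng.choice(sorted(chain))

if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.9, 0.5, 0.4]
    means, counts = [0.0] * 3, [0] * 3
    for t in range(1, 2001):
        arm = chained_fair_choice(means, counts, t, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    print(counts)   # clearly worse arms eventually drop out of the chain
```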


Combinatorial semi-bandit with known covariance

Neural Information Processing Systems

The combinatorial stochastic semi-bandit problem is an extension of the classical multi-armed bandit problem in which an algorithm pulls more than one arm at each stage and the rewards of all pulled arms are revealed. One difference from the single-arm variant is that the dependency structure of the arms is crucial. Previous works on this setting either used a worst-case approach or imposed independence of the arms. We introduce a way to quantify the dependency structure of the problem and design an algorithm that adapts to it. The algorithm is based on linear regression, and the analysis develops techniques from the linear bandit literature. By comparing its performance to a new lower bound, we prove that it is optimal up to a poly-logarithmic factor in the number of pulled arms.
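As a hedged illustration of how a known covariance can enter a combinatorial semi-bandit, the sketch below sizes the exploration bonus of a candidate subset by the variance of that subset's total reward under the known covariance, instead of a worst-case bound. This is an illustrative UCB-style index over subsets, not the paper's regression-based algorithm; the bonus formula is an assumption made for the example.

```python
import itertools
import math
import numpy as np

def semi_bandit_ucb(true_means, cov, m, horizon, seed=0):
    """Combinatorial semi-bandit: pull a size-m subset each round and observe
    every pulled arm.  The exploration bonus for a subset uses the KNOWN
    covariance of its arms, so strongly correlated subsets are not over-explored."""
    rng = np.random.default_rng(seed)
    d = len(true_means)
    counts, means = np.zeros(d), np.zeros(d)
    subsets = list(itertools.combinations(range(d), m))

    for t in range(1, horizon + 1):
        best, best_index = None, -math.inf
        for A in subsets:
            idx = list(A)
            if np.any(counts[idx] == 0):
                index = math.inf                       # force initial exploration
            else:
                # Variance of the subset's total reward under the known covariance,
                # shrunk by how often each arm was observed (illustrative bonus).
                scale = 1.0 / np.sqrt(np.outer(counts[idx], counts[idx]))
                var = float(np.sum(cov[np.ix_(idx, idx)] * scale))
                index = means[idx].sum() + math.sqrt(2 * math.log(t) * max(var, 0.0))
            if index > best_index:
                best, best_index = A, index
        # Semi-bandit feedback: a noisy reward is revealed for every pulled arm.
        noise = rng.multivariate_normal(np.zeros(d), cov)
        for i in best:
            r = true_means[i] + noise[i]
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
    return means, counts

if __name__ == "__main__":
    mu = np.array([0.8, 0.7, 0.2, 0.1])
    cov = 0.05 * np.eye(4)
    print(semi_bandit_ucb(mu, cov, m=2, horizon=500))
```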


Optimal Batched Best Arm Identification

Neural Information Processing Systems

We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as little as possible. In particular, we aim to find the best arm with probability 1 - δ for some small constant δ > 0 while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., δ → 0) and runs in 3 batches in expectation. Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves near-optimal sample and batch complexity in the non-asymptotic setting (i.e., δ is finite), while enjoying the same batch and sample complexity as Tri-BBAI when δ tends to zero. Moreover, in the non-asymptotic setting, the complexity of previous batched algorithms is usually conditioned on the event that the best arm is returned (with probability at least 1 - δ), which can be unbounded when a sub-optimal arm is returned. In contrast, the complexity of Opt-BBAI does not rely on such an event. This is achieved through a novel procedure for checking whether the best arm has been eliminated, which is of independent interest.
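For intuition about the batched setting, the sketch below implements a generic batched successive-elimination routine (not Tri-BBAI or Opt-BBAI): every surviving arm is pulled equally within a batch, batch sizes grow geometrically, and arms whose upper confidence bound falls below the best lower bound are eliminated, so the number of policy switches equals the number of batches. Constants and the batch schedule are assumptions made for the example.

```python
import math
import random

def batched_elimination(pull, k, delta, seed=0):
    """Generic batched best-arm identification by successive elimination.

    `pull(i)` returns a reward in [0, 1] for arm i.  All pulls inside a batch
    follow the same policy, so policy switches = number of batches.
    """
    alive = list(range(k))
    means, counts = [0.0] * k, [0] * k
    batch = 0
    while len(alive) > 1:
        batch += 1
        pulls_this_batch = 4 ** batch              # geometrically growing batches
        for i in alive:
            for _ in range(pulls_this_batch):
                r = pull(i)
                counts[i] += 1
                means[i] += (r - means[i]) / counts[i]
        # Confidence radius shrinks with the (equal) number of pulls per surviving arm.
        rad = math.sqrt(math.log(4 * k * batch ** 2 / delta) / (2 * counts[alive[0]]))
        best_lcb = max(means[i] - rad for i in alive)
        alive = [i for i in alive if means[i] + rad >= best_lcb]
    return alive[0], batch

if __name__ == "__main__":
    rng = random.Random(1)
    true_means = [0.5, 0.6, 0.8]
    pull = lambda i: 1.0 if rng.random() < true_means[i] else 0.0
    print(batched_elimination(pull, k=3, delta=0.05))   # (best arm, batches used)
```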


Causal Bandits: Learning Good Interventions via Causal Inference

Neural Information Processing Systems

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-armed bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than that of algorithms that do not use the additional causal information.
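The kind of feedback this refers to can be illustrated in a simple "parallel" causal setting, where a purely observational sample reveals all covariates and therefore informs the estimate of every intervention at once. The sketch below is hypothetical and not the paper's algorithm: it splits the budget between observing and then intervening directly, visiting the least-observed arms first.

```python
import random
from collections import defaultdict

def causal_parallel_bandit(sample_observational, sample_intervention, n_vars, budget):
    """Sketch of how causal feedback helps best-intervention identification.

    `sample_observational()` returns (x, y): all binary covariates and a reward.
    `sample_intervention(i, v)` returns a reward under do(X_i = v).
    Observational samples let us estimate E[Y | X_i = v] for EVERY arm at once;
    the remaining budget goes to direct interventions.  (Illustrative only.)
    """
    obs_budget = budget // 2
    stats = defaultdict(lambda: [0.0, 0])            # (i, v) -> [mean, count]

    for _ in range(obs_budget):                       # phase 1: observe
        x, y = sample_observational()
        for i in range(n_vars):
            m, c = stats[(i, x[i])]
            stats[(i, x[i])] = [m + (y - m) / (c + 1), c + 1]

    arms = [(i, v) for i in range(n_vars) for v in (0, 1)]
    arms.sort(key=lambda a: stats[a][1])              # visit least-observed arms first
    per_arm = max(1, (budget - obs_budget) // len(arms))
    for (i, v) in arms:                               # phase 2: intervene
        for _ in range(per_arm):
            y = sample_intervention(i, v)
            m, c = stats[(i, v)]
            stats[(i, v)] = [m + (y - m) / (c + 1), c + 1]

    return max(arms, key=lambda a: stats[a][0])       # best estimated intervention

if __name__ == "__main__":
    rng = random.Random(1)
    def sample_observational():
        x = [1 if rng.random() < 0.5 else 0 for _ in range(3)]
        y = 1.0 if rng.random() < 0.2 + 0.5 * x[2] else 0.0   # X_2 drives the reward
        return x, y
    def sample_intervention(i, v):
        x = [1 if rng.random() < 0.5 else 0 for _ in range(3)]
        x[i] = v
        return 1.0 if rng.random() < 0.2 + 0.5 * x[2] else 0.0
    print(causal_parallel_bandit(sample_observational, sample_intervention, 3, budget=2000))
```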


Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits

Neural Information Processing Systems

We propose a novel piecewise stationary linear bandit (PSLB) model, where the environment randomly samples a context from an unknown probability distribution at each changepoint, and the quality of an arm is measured by its return averaged over all contexts. The contexts and their distribution, as well as the changepoints, are unknown to the agent.
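The model itself is easy to state in code. The following is a toy simulator of a PSLB-style environment under assumed parameters (a uniform context pool and geometric changepoints): at each changepoint a new context is drawn from a hidden distribution, rewards are linear in the arm features under the current context, and an arm's quality is its reward averaged over contexts.

```python
import numpy as np

class PSLBEnvironment:
    """Toy piecewise stationary linear bandit environment (illustrative sketch)."""

    def __init__(self, arm_features, context_pool, change_prob=0.01, noise=0.1, seed=0):
        self.arms = np.asarray(arm_features)      # shape (K, d)
        self.pool = np.asarray(context_pool)      # hidden context distribution (uniform over rows)
        self.change_prob = change_prob
        self.noise = noise
        self.rng = np.random.default_rng(seed)
        self.theta = self.pool[self.rng.integers(len(self.pool))]

    def step(self, arm):
        if self.rng.random() < self.change_prob:  # changepoint unknown to the agent
            self.theta = self.pool[self.rng.integers(len(self.pool))]
        return float(self.arms[arm] @ self.theta + self.noise * self.rng.standard_normal())

    def average_quality(self):
        # Quality of each arm: its expected reward averaged over the context pool.
        return self.arms @ self.pool.mean(axis=0)

if __name__ == "__main__":
    env = PSLBEnvironment(arm_features=np.eye(3),
                          context_pool=[[0.9, 0.1, 0.7], [0.2, 0.8, 0.7]])
    print(env.average_quality())                  # arm 2 is best on average
    print([round(env.step(2), 2) for _ in range(5)])
```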


RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions

Neural Information Processing Systems

Mobile health leverages personalized and contextually tailored interventions optimized through bandit and reinforcement learning algorithms. In practice, however, challenges such as participant heterogeneity, nonstationarity, and nonlinear relationships hinder algorithm performance. We propose RoME, a Robust Mixed-Effects contextual bandit algorithm that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific random effects, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential-reward model, enabling us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the RoME algorithm in a simulation and two off-policy evaluation studies.
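A minimal sketch of ingredient (1), the mixed-effects differential-reward model, under strong simplifying assumptions: the context is expanded into a shared block plus a per-user block, and ridge shrinkage on the per-user block plays the role of a random effect around the shared fixed effect. The network cohesion penalty and the debiased baseline estimation are omitted, and the LinUCB selection rule is only a stand-in, not the paper's algorithm.

```python
import numpy as np

def mixed_effects_features(x, user, n_users):
    """Expand context x into [shared block | per-user block]."""
    d = len(x)
    z = np.zeros(d * (1 + n_users))
    z[:d] = x                                   # shared (fixed) effect
    z[d * (1 + user): d * (2 + user)] = x       # this user's random effect
    return z

def linucb_select(z_actions, A, b, alpha=1.0):
    """Standard LinUCB rule on the expanded features: mean + alpha * width."""
    theta = np.linalg.solve(A, b)
    scores = [z @ theta + alpha * np.sqrt(z @ np.linalg.solve(A, z)) for z in z_actions]
    return int(np.argmax(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_users = 2, 3
    dim = d * (1 + n_users)
    A, b = np.eye(dim), np.zeros(dim)           # ridge prior acts as random-effect shrinkage
    for t in range(500):
        user = rng.integers(n_users)
        contexts = rng.standard_normal((4, d))  # 4 candidate actions this round
        zs = [mixed_effects_features(x, user, n_users) for x in contexts]
        a = linucb_select(zs, A, b)
        # Hypothetical reward: shared effect [1, -1] plus a small user-specific tweak.
        reward = contexts[a] @ (np.array([1.0, -1.0]) + 0.1 * user) + 0.1 * rng.standard_normal()
        A += np.outer(zs[a], zs[a])
        b += reward * zs[a]
    print(np.round(np.linalg.solve(A, b)[:d], 2))   # recovered shared effect
```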


Double Thompson Sampling for Dueling Bandits

Neural Information Processing Systems

In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As its name suggests, D-TS selects both the first and the second candidates according to Thompson Sampling. Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison according to two sets of samples independently drawn from the posterior distribution. This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case.
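The core loop described above translates almost directly into code. The sketch below keeps a Beta posterior on each pairwise preference, draws two independent samples of the preference matrix, picks a Copeland winner of the first sample as the first candidate, and picks the arm most likely to beat it under the second sample as the second candidate; the confidence-bound pruning used in the full D-TS algorithm is omitted for brevity.

```python
import numpy as np

def double_thompson_sampling(duel, k, horizon, seed=0):
    """Simplified Double Thompson Sampling for dueling bandits (sketch only)."""
    rng = np.random.default_rng(seed)
    wins = np.ones((k, k))                      # Beta(1, 1) priors on P(i beats j)

    for _ in range(horizon):
        # First sample of the preference matrix -> Copeland winner = first arm.
        theta1 = rng.beta(wins, wins.T)
        np.fill_diagonal(theta1, 0.5)
        copeland = (theta1 > 0.5).sum(axis=1)
        first = int(np.argmax(copeland))

        # Independent second sample -> arm most likely to beat the first arm.
        theta2 = rng.beta(wins, wins.T)
        theta2[first, first] = -np.inf          # never duel an arm against itself
        second = int(np.argmax(theta2[:, first]))

        if duel(first, second):                 # environment returns True if `first` wins
            wins[first, second] += 1
        else:
            wins[second, first] += 1
    return wins

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy preference matrix: pref[i, j] = P(arm i beats arm j).
    pref = np.array([[0.5, 0.7, 0.7],
                     [0.3, 0.5, 0.6],
                     [0.4, 0.4, 0.5]])
    duel = lambda i, j: rng.random() < pref[i, j]
    print(np.round(double_thompson_sampling(duel, k=3, horizon=2000), 0))
```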


Improved Algorithms for Contextual Dynamic Pricing (Solenne Gaucher)

Neural Information Processing Systems

In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations depend linearly on the context and are further distorted by noise.
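The first valuation model can be made concrete in a few lines: the buyer's valuation is linear in the context plus noise, the seller only observes whether a sale occurred, and revenue equals the posted price when it did. The two pricing rules in the example below are hypothetical baselines, not algorithms from the paper.

```python
import numpy as np

def linear_valuation_pricing(theta, pricing_rule, horizon=5000, noise=0.1, seed=0):
    """Contextual dynamic pricing with linear valuations (illustrative sketch).

    Each round the seller sees a context x_t, posts a price p_t, and the buyer
    purchases iff p_t <= v_t where v_t = <theta, x_t> + noise.  The seller only
    observes the binary sale indicator; revenue is p_t when a sale occurs.
    """
    rng = np.random.default_rng(seed)
    revenue = 0.0
    for _ in range(horizon):
        x = rng.uniform(0.0, 1.0, size=len(theta))
        valuation = x @ theta + noise * rng.standard_normal()
        price = pricing_rule(x)
        if price <= valuation:               # censored feedback: sale / no sale only
            revenue += price
    return revenue

if __name__ == "__main__":
    theta = np.array([0.6, 0.9])
    # An oracle that knows theta and slightly undercuts the expected valuation,
    # versus a context-blind constant price (both hypothetical baselines).
    oracle = lambda x: float(x @ theta) - 0.05
    constant = lambda x: 0.5
    print("oracle-ish:", round(linear_valuation_pricing(theta, oracle), 1))
    print("constant:  ", round(linear_valuation_pricing(theta, constant), 1))
```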