AITopics | agrawal and goyal

Collaborating Authors

agrawal and goyal

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

Li, Bochao, Fu, Yao, Chen, Wei, Kong, Fang

arXiv.org Machine Learning | May-15-2026

Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to leverage shifted offline data, they largely rely on UCB-type algorithms. Thompson sampling (TS) represents another canonical class of bandit algorithms, well known for its strong empirical performance and naturally suited to offline-to-online learning through its Bayesian formulation. However, unlike UCB indices, posterior samples in TS are not guaranteed to be optimistic with respect to the true arm means. This makes indices constructed from purely online and hybrid data difficult to compare and complicates their use. To address this issue, we propose sample-mean anchored TS (Anchor-TS), which introduces a novel median-based anchoring rule that defines the arm index as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean. The median anchoring systematically corrects bias induced by distribution shift by mitigating over-estimation for suboptimal arms and under-estimation for optimal arms, while exploiting offline information to obtain more accurate estimates when the shift is small. We establish theoretical guarantees showing that the proposed algorithm safely leverages offline data to accelerate online learning, and quantifying how the degree of distribution shift and the size of offline data affect the resulting regret reduction. Extensive experiments demonstrate consistent improvements of our algorithm over baselines.
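The median-based anchoring rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian posteriors with variance 1/n, the construction of the "hybrid" posterior by simply pooling offline and online rewards, and the names `median_anchor`, `anchor_ts_index`, and `run` are all assumptions made for this example.

```python
import random
import statistics


def median_anchor(online_sample, hybrid_sample, online_mean):
    """Arm index = median of the three quantities named in the abstract."""
    return statistics.median([online_sample, hybrid_sample, online_mean])


def anchor_ts_index(online_rewards, offline_rewards, rng):
    """Compute one arm's index (assumed Gaussian posteriors, variance 1/n)."""
    n_on = len(online_rewards)
    if n_on == 0:
        # Illustrative choice: force every arm to be pulled once online.
        return float("inf")
    online_mean = sum(online_rewards) / n_on
    # Assumption: the hybrid posterior pools offline and online rewards.
    pooled = list(online_rewards) + list(offline_rewards)
    pooled_mean = sum(pooled) / len(pooled)
    online_sample = rng.gauss(online_mean, n_on ** -0.5)
    hybrid_sample = rng.gauss(pooled_mean, len(pooled) ** -0.5)
    return median_anchor(online_sample, hybrid_sample, online_mean)


def run(true_means, offline_data, horizon, seed=0):
    """Play Bernoulli arms for `horizon` rounds; return online pull counts."""
    rng = random.Random(seed)
    arms = range(len(true_means))
    online = [[] for _ in arms]
    for _ in range(horizon):
        indices = [anchor_ts_index(online[a], offline_data[a], rng) for a in arms]
        arm = max(arms, key=indices.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        online[arm].append(reward)
    return [len(r) for r in online]
```

Because the online sample mean is always one of the three candidates, the index can never fall on the far side of it from both posterior samples; a heavily shifted hybrid sample is therefore discarded by the median whenever the online sample lies on the other side of the online mean.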

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2605.10289

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Neural Information Processing Systems | Apr-27-2026, 01:58:01 GMT

Linear contextual bandits represent a fundamental class of models with numerous real-world applications, and it is critical to develop algorithms that can effectively manage noise with unknown variance, ensuring provable guarantees for both worst-case constant-variance noise and deterministic reward scenarios.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

56f0b515214a7ec9f08a4bbf9a56f7ba-Supplemental.pdf

Neural Information Processing Systems | Apr-26-2026, 00:26:07 GMT

artificial intelligence, batch, inequality, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.47)

Spectral Thompson sampling

Kocak, Tomas, Valko, Michal, Munos, Remi, Agrawal, Shipra

arXiv.org Machine Learning | Apr-16-2026

Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver an analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
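The smoothness assumption (neighboring nodes have similar expected payoffs) can be made concrete with the graph Laplacian quadratic form, which equals the weighted sum of squared payoff differences across edges. The sketch below illustrates only that assumption, not SpectralTS itself; the abstract does not say how smoothness is measured, so the choice of this quadratic form, the function name `laplacian_smoothness`, and the toy path graph are all assumptions.

```python
def laplacian_smoothness(edge_weights, payoffs):
    """Return f^T L f for an undirected weighted graph with L = D - W.

    edge_weights maps an edge (i, j) to its weight; each undirected edge
    is listed once. The identity f^T L f = sum_ij w_ij (f_i - f_j)^2
    avoids building the Laplacian matrix explicitly.
    """
    return sum(
        w * (payoffs[i] - payoffs[j]) ** 2
        for (i, j), w in edge_weights.items()
    )


# Toy path graph 0 - 1 - 2 - 3 with unit weights (hypothetical data).
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
smooth_payoffs = [0.1, 0.2, 0.3, 0.4]  # neighbors differ slightly
rough_payoffs = [0.1, 0.9, 0.1, 0.9]   # neighbors differ a lot

smooth_score = laplacian_smoothness(path, smooth_payoffs)
rough_score = laplacian_smoothness(path, rough_payoffs)
```

A payoff vector that respects the graph yields a small score, while one that oscillates between neighbors yields a large one.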

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2604.13739

Country:

Europe > France (0.05)
North America > United States (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Neural Information Processing Systems | Feb-19-2026, 10:21:24 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

4a6824f8f137e78f18e73d9cfc1d22ed-Paper-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 10:21:21 GMT

algorithm, bandit, inequality, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

Neural Information Processing Systems | Feb-8-2026, 16:37:07 GMT

Partial monitoring (PM) is a general sequential decision-making problem with limited feedback (Rustichini, 1999; Piccolboni and Schindelhauer, 2001).

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

4a6824f8f137e78f18e73d9cfc1d22ed-Supplemental-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 15:16:37 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > United States > New York (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

Neural Information Processing Systems | Oct-3-2025, 02:28:05 GMT

Partial monitoring (PM) is a general sequential decision-making problem with limited feedback (Rustichini, 1999; Piccolboni and Schindelhauer, 2001).

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Hu, Bingshan, Huang, Zhiming, Zhang, Tianyue H., Lécuyer, Mathias, Hegde, Nidhi

arXiv.org Machine Learning | May-2-2024

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance between utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve an $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $\Delta$ denotes the single-round performance loss when pulling a sub-optimal arm.
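For orientation, the baseline that the abstract refers to, Thompson Sampling with Gaussian priors, can be sketched as below. This is a minimal sketch and not the TS-MA or TS-TD algorithms from the paper: it assumes the commonly stated form in which arm $i$ with $k_i$ pulls and reward sum $S_i$ draws a sample from $N(S_i/(k_i+1),\, 1/(k_i+1))$, and the Bernoulli reward simulation and the names `gaussian_ts` and `VACUOUS_HORIZON` are hypothetical.

```python
import math
import random


def gaussian_ts(true_means, horizon, seed=0):
    """Run Gaussian-prior Thompson Sampling on simulated Bernoulli arms.

    Assumed posterior for arm a: N(S_a / (k_a + 1), 1 / (k_a + 1)).
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    arms = range(len(true_means))
    pulls = [0] * len(true_means)
    reward_sums = [0.0] * len(true_means)
    for _ in range(horizon):
        samples = [
            rng.gauss(
                reward_sums[a] / (pulls[a] + 1),
                1.0 / math.sqrt(pulls[a] + 1),
            )
            for a in arms
        ]
        arm = max(arms, key=samples.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls


# Horizon below which the abstract calls the earlier bound vacuous.
VACUOUS_HORIZON = 288 * math.exp(64)
```

Evaluating `VACUOUS_HORIZON` gives a number on the order of $10^{30}$, which shows why a leading coefficient of $288 e^{64}$ says nothing useful at any practical horizon.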

algorithm, optimal arm, posterior sample, (14 more...)

arXiv.org Machine Learning

2405.0101

Country: