AITopics | logt

Collaborating Authors

logt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Calibration in Non-Stationary Environments

Liu, Junyan, Luo, Haipeng, Ratliff, Lillian J.

arXiv.org Machine Learning, May-25-2026

Making calibrated online predictions is a central challenge in modern AI systems. Much of the existing literature focuses on fully adversarial environments where outcomes may be arbitrary, leading to conservative algorithms that can perform suboptimally in more benign settings, such as when outcomes are nearly stationary. This gap raises a natural question: can we design online prediction algorithms whose calibration error automatically adapts to the degree of non-stationarity in the environment, smoothly interpolating between i.i.d. and adversarial regimes? We answer this question in the affirmative and develop a suite of algorithms that achieve adaptive calibration guarantees under multiple calibration measures. Specifically, with $T$ being the number of rounds, $K$ being the unknown number of i.i.d. segments of the environment, and $C\in[0,T]$ being another unknown non-stationarity measure, defined as the minimal $\ell_1$ deviation of the mean outcomes, our algorithms attain $\widetilde{O}(\min\{\sqrt{T}+(TC)^{\frac{1}{3}}, \sqrt{KT}\})$ for $\ell_1$ calibration error and $\widetilde{O}(\min\{(1+C)^{\frac{1}{3}}, K\})$ for both $\ell_2$ and pseudo-KL calibration error. These bounds match the optimal rates in the stationary case ($C=0$ and $K=1$) and recover known guarantees in the fully adversarial regime ($C, K=\Omega(T)$). Our approach builds on and extends prior work [Hu et al., 2026, Luo et al., 2025], introducing epoch-based scheduling together with a novel non-uniform partition of the prediction space that allocates finer resolution near the underlying ground truth.
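As a concrete reference point for the $\ell_1$ and $\ell_2$ measures, the following Python sketch scores a finished prediction sequence. It assumes binary outcomes and the common definitions that group rounds by the exact predicted value $p$: $\ell_1 = \sum_p |\sum_{t: p_t = p}(y_t - p)|$ and $\ell_2 = \sum_p (\sum_{t: p_t = p}(y_t - p))^2 / n_p$. The helper name `calibration_errors` is hypothetical, the paper's exact normalization may differ, and the code only measures calibration; it is not the authors' adaptive algorithm.

```python
from collections import defaultdict
from typing import Sequence


def calibration_errors(preds: Sequence[float], outcomes: Sequence[int]) -> tuple[float, float]:
    """Return (l1, l2) calibration error of a finished prediction sequence.

    Rounds are grouped by the exact predicted value p; within each group the
    residuals (y_t - p) are summed.
      l1 = sum_p |sum_t (y_t - p)|
      l2 = sum_p (sum_t (y_t - p))^2 / n_p
    """
    if len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must have the same length")
    residual_sum = defaultdict(float)  # p -> sum of (y_t - p) over rounds predicting p
    count = defaultdict(int)           # p -> number of rounds predicting p
    for p, y in zip(preds, outcomes):
        residual_sum[p] += y - p
        count[p] += 1
    l1 = sum(abs(s) for s in residual_sum.values())
    l2 = sum(s * s / count[p] for p, s in residual_sum.items())
    return l1, l2
```

For example, predicting 0.5 on four rounds with outcomes 1, 0, 1, 0 gives zero error under both measures, while the same forecasts against four 1s give $\ell_1 = 2$ and $\ell_2 = 1$.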

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

2605.1149

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

Sarkar, Dhruv, Sinha, Abhishek

arXiv.org Machine Learning, May-21-2026

We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint function for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling cumulative constraint violation ($\mathsf{CCV}$). For strongly convex losses, state-of-the-art algorithms achieve $O(\log T)$ regret and $O(\sqrt{T \log T})$ $\mathsf{CCV}$. The corresponding best-known bounds for convex losses are $O(\sqrt{T})$ regret and $O(\sqrt{T} \log T)$ $\mathsf{CCV}$. In this paper, we give a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ $\mathsf{CCV}$ for strongly convex losses, yielding an exponential improvement in the $\mathsf{CCV}$. For convex losses, our algorithm improves the $\mathsf{CCV}$ to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.
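The geometric notion behind the improvement can be checked directly on a finite sequence of iterates. A curve is self-contracted if, for any three times $t_1 \le t_2 \le t_3$, the point at $t_2$ is no farther from the point at $t_3$ than the point at $t_1$ is. Below is a brute-force Python sketch of the discrete version of that condition, assuming Euclidean distance; the helper name `is_self_contracted` is hypothetical and this is not the paper's algorithm or proof.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def is_self_contracted(path: Sequence[Point], tol: float = 1e-12) -> bool:
    """Check the discrete self-contractedness condition on a sequence of points.

    For every i < j < k we require dist(x_j, x_k) <= dist(x_i, x_k): seen from
    any later point x_k, earlier points of the path only get closer over time.
    Brute force over all triples, so O(n^3) distance evaluations.
    """
    n = len(path)
    for k in range(n):
        for j in range(k):
            for i in range(j):
                if math.dist(path[j], path[k]) > math.dist(path[i], path[k]) + tol:
                    return False
    return True
```

For instance, the collinear path (0,0), (1,0), (2,0) passes the check, while (0,0), (3,0), (1,0) fails it because the second point is farther from the third than the first point is. Known results bound the length of a self-contracted curve by a dimension-dependent multiple of its diameter, which suggests why such a property can help control cumulative quantities; how the paper applies it is not reproduced here.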

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

2605.21107

Country: Asia > India (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)


NeurIPS2021_ImperfectCommmunicationBandits

Madhu

Neural Information Processing Systems, Apr-25-2026, 14:38:06 GMT

We consider the case where each message fails with probability $1-p$ and each agent $i$ uses the messages it receives from its neighbors with probability $p_i$. This is equivalent to each agent $i$ receiving messages from its neighbors with probability $p_i p$. Let $\mathbb{1}\{(i,j) \in E_t\}$ be the indicator random variable that takes value 1 if agent $i$ receives the reward value and arm ID from agent $j$ at time $t$, and 0 otherwise. We start by proving some useful lemmas. Lemma 1 (restatement of results from [3]). Let … Thus we have $\Pr(A_i(t+1) = k,\, N_{ik}(t) > \ldots) \le \Pr(\hat{\mu}_{i1}(t) \le \mu_1 - C_{i1}(t)) + \Pr(\hat{\mu}_{ik}(t) \ge \mu_k + C_{ik}(t))$. This concludes the proof of Lemma 1. Lemma 2. Let … denote the clique covering number of graph $G$. Let … Let $\mathcal{C}$ be a non-overlapping clique covering of $G$. Then we have that $\ldots\, |\mathcal{C}| < N_{ik}(\ldots) \ldots$. From the regret results, it follows that the regret in this case is greater than the regret in the case where … for some (or all) $i$. We analyse the expected number of times agents pull suboptimal arm $k$ as follows: $\sum \ldots \Pr(\hat{\mu}_{i1}(t) \le \mu_1 - C_{i1}(t)) + \Pr(\hat{\mu}_{ik}(t) \ge \mu_k + C_{ik}(t))$, (29) where (a) follows from the fact that the clique covering is non-overlapping. This concludes the proof of Lemma 2. Lemma 3. Let $d_i(G)$ be the degree of agent $i$ in graph $G$.
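The equivalence stated at the start of this excerpt (a failure probability of $1-p$ combined with a usage probability of $p_i$ gives an effective rate of $p_i p$) can be checked by simulation. A minimal Python sketch, assuming message failures and usage decisions are independent; `simulate_effective_rate` is a hypothetical helper, not code from the paper.

```python
import random


def simulate_effective_rate(p: float, p_i: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often agent i actually uses a neighbor's message.

    Each message is delivered with probability p (fails with probability 1 - p);
    agent i independently uses a delivered message with probability p_i.
    The estimate should approach p_i * p as `trials` grows.
    """
    rng = random.Random(seed)
    used = 0
    for _ in range(trials):
        delivered = rng.random() < p
        if delivered and rng.random() < p_i:
            used += 1
    return used / trials
```

With $p = 0.8$ and $p_i = 0.5$, the estimate should be close to $0.4$.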

agent, artificial intelligence, nik

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)


A Kernel Nonconformity Score for Multivariate Conformal Prediction

Meyer, Louis, Xu, Wenkai

arXiv.org Machine Learning, Apr-24-2026

Multivariate conformal prediction requires nonconformity scores that compress residual vectors into scalars while preserving certain implicit geometric structure of the residual distribution. We introduce a Multivariate Kernel Score (MKS) that produces prediction regions that explicitly adapt to this geometry. We show that the proposed score resembles the Gaussian process posterior variance, unifying Bayesian uncertainty quantification with frequentist-type coverage guarantees. Moreover, the MKS can be decomposed into an anisotropic Maximum Mean Discrepancy (MMD) that interpolates between kernel density estimation and covariance-weighted distance. We prove finite-sample coverage guarantees and establish convergence rates that depend on the effective rank of the kernel-based covariance operator rather than on the ambient dimension, enabling dimension-free adaptation. On regression tasks, the MKS significantly reduces the volume of the prediction regions compared to ellipsoidal baselines while maintaining nominal coverage, with larger gains at higher dimensions and tighter coverage levels.
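Finite-sample coverage guarantees of this kind typically come from split conformal calibration, which works with any scalar nonconformity score. Below is a minimal Python sketch of that calibration step, assuming exchangeable data. The Euclidean-norm score is a placeholder standing in for the MKS, whose formula the abstract does not give, and all function names are hypothetical. Under exchangeability, the region $\{y : s(y - \hat y) \le q\}$ built this way covers a new response with probability at least $1-\alpha$.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def conformal_threshold(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest
    calibration score. Returns +inf when that rank exceeds n (too few points)."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def euclidean_score(residual: Vector) -> float:
    """Placeholder nonconformity score: Euclidean norm of the residual vector.
    A geometry-aware multivariate score would be substituted here."""
    return math.sqrt(sum(r * r for r in residual))


def in_region(y: Vector, y_hat: Vector, q: float,
              score: Callable[[Vector], float] = euclidean_score) -> bool:
    """Is y inside the prediction region {y : score(y - y_hat) <= q}?"""
    return score([a - b for a, b in zip(y, y_hat)]) <= q
```

Only the score changes between a spherical baseline, an ellipsoidal one, and a kernel score; the calibration quantile, and hence the coverage argument, stays the same.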

artificial intelligence, density 0, machine learning

arXiv.org Machine Learning

2604.21595

Country: North America > United States (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees

Gerogiannis, Argyrios, Huang, Yu-Han, Veeravalli, Venugopal V.

arXiv.org Machine Learning, Apr-21-2026

We study model-free reinforcement learning (RL) in non-stationary finite-horizon episodic Markov decision processes (MDPs) without prior knowledge of the non-stationarity. We focus on the piecewise-stationary (PS) setting, where both the reward and transition dynamics can change an arbitrary number of times. We propose Detection Augmented Reinforcement Learning (DARLING), a modular wrapper for PS-RL that applies to both tabular and linear MDPs and requires no knowledge of the change points. Under certain change-point separation and reachability conditions, DARLING improves the best available dynamic regret bounds in both settings and yields strong empirical performance. We further establish the first minimax lower bounds for PS-RL in tabular and linear MDPs, showing that DARLING is the first nearly optimal algorithm. Experiments on standard benchmarks demonstrate that DARLING consistently outperforms state-of-the-art methods across diverse non-stationary scenarios.
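The general wrapper pattern (monitor a statistic of the observed returns, and restart the base learner when a change is flagged) can be sketched in Python as follows. The two-window mean test, the `window` and `threshold` parameters, and the `on_restart` callback are illustrative assumptions; this is not the detector or reset rule used by DARLING.

```python
from collections import deque
from typing import Callable, Iterable


class WindowMeanDetector:
    """Hypothetical detector: flags a change when the means of the two most
    recent adjacent windows of length `window` differ by more than `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        self.window = window
        self.threshold = threshold
        self.buf: deque = deque(maxlen=2 * window)

    def update(self, x: float) -> bool:
        self.buf.append(x)
        if len(self.buf) < 2 * self.window:
            return False  # not enough data since the last reset
        vals = list(self.buf)
        old = sum(vals[: self.window]) / self.window
        new = sum(vals[self.window:]) / self.window
        return abs(new - old) > self.threshold

    def reset(self) -> None:
        self.buf.clear()


def run_with_restarts(stream: Iterable[float], detector: WindowMeanDetector,
                      on_restart: Callable[[], None]) -> list[int]:
    """Feed observed returns to the detector; on each detection, clear the
    detector's history and call `on_restart` so the wrapped learner can discard
    stale estimates. Returns the time steps at which restarts happened."""
    restarts = []
    for t, x in enumerate(stream):
        if detector.update(x):
            restarts.append(t)
            detector.reset()
            on_restart()
    return restarts
```

On a stream of fifty 0s followed by fifty 1s with `window=10` and `threshold=0.5`, a single restart fires at step 55, once six of the ten most recent values are 1s.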

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

2604.16684

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

Guan, Changkun, Xu, Mengfan

arXiv.org Machine Learning, Apr-9-2026

Multi-objective bandits, in which the reward of each arm is a multi-dimensional vector rather than a scalar, have attracted increasing attention because of their broad applicability and mathematical elegance. This naturally introduces Pareto order relations and Pareto regret. A long-standing question in this area is whether this added complexity makes performance fundamentally harder to optimize. A recent surprising result shows that, in the adversarial setting, Pareto regret is no larger than classical regret; however, in the stochastic setting, where the regret notion is different, the picture remains unclear. In fact, existing work suggests that Pareto regret in the stochastic case increases with the dimensionality. This subtle and somewhat controversial phenomenon motivates our central question: \emph{are multi-objective bandits actually harder than single-objective ones?} We answer this question in full by showing that, in the stochastic setting, Pareto regret is in fact governed by the maximum sub-optimality gap $g^\dagger$, and hence by the minimum marginal regret, of order $\Omega(\frac{K\log T}{g^\dagger})$. We further develop a new algorithm that achieves Pareto regret of order $O(\frac{K\log T}{g^\dagger})$ and is therefore optimal. The algorithm leverages a nested two-layer uncertainty quantification over both arms and objectives through upper and lower confidence bound estimators. It combines a top-two racing strategy for arm selection with an uncertainty-greedy rule for dimension selection. Together, these components balance exploration and exploitation across the two layers. We also conduct comprehensive numerical experiments to validate the proposed algorithm, showing the desired regret guarantee and significant gains over benchmark methods.
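To make the Pareto order concrete, the Python sketch below computes Pareto dominance, the Pareto front, and a per-arm Pareto suboptimality gap of the kind commonly used in the stochastic multi-objective bandit literature: the smallest uniform boost that makes an arm's mean vector undominated. This is an illustration under that assumed definition; the quantity $g^\dagger$ in the abstract may be defined differently, and the function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v (maximization): >= everywhere, > somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: Sequence[Vector]) -> list[int]:
    """Indices of arms whose mean reward vectors no other arm dominates."""
    return [i for i, u in enumerate(means)
            if not any(dominates(v, u) for j, v in enumerate(means) if j != i)]


def pareto_gap(means: Sequence[Vector], i: int) -> float:
    """Infimum of eps >= 0 such that raising every coordinate of arm i's mean by
    eps leaves it undominated: max(0, max_{j != i} min_d (mu_j[d] - mu_i[d]))."""
    worst = max((min(a - b for a, b in zip(mu_j, means[i]))
                 for j, mu_j in enumerate(means) if j != i), default=0.0)
    return max(0.0, worst)
```

With mean vectors (1, 0), (0, 1), (0.5, 0.5) and (0.2, 0.2), the first three arms form the Pareto front and the last arm has gap 0.3, since it must gain 0.3 in every objective to stop being dominated by (0.5, 0.5).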

artificial intelligence, optimization problem, pareto regret

arXiv.org Machine Learning

2604.07096

Country: North America > United States (0.05)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)


Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Huang, Ruiyuan, Lyu, Zicheng, Zhu, Xiaoyi, Huang, Zengfeng

arXiv.org Machine Learning, Mar-31-2026

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $O(K^{1-\varepsilon})$ batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm that, for any bit budget $W$ with $\Omega(\log T) \le W \le O(K\log T)$, uses at most $W$ bits of memory and $\widetilde{O}(K/W)$ batches while achieving regret $\widetilde{O}(\sqrt{KT})$, nearly matching our lower bound up to polylogarithmic factors.
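For context on the batch side of the trade-off, the abstract notes that an $O(\log\log T)$-type number of batches suffices when memory is unrestricted. One common way to place batch boundaries in that regime is a geometric "minimax grid" from the batched-bandit literature; the Python sketch below computes it, assuming the grid form stated in the docstring. It does not model the $W$-bit memory constraint and is not the algorithm from this paper; `minimax_batch_grid` is a hypothetical name.

```python
import math


def minimax_batch_grid(T: int, M: int) -> list[int]:
    """Batch end times t_1 < ... < t_M = T for a batched bandit algorithm.

    Assumed grid form: t_1 = a, t_m = floor(a * sqrt(t_{m-1})) with
    a = T ** (1 / (2 - 2 ** (1 - M))), chosen so that
    a ** (1 + 1/2 + ... + 1/2 ** (M - 1)) = T.
    """
    if M < 1:
        raise ValueError("need at least one batch")
    if M == 1:
        return [T]
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - M)))
    grid = [math.floor(a)]
    for _ in range(M - 2):
        grid.append(math.floor(a * math.sqrt(grid[-1])))
    grid.append(T)  # the last batch always ends at the horizon
    return grid
```

Because each boundary grows roughly like the square root of the previous one times $a$, taking $M$ on the order of $\log\log T$ already makes $a$ close to $\sqrt{T}$, which is the intuition behind the doubly logarithmic batch count.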

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

2603.13742

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Data Science > Data Mining > Big Data (0.67)


Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization

Balasundaram, Haricharan, Mahendran, Karthick Krishna, Vaze, Rahul

arXiv.org Machine Learning, Mar-24-2026

The problem of constrained online convex optimization is considered, where at each round, once a learner commits to an action $x_t \in \mathcal{X} \subset \mathbb{R}^d$, a convex loss function $f_t$ and a convex constraint function $g_t$ that defines the constraint $g_t(x)\le 0$ are revealed. The objective is to simultaneously minimize the static regret and the cumulative constraint violation (CCV) relative to a benchmark that knows the loss and constraint functions $f_t$ and $g_t$ for all $t$ ahead of time and chooses a static optimal action that is feasible with respect to all constraints $g_t(x)\le 0$. In recent prior work [Sinha and Vaze, 2024], algorithms with simultaneous regret of $O(\sqrt{T})$ and CCV of $O(\sqrt{T})$ have been proposed, with CCV of $O(1)$ in specific cases such as $d=1$ [Vaze and Sinha, 2025]. It has been widely believed that, for any $d\ge 2$, every algorithm that ensures $O(\sqrt{T})$ regret must incur $\Omega(\sqrt{T})$ CCV on worst-case inputs. In this paper, we refute this and show that the algorithm of Vaze and Sinha [2025] simultaneously achieves $O(\sqrt{T})$ regret and $O(T^{1/3})$ CCV when $d=2$.
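To make the two performance measures concrete, the Python sketch below runs a simple projection-style baseline in $d=2$ and reports cumulative loss and $\mathrm{CCV} = \sum_t \max(0, g_t(x_t))$. The quadratic losses, linear constraints, box domain $[-r, r]^2$, and the update rule (gradient step, projection onto the latest constraint's halfspace, then clipping into the box) are all illustrative assumptions. This is not the algorithm of Vaze and Sinha [2025] analyzed in the paper, and static regret would additionally require the best fixed feasible action in hindsight, which is not computed here.

```python
from typing import Sequence

Vec = tuple[float, float]


def project_halfspace(y: Vec, a: Vec, b: float) -> Vec:
    """Euclidean projection of y onto the halfspace {x : a . x <= b}, with a != 0."""
    excess = a[0] * y[0] + a[1] * y[1] - b
    if excess <= 0:
        return y
    norm_sq = a[0] ** 2 + a[1] ** 2
    return (y[0] - excess * a[0] / norm_sq, y[1] - excess * a[1] / norm_sq)


def clip_box(y: Vec, r: float) -> Vec:
    """Euclidean projection onto the box [-r, r]^2 (coordinate-wise clipping)."""
    return (min(max(y[0], -r), r), min(max(y[1], -r), r))


def run_coco(targets: Sequence[Vec], constraints: Sequence[tuple[Vec, float]],
             x0: Vec, eta: float, r: float = 1.0) -> tuple[float, float]:
    """Toy COCO loop with losses f_t(x) = ||x - c_t||^2 and linear constraints
    g_t(x) = a_t . x - b_t. Returns (cumulative loss, CCV)."""
    x = x0
    total_loss = 0.0
    ccv = 0.0
    for c, (a, b) in zip(targets, constraints):
        # f_t and g_t are revealed only after x has been played.
        total_loss += (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
        ccv += max(0.0, a[0] * x[0] + a[1] * x[1] - b)
        # Gradient step on f_t ...
        y = (x[0] - eta * 2 * (x[0] - c[0]), x[1] - eta * 2 * (x[1] - c[1]))
        # ... then move onto {g_t <= 0} and back into the box.
        x = clip_box(project_halfspace(y, a, b), r)
    return total_loss, ccv
```

Starting from the infeasible point (0.5, 0) with the fixed constraint $x_1 \le 0$ and target (1, 0), the first round contributes 0.5 to CCV, after which the iterate stays on the boundary and no further violation accrues.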

artificial intelligence, constraint-based reasoning, perimeter

arXiv.org Machine Learning

2603.20671

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.35)


Constrained Online Convex Optimization with Memory and Predictions

Abdullah, Mohammed, Iosifidis, George, Elayoubi, Salah Eddine, Chahed, Tijani

arXiv.org Machine Learning, Mar-24-2026

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied framework of unconstrained online optimization with memory and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.
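The penalty idea for the no-prediction case can be illustrated with a generic virtual-queue scheme, a standard device in constrained online learning in which a penalty weight grows with observed violations. The Python sketch assumes a memory-1 loss with a switching-cost term, a linear demand constraint, a scalar decision in $[0,1]$, and a fixed step size `eta`; it takes gradient steps on the "unary" loss obtained by setting all remembered decisions equal to the current one. The function name and every modeling choice are hypothetical; this is not the paper's adaptive penalty algorithm and carries none of its guarantees.

```python
def run_penalty_coco_m(targets, demands, eta, x0=0.0, switch_cost=1.0):
    """Toy 1-D constrained OCO with a memory-1 loss and a virtual-queue penalty.

    Loss:        f_t(x_{t-1}, x_t) = (x_t - c_t)^2 + switch_cost * (x_t - x_{t-1})^2
    Constraint:  g_t(x_t) = d_t - x_t <= 0   (e.g. serve at least demand d_t)
    Decisions live in [0, 1].

    After each round the penalty weight q grows with the observed violation,
    q <- max(0, q + g_t(x_t)); the next decision is a projected gradient step on
    the unary loss (x - c_t)^2 (the switching term vanishes when x_{t-1} = x_t)
    plus q * g_t(x).
    Returns (cumulative loss, cumulative constraint violation, final decision).
    """
    x_prev, x = x0, x0
    q = 0.0
    total_loss = 0.0
    ccv = 0.0
    for c, d in zip(targets, demands):
        total_loss += (x - c) ** 2 + switch_cost * (x - x_prev) ** 2
        g = d - x
        ccv += max(0.0, g)
        q = max(0.0, q + g)
        grad = 2 * (x - c) - q  # derivative of (x - c)^2 + q * (d - x)
        x_prev, x = x, min(max(x - eta * grad, 0.0), 1.0)
    return total_loss, ccv, x
```

When the loss pulls toward 0 but the constraint demands at least 0.5, the penalty weight rises until the decisions settle near the constraint boundary, at the cost of some violation during the transient.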

artificial intelligence, constraint, prediction

arXiv.org Machine Learning

doi: 10.1609/aaai.v40i24.39031

2603.21375

Country:

Europe > France (0.14)
Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)


Batched Kernelized Bandits: Refinements and Extensions

Ma, Chenkai, Chen, Keqin, Scarlett, Jonathan

arXiv.org Machine Learning, Mar-16-2026

In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and refine and extend existing results on regret bounds. For algorithmic upper bounds, Li and Scarlett (2022) show that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ in the regret bound. For algorithm-independent lower bounds, noticing that existing results only apply when the batch sizes are fixed in advance, we present novel lower bounds when the batch sizes are chosen adaptively, and show that adaptive batches have essentially the same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, show that a suitably defined cumulative regret notion incurs the same bound as in the non-robust setting, and derive a simple regret bound significantly below that of previous work.
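The batched elimination idea behind BPE-style algorithms can be sketched on a 1-D grid: choose a batch of points using posterior variance alone (which needs no feedback from the current batch), then discard candidates whose upper confidence bound falls below the best lower confidence bound. The RBF kernel, length scale `ls`, noise variance `noise_var`, confidence multiplier `beta`, and all function names are illustrative assumptions, written in pure Python so it runs without extra libraries; this is not the exact BPE or robust-BPE procedure from the paper.

```python
import math
from typing import Sequence


def rbf(x: float, y: float, ls: float = 0.2) -> float:
    """Squared-exponential (RBF) kernel with length scale ls."""
    return math.exp(-((x - y) ** 2) / (2 * ls * ls))


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def gp_posterior(xs, ys, x, noise_var=1e-2, ls=0.2):
    """Posterior mean and variance at x for a zero-mean GP with RBF kernel,
    conditioned on noisy observations ys at points xs."""
    K = [[rbf(a, b, ls) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x, ls) for a in xs]
    alpha = solve(K, list(ys))
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x, x, ls) - sum(ki * vi for ki, vi in zip(k, v))
    return mean, max(var, 0.0)


def select_batch(candidates: Sequence[float], size: int,
                 noise_var: float = 1e-2, ls: float = 0.2) -> list[float]:
    """Greedily pick `size` points of maximum posterior variance. The variance
    does not depend on observed values, so zeros stand in for y."""
    chosen: list[float] = []
    for _ in range(size):
        zeros = [0.0] * len(chosen)
        chosen.append(max(candidates,
                          key=lambda c: gp_posterior(chosen, zeros, c, noise_var, ls)[1]))
    return chosen


def eliminate(candidates, xs, ys, beta=2.0, noise_var=1e-2, ls=0.2):
    """Keep candidates whose UCB is at least the largest LCB among candidates."""
    bounds = []
    for c in candidates:
        m, v = gp_posterior(xs, ys, c, noise_var, ls)
        bounds.append((m - beta * math.sqrt(v), m + beta * math.sqrt(v)))
    best_lcb = max(lo for lo, _ in bounds)
    return [c for c, (_, hi) in zip(candidates, bounds) if hi >= best_lcb]
```

Selecting two points from {0, 0.5, 1} picks the two endpoints, since the second endpoint is nearly uncorrelated with the first under this kernel. After observing $-(x - 0.7)^2$ on an 11-point grid, elimination keeps the maximizer 0.7 and discards the far endpoint 0.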

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

2603.12627

Country:

Asia > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)
