AITopics

Country:

Europe > France (0.28)
North America > Canada (0.28)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsFeb-17-2026, 17:27:10 GMT

Covariance adaptive best arm identification

The learner's goal is to identify the optimal arm

data mining, machine learning, nullk null, (20 more...)

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsFeb-17-2026, 17:27:07 GMT

e82ef7865f29b40640f486bbbe7959a7-Paper-Conference.pdf

artificial intelligence, data mining, machine learning, (19 more...)

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsOct-9-2025, 10:30:32 GMT

Covariance adaptive best arm identification

The learner's goal is to identify the optimal arm

artificial intelligence, data mining, machine learning, (19 more...)

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsAug-13-2025, 16:23:36 GMT

A Discussion on Hyper parameter Tuning

Contextual bandit is a class of online learning problems that can be viewed as a simple reinforcement learning problem without transition. For a completely understanding of contextual bandit problems, we refer the readers to the Chapter 4 of [Bubeck et al., 2012]. Here we include the main idea for completeness. In contextual bandit problems, the agent needs to find out the best action given some observed context (a.k.a the optimal policy in reinforcement learning). Formally, we define S as the context set and K as the number of action.

machine learning, reinforcement learning, srld, (16 more...)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Boudart, Pierre, Gaillard, Pierre, Rudi, Alessandro

Enjoying Non-linearity in Multinomial Logistic Bandits

arXiv.org Machine LearningJul-9-2025

We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $κ_*$, that may be exponentially large in some problem parameters and which is captured by the derivative of the sigmoid function. It encapsulates the non-linearity and improves existing regret guarantees over $T$ rounds from $\smash{O(d\sqrt{T})}$ to $\smash{O(d\sqrt{T/κ_*})}$, where $d$ is the dimension of the parameter space. We extend their analysis to the multinomial logistic bandit framework, making it suitable for complex applications with more than two choices, such as reinforcement learning or recommender systems. To achieve this, we extend the definition of $κ_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity. Our method yields a problem-dependent regret bound of order $ \smash{\widetilde{\mathcal{O}}( Kd \sqrt{{T}/{κ_*}})} $, where $K$ is the number of actions and $κ_* \ge 1$. This improves upon the best existing guarantees of order $ \smash{\widetilde{\mathcal{O}}( Kd \sqrt{T} )} $. Moreover, we provide a $\smash{ Ω(d\sqrt{T/κ_*})}$ lower-bound, showing that our dependence on $κ_*$ is optimal.

artificial intelligence, exp, machine learning, (15 more...)

2507.05306

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Drago, Simone, Mussi, Marco, Metelli, Alberto Maria

A Refined Analysis of UCBVI

arXiv.org Machine LearningFeb-24-2025

In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.

equation, nullk null, refined analysis, (16 more...)

2502.1737

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cape, Joshua, Yu, Xianshi, Liao, Jonquil Z.

Robust spectral clustering with rank statistics

arXiv.org Machine LearningAug-19-2024

This paper analyzes the statistical performance of a robust spectral clustering method for latent structure recovery in noisy data matrices. We consider eigenvector-based clustering applied to a matrix of nonparametric rank statistics that is derived entrywise from the raw, original data matrix. This approach is robust in the sense that, unlike traditional spectral clustering procedures, it can provably recover population-level latent block structure even when the observed data matrix includes heavy-tailed entries and has a heterogeneous variance profile. Our main theoretical contributions are threefold and hold under flexible data generating conditions. First, we establish that robust spectral clustering with rank statistics can consistently recover latent block structure, viewed as communities of nodes in a graph, in the sense that unobserved community memberships for all but a vanishing fraction of nodes are correctly recovered with high probability when the data matrix is large. Second, we refine the former result and further establish that, under certain conditions, the community membership of any individual, specified node of interest can be asymptotically exactly recovered with probability tending to one in the large-data limit. Third, we establish asymptotic normality results associated with the truncated eigenstructure of matrices whose entries are rank statistics, made possible by synthesizing contemporary entrywise matrix perturbation analysis with the classical nonparametric theory of so-called simple linear rank statistics. Collectively, these results demonstrate the statistical utility of rank-based data transformations when paired with spectral techniques for dimensionality reduction. Additionally, for a dataset of human connectomes, our approach yields parsimonious dimensionality reduction and improved recovery of ground-truth neuroanatomical cluster structure.

matrix, spectral, statistics, (15 more...)

2408.10136

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Saad, El Mehdi, Blanchard, G.

Constant regret for sequence prediction with limited advice

arXiv.org Machine LearningOct-5-2022

We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most p experts for prediction, then they can observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order O($\sqrt$ KT). We show that allowing the learner to play one additional expert per round and observe one additional feedback improves substantially the guarantees on regret. We provide a strategy combining only p = 2 experts per round for prediction and observing m $\ge$ 2 experts' losses. Its randomized regret (wrt. internal randomization of the learners' strategy) is of order O (K/m) log(K$\delta$ --1) with probability 1 -- $\delta$, i.e., is independent of the horizon T ("constant" or "fast rate" regret) if (p $\ge$ 2 and m $\ge$ 3). We prove that this rate is optimal up to a logarithmic factor in K. In the case p = m = 2, we provide an upper bound of order O(K 2 log(K$\delta$ --1)), with probability 1 -- $\delta$. Our strategies do not require any prior knowledge of the horizon T nor of the confidence parameter $\delta$. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the "slow rate" $\Omega$($\sqrt$ KT), suggesting that synchronous observation of at least two experts per round is necessary to have a constant regret.

data mining, machine learning, nullk null, (20 more...)

2210.02256

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Data Science > Data Mining > Big Data (0.67)