freedman
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
Metelli, Alberto Maria, Drago, Simone, Mussi, Marco
We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $\mu(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and explain why existing self-normalized concentration inequalities do not suffice to provide tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality, resorting to Freedman's inequality and a stitching argument, which is a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}(\gamma_T \sqrt{T/\kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs and provides a unified view of both settings.
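For background, the classical scalar Freedman inequality that the paper's self-normalized, dimension-free variant builds on can be stated as follows; this is the standard form, not a reproduction of the paper's inequality.

```latex
% Classical Freedman inequality (Freedman, 1975), scalar martingale form.
% (X_t) is a martingale difference sequence with |X_t| <= R almost surely,
% and V_t = \sum_{s=1}^{t} E[X_s^2 | F_{s-1}] is its predictable quadratic
% variation. Then, for all a, b > 0,
\Pr\!\left[\exists\, t \ge 1:\ \sum_{s=1}^{t} X_s \ge a \ \text{and}\ V_t \le b\right]
  \le \exp\!\left(-\frac{a^2}{2\,(b + R a / 3)}\right).
```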
Tracking Most Significant Shifts in Infinite-Armed Bandits
We study an infinite-armed bandit problem where actions' mean rewards are initially sampled from a reservoir distribution. Most prior works in this setting focused on stationary rewards (Berry et al., 1997; Wang et al., 2008; Bonald and Proutiere, 2013; Carpentier and Valko, 2015), with the more challenging adversarial/non-stationary variant only recently studied in the context of rotting/decreasing rewards (Kim et al., 2022; 2024). Furthermore, optimal regret upper bounds were previously achieved only with knowledge of the non-stationarity parameters, and only for certain regularity regimes of the reservoir. This work shows the first parameter-free optimal regret bounds for all regimes while also relaxing distributional assumptions on the reservoir. We first introduce a black-box scheme to convert a finite-armed MAB algorithm designed for near-stationary environments into a parameter-free algorithm for the infinite-armed non-stationary problem with optimal regret guarantees. We next study a natural notion of significant shift for this problem, inspired by recent developments in finite-armed MAB (Suk & Kpotufe, 2022). We show that tighter regret bounds in terms of significant shifts can be adaptively attained by employing a randomized variant of elimination within our black-box scheme. Our enhanced rates depend only on the rotting non-stationarity and thus exhibit an interesting phenomenon for this problem, where rising rewards do not factor into the difficulty of non-stationarity.
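The black-box scheme itself is not spelled out in the abstract; for orientation, below is a minimal sketch of the generic reservoir-subsampling reduction used throughout the infinite-armed literature, with UCB1 standing in for the finite-armed algorithm. All names and the choice of subsample size are illustrative assumptions, not the paper's parameter-free construction.

```python
import math
import random

class UCB1:
    """Standard UCB1 over a finite set of arms."""
    def __init__(self, n_arms):
        self.n = [0] * n_arms
        self.mean = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.n):
            if c == 0:                     # play each arm once first
                return i
        return max(range(len(self.n)),
                   key=lambda i: self.mean[i]
                   + math.sqrt(2 * math.log(self.t) / self.n[i]))

    def update(self, i, r):
        self.n[i] += 1
        self.mean[i] += (r - self.mean[i]) / self.n[i]

def run_subsampled(reservoir_sample, pull, horizon, k):
    """Generic reduction: draw k arms from the reservoir, then run a
    finite-armed algorithm (here UCB1) on them for the horizon."""
    arms = [reservoir_sample() for _ in range(k)]
    mab = UCB1(k)
    total = 0.0
    for _ in range(horizon):
        i = mab.select()
        r = pull(arms[i])                  # noisy reward of the chosen arm
        mab.update(i, r)
        total += r
    return total

# Example: Bernoulli arms whose means are drawn uniformly from [0, 1].
total = run_subsampled(reservoir_sample=random.random,
                       pull=lambda mu: float(random.random() < mu),
                       horizon=10_000, k=32)
```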
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.67)
Near Minimax-Optimal Distributional Temporal Difference Algorithms and The Freedman Inequality in Hilbert Spaces
Peng, Yang, Zhang, Liangyu, Zhang, Zhihua
Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One of the core tasks in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$. The distributional temporal difference (TD) algorithm has been proposed accordingly, as an extension of the temporal difference algorithm from the classic RL literature. In the tabular case, \citet{rowland2018analysis} and \citet{rowland2023analysis} proved the asymptotic convergence of two instances of distributional TD, namely the categorical temporal difference (CTD) and quantile temporal difference (QTD) algorithms, respectively. In this paper, we go a step further and analyze the finite-sample performance of distributional TD. To facilitate theoretical analysis, we propose a non-parametric distributional TD algorithm (NTD). For a $\gamma$-discounted infinite-horizon tabular Markov decision process, we show that NTD needs $\tilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p+1}}\right)$ iterations to achieve an $\varepsilon$-optimal estimator with high probability, when the estimation error is measured by the $p$-Wasserstein distance. This sample complexity bound is minimax optimal (up to logarithmic factors) in the case of the $1$-Wasserstein distance. To achieve this, we establish a novel Freedman's inequality in Hilbert spaces, which may be of independent interest. In addition, we revisit CTD, showing that the same non-asymptotic convergence bounds hold for CTD in the case of the $p$-Wasserstein distance.
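As a point of reference, here is a minimal sketch of the tabular categorical TD (CTD) update analyzed in this line of work, using the standard categorical projection onto evenly spaced atoms; the function name and interface are illustrative assumptions.

```python
import numpy as np

def ctd_update(p, s, r, s_prime, atoms, gamma, alpha):
    """One categorical TD (CTD) step for policy evaluation.

    p      : (n_states, n_atoms) array of categorical probabilities
    atoms  : fixed support z_0 < ... < z_{K-1}, assumed evenly spaced
    The target distribution r + gamma * Z(s') is projected back onto
    `atoms` by linear interpolation (the standard categorical projection).
    """
    K = len(atoms)
    dz = atoms[1] - atoms[0]
    target = np.zeros(K)
    for j in range(K):
        g = np.clip(r + gamma * atoms[j], atoms[0], atoms[-1])
        b = (g - atoms[0]) / dz            # fractional index of g on the grid
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if hi == lo:                       # g falls exactly on an atom
            target[lo] += p[s_prime, j]
        else:                              # split mass between neighbors
            target[lo] += p[s_prime, j] * (hi - b)
            target[hi] += p[s_prime, j] * (b - lo)
    p[s] = (1 - alpha) * p[s] + alpha * target   # stochastic mixing step
    return p
```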
Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time
A crucial problem in reinforcement learning is learning the optimal policy. We study this problem in tabular infinite-horizon discounted Markov decision processes under the online setting. Existing algorithms either fail to achieve regret optimality or incur high memory and computational costs. In addition, existing optimal algorithms all require a long burn-in time to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless the sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.
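The abstract does not detail the variance-reduction technique; one common form in model-free RL is the reference-advantage decomposition, sketched below under that assumption. All array names are hypothetical, and this is not the paper's algorithm.

```python
import numpy as np

def vr_q_update(Q, V, V_ref, mu_ref, s, a, r, s2, gamma, alpha):
    """One variance-reduced Q-learning step via a reference value function.

    mu_ref[s, a] is a batch (low-variance) estimate of E[V_ref(s') | s, a];
    the single-sample term then only needs to cover the small advantage
    V(s') - V_ref(s'), which reduces the variance of the update.
    """
    target = r + gamma * (V[s2] - V_ref[s2]) + gamma * mu_ref[s, a]
    Q[s, a] += alpha * (target - Q[s, a])
    V[s] = Q[s].max()                      # greedy value at the updated state
    return Q, V
```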
- Asia > China > Hong Kong (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Recently, several studies have considered the stochastic optimization problem in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$, which not only generalizes the traditional finite-variance assumption ($p=2$) but has also been observed in practice across several different tasks. Under this challenging assumption, much new progress has been made for both convex and nonconvex problems; however, most of it considers only smooth objectives. In contrast, the problem has not been fully explored or well understood when the functions are nonsmooth. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noises. We revisit a simple clipping-based algorithm that was previously proved to converge only in expectation and only under an additional strong convexity assumption. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing works. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$, even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$; in other words, the algorithm can still guarantee convergence without any prior knowledge of $\sigma$. Furthermore, an initial-distance-adaptive convergence rate is provided when $\sigma$ is known.
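The abstract does not specify the clipping-based algorithm beyond gradient clipping; below is a generic clipped-SGD sketch with a fixed step size and clipping level, for illustration only (the paper's parameter schedules and parameter-free variant differ).

```python
import numpy as np

def clipped_sgd(grad_fn, x0, steps, lr, clip):
    """Clipped SGD: rescale each stochastic (sub)gradient to norm <= clip.

    grad_fn(x) returns a possibly heavy-tailed stochastic subgradient.
    Clipping keeps each update bounded, which is the key mechanism behind
    high-probability guarantees under finite p-th moment noise.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_fn(x)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)          # gradient clipping
        x = x - lr * g
    return x
```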
Freedman
Referred to as an approach for either plan or goal recognition, the original method proposed by Ramirez and Geffner introduced a domain-based approach that did not need a library containing specific plan instances. This offered a more generalizable means of representing the tasks to be recognized, but it was also very slow due to its need to run simulations via multiple executions of an off-the-shelf classical planner. Several variations have since been proposed for quicker recognition, but each uses a drastically different approach that sacrifices other qualities useful for processing the recognition results in more complex systems. We present work in progress that takes advantage of the shared state space between planner executions to perform multiple-goal heuristic search. This single execution of a planner can potentially speed up the recognition process of the original method, while maintaining the sacrificed properties and improving some of the assumptions made by Ramirez and Geffner.
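The shared-state-space idea can be illustrated with a single uniform-cost search that records the optimal cost to each candidate goal the first time it is reached, instead of running the planner once per goal. The sketch below is a generic illustration under that reading, not the authors' planner integration; all names are assumptions.

```python
import heapq

def multi_goal_ucs(start, successors, goal_tests):
    """Single uniform-cost search returning the optimal cost to each of
    several goals, sharing one expansion of the state space instead of
    one planner run per goal. Assumes hashable states and nonnegative
    step costs (so the first time a goal is popped, its cost is optimal).

    successors(s) yields (next_state, step_cost) pairs.
    goal_tests is a dict {goal_id: predicate(state)}.
    """
    costs = {}                                  # goal_id -> optimal cost
    seen = set()
    frontier = [(0.0, 0, start)]                # (cost, tiebreak, state)
    tie = 1
    while frontier and len(costs) < len(goal_tests):
        c, _, s = heapq.heappop(frontier)
        if s in seen:
            continue
        seen.add(s)
        for gid, test in goal_tests.items():
            if gid not in costs and test(s):
                costs[gid] = c                  # first reach = optimal cost
        for s2, w in successors(s):
            if s2 not in seen:
                tie += 1
                heapq.heappush(frontier, (c + w, tie, s2))
    return costs
```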
Freedman
Planning is one of the oldest areas of research within artificial intelligence, studying the selection of actions for accomplishing goals. The more recently established areas of plan, activity, and intent recognition instead study an agent's behavior and task(s) given observations of its chosen actions. While these areas have been independently studied and applied to games in the past for both understanding player behavior and developing game characters, the potential for their integration presents even more opportunities via adaptive interaction with the player. In this manuscript, we discuss recent research on the integration of these areas and investigate potential uses for such integrated systems in games.