AITopics | tsde

tsde

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Scalar Posterior Sampling with Applications

Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Nikos Vlassis

Neural Information Processing Systems, Feb-14-2026, 10:31:18 GMT

artificial intelligence, machine learning, ouyangetal, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Dense-Exponential Random Features: Sharp Positive Estimators of the Gaussian Kernel

Neural Information Processing Systems, Feb-7-2026, 06:54:11 GMT

Figure 1: (left) Venn diagram of the new types of random features (green) we propose. (right)

artificial intelligence, logdet, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > Queensland (0.04)
(10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, Rahul Jain

Neural Information Processing Systems, Nov-21-2025, 08:57:50 GMT

A naive approach to an unknown model is the certainty equivalence principle.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)

Scalar Posterior Sampling with Applications

Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Nikos Vlassis

Neural Information Processing Systems, Nov-20-2025, 19:46:42 GMT

Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Reviews: Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Neural Information Processing Systems, Oct-7-2024, 21:32:00 GMT

The paper proposes TSDE, a posterior sampling algorithm for RL in the average-reward, infinite-horizon setting. The algorithm uses dynamic episodes but, unlike Lazy-PSRL, avoids technical issues by terminating an episode not only when an observation count has doubled but also when the episode becomes too long. This ensures that the episode length cannot grow faster than linearly, and ultimately a Bayesian regret bound of $O(HS\sqrt{AT})$ is shown. Posterior sampling methods typically outperform UCB-type algorithms, so a posterior sampling algorithm for non-episodic RL with rigorous regret bounds is desirable. This paper proposes such an algorithm, which is of high interest.
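
As a rough illustration of the linear-growth claim in this review, here is a short worked bound. It assumes (this is not stated in the snippet) that the length rule ends episode $k$ as soon as its length $T_k$ exceeds the previous length $T_{k-1}$; the symbols $T_k$, $T_0$ and $K$ are introduced here only for the sketch.

$$T_k \le T_{k-1} + 1 \quad\Longrightarrow\quad T_k \le T_0 + k .$$

Under the same assumption, if every episode were ended by the length rule alone, then $T_k = T_{k-1} + 1$, and the number $K$ of episodes completed within horizon $T$ would satisfy

$$T \;\ge\; \sum_{k=1}^{K} T_k \;=\; K\,T_0 + \frac{K(K+1)}{2} \;\ge\; \frac{K^2}{2} \quad\Longrightarrow\quad K \le \sqrt{2T} .$$

So the length rule on its own yields on the order of $\sqrt{T}$ episodes; the doubling rule can only add further episode boundaries on top of this.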

algorithm, learning unknown markov decision process, thompson sampling approach, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, Rahul Jain

Neural Information Processing Systems, Oct-3-2024, 10:11:27 GMT

algorithm, mdp, tsde, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask

Senane, Zineb, Cao, Lele, Buchner, Valentin Leonhard, Tashiro, Yusuke, You, Lei, Herman, Pawel, Nordahl, Mats, Tu, Ruibo, von Ehrenheim, Vilhelm

arXiv.org Artificial Intelligence, Jun-17-2024

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
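
The abstract names an Imputation-Interpolation-Forecasting (IIF) mask without spelling out its construction. The sketch below is one plausible reading, not the authors' implementation: it assumes that an imputation mask hides individual values at random, an interpolation mask hides all features at randomly chosen time steps, and a forecasting mask hides all features after a cut-off. The function name `iif_mask` and the `ratio` parameter are hypothetical.

```python
import random


def iif_mask(n_steps, n_features, kind, ratio=0.2, rng=None):
    """Return an n_steps x n_features grid of 0/1 flags (1 = observed, 0 = masked).

    The three mask kinds follow an assumed reading of the IIF idea;
    the abstract itself does not define them.
    """
    rng = rng or random.Random(0)
    mask = [[1] * n_features for _ in range(n_steps)]
    if kind == "imputation":
        # Hide individual values independently at random.
        for t in range(n_steps):
            for f in range(n_features):
                if rng.random() < ratio:
                    mask[t][f] = 0
    elif kind == "interpolation":
        # Hide every feature at a few randomly chosen time steps.
        n_hidden = max(1, round(ratio * n_steps))
        for t in rng.sample(range(n_steps), n_hidden):
            mask[t] = [0] * n_features
    elif kind == "forecasting":
        # Hide every feature from a cut-off time step to the end.
        cut = n_steps - max(1, round(ratio * n_steps))
        for t in range(cut, n_steps):
            mask[t] = [0] * n_features
    else:
        raise ValueError(f"unknown mask kind: {kind}")
    return mask


if __name__ == "__main__":
    for row in iif_mask(10, 3, "forecasting"):
        print(row)
```

In a setup like the one the abstract describes, the observed part (flags equal to 1) would feed the embedding function and the masked part would be the denoising target; that wiring is outside the scope of this sketch.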

dataset, representation, tsde, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3637528.3671673

arXiv:2405.05959

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.45)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

Gagrani, Mukul, Sudhakara, Sagar, Mahajan, Aditya, Nayyar, Ashutosh, Ouyang, Yi

arXiv.org Artificial Intelligence, Aug-19-2021

We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system, recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $T$.

algorithm, assumption, min 1, (13 more...)

arXiv.org Artificial Intelligence

arXiv:2108.08502

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Thompson Sampling in Non-Episodic Restless Bandits

Jung, Young Hun, Abeille, Marc, Tewari, Ambuj

arXiv.org Machine Learning, Oct-12-2019

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a sub-linear, $O(\sqrt{T}\log T)$, regret bound for a variant of Thompson sampling. Our analysis applies in the infinite time horizon setting, resolving the open question raised by Jung and Tewari (2019) whose analysis is limited to the episodic case. We adopt their policy mapping framework, which allows our algorithm to be efficient and simultaneously keeps the regret meaningful. Our algorithm adapts the TSDE algorithm of Ouyang et al. (2017) in a non-trivial manner to account for the special structure of restless bandits. We test our algorithm on a simulated dynamic channel access problem with several policy mappings, and the empirical regrets agree with the theoretical bound regardless of the choice of the policy mapping.

algorithm, bellman equation, cond, (13 more...)

arXiv.org Machine Learning

arXiv:1910.05654

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Ouyang, Yi, Gagrani, Mukul, Nayyar, Ashutosh, Jain, Rahul

Neural Information Processing Systems, Dec-31-2017

We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the optimal stationary policy for the sampled model for the rest of the episode. The duration of each episode is dynamically determined by two stopping criteria. The first stopping criterion controls the growth rate of episode length. The second stopping criterion is triggered when the number of visits to any state-action pair doubles. We establish $\tilde O(HS\sqrt{AT})$ bounds on expected regret under a Bayesian setting, where $S$ and $A$ are the sizes of the state and action spaces, $T$ is the time horizon, and $H$ is a bound on the span. This regret bound matches the best available bound for weakly communicating MDPs. Numerical results show it to perform better than existing algorithms for infinite horizon MDPs.
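
To make the two stopping criteria concrete, here is a minimal sketch of the episode schedule only; it omits posterior sampling and policy computation. The abstract says the first criterion "controls the growth rate of episode length" without giving its form, so the rule used here (end an episode once it is longer than the previous one) is an assumption. The function name `tsde_episode_starts` and the parameter `first_len_cap` are hypothetical.

```python
from collections import defaultdict


def tsde_episode_starts(visits, first_len_cap=1):
    """Return the time steps at which episodes start.

    visits[t] is the (state, action) pair visited at time t.
    A new episode starts at time t when either
      (1) the current episode is already longer than the previous one
          (assumed form of the growth-rate criterion), or
      (2) the visit count of the pair just visited exceeds twice its
          count at the start of the current episode (doubling criterion).
    """
    counts = defaultdict(int)   # visits to each pair strictly before time t
    snapshot = {}               # counts frozen at the start of the episode
    starts = [0]
    t_k, prev_len = 0, first_len_cap
    for t, sa in enumerate(visits, start=1):
        counts[sa] += 1
        too_long = t - t_k > prev_len
        # Only the pair just visited can newly satisfy the doubling test.
        doubled = counts[sa] > 2 * snapshot.get(sa, 0)
        if (too_long or doubled) and t < len(visits):
            prev_len = t - t_k
            t_k = t
            snapshot = dict(counts)
            starts.append(t)
    return starts


if __name__ == "__main__":
    # One pair visited repeatedly: the length rule dominates and the
    # episode lengths grow by one each time.
    print(tsde_episode_starts([("s", "a")] * 20))  # [0, 1, 3, 6, 10, 15]
```

In a full agent, each entry of the returned list would be a point at which a new model is sampled from the posterior and its optimal stationary policy is recomputed.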

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.47)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)