 Markov Models



Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Neural Information Processing Systems

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee).
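To make the episodic fixed-horizon setting concrete, here is a minimal sketch of planning in a known tabular MDP by backward induction. It is not the learning algorithm whose sample complexity the paper analyzes. The function name, the array layout of `P` and `R`, and the horizon argument `H` are assumptions made for this illustration.

```python
import numpy as np

def finite_horizon_value_iteration(P, R, H):
    """Backward induction for a known tabular episodic MDP (illustrative only).

    P: array of shape (S, A, S); P[s, a, s2] is a transition probability.
    R: array of shape (S, A); expected immediate reward.
    H: number of steps per episode.
    Returns V with shape (H + 1, S) and a time-dependent greedy policy (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))              # V[H] = 0: no reward after the episode ends
    policy = np.zeros((H, S), dtype=int)
    for t in range(H - 1, -1, -1):        # sweep backwards from the last step
        Q = R + P @ V[t + 1]              # (S, A) one-step lookahead values
        policy[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, policy
```

Unlike the discounted infinite-horizon case, the optimal policy here generally depends on the time step, which is why the sketch stores one greedy action per `(t, s)` pair.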


Reviews: Optimal Tagging with Markov Chain Optimization

Neural Information Processing Systems

Optimization of the link structure for PageRank (PR) is not a new topic. Apart from the papers mentioned in Related work, there are others that are not reviewed, including "PageRank Optimization by Edge Selection" by Csaji et al., "Maximizing PageRank with New Backlinks" by Olsen, and "PageRank Optimization in Polynomial Time by Stochastic Shortest Path Reformulation" by Csaji et al. **The novelty** of the study is questionable. The probability of reaching the target state $\sigma$ can be viewed as the state's stationary probability for the graph, where the added edges are directed to the state $\sigma$ and the matrix of transition probabilities is raised to an appropriate power. This observation does not immediately reduce the problem of the paper to a known task; however, it may partially explain the similarity between the theoretical part and the works of Olsen, where the stationary probability is maximized. In particular, Section 4 resembles the work "Maximizing PageRank with New Backlinks" (not cited in the paper), where M. Olsen considered a reduction of a Markov chain optimization problem to the independent set problem, which is equivalent to the vertex cover problem. Theorems 5.1 and 5.3 are reasonable, but very simple, and resemble Lemmas 1 and 2 from [15].
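The quantity the review discusses, the probability of reaching a target state, can be illustrated with a small sketch. It makes the target absorbing and raises the transition matrix to a power, which gives the probability of visiting the target within a fixed number of steps. This is a generic Markov chain computation, not the method of the reviewed paper or of Olsen; the function name and arguments are assumptions for the example.

```python
import numpy as np

def reach_probability(P, target, steps):
    """Probability of visiting `target` within `steps` transitions, per start state.

    P: row-stochastic transition matrix of shape (n, n).
    Illustrative helper; the name and signature are hypothetical.
    """
    P_abs = np.array(P, dtype=float)
    P_abs[target] = 0.0
    P_abs[target, target] = 1.0           # make the target state absorbing
    # Entry [s, target] of the matrix power is the chance of having been absorbed by then.
    return np.linalg.matrix_power(P_abs, steps)[:, target]
```

Adding edges that point to the target increases the corresponding entries of `P`, so this function can be used to compare candidate edge sets on small chains.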


A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

arXiv.org Artificial Intelligence

Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. We also chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method. Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.


Planning with SiMBA: Motion Planning under Uncertainty for Temporal Goals using Simplified Belief Guides

arXiv.org Artificial Intelligence

This paper presents a new multi-layered algorithm for motion planning under motion and sensing uncertainties for Linear Temporal Logic specifications. We propose a technique to guide a sampling-based search tree in the combined task and belief space using trajectories from a simplified model of the system, to make the problem computationally tractable. Our method eliminates the need to construct fine and accurate finite abstractions. We prove correctness and probabilistic completeness of our algorithm, and illustrate the benefits of our approach on several case studies. Our results show that guidance with a simplified belief space model allows for significant speed-up in planning for complex specifications.


A Novel Point-based Algorithm for Multi-agent Control Using the Common Information Approach

arXiv.org Artificial Intelligence

The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observed Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve due to its extraordinarily large action space. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach and point-based POMDP algorithms for large action spaces. We demonstrate the algorithm by optimally solving several benchmark problems.


Call for Speakers for MLconf SF 2023

#artificialintelligence

MLconf gathers machine learning & AI enthusiasts from a broad range of industries and academic backgrounds to share new tools, tricks, platforms, algorithms and methods with a broad audience of practitioners. Each presentation offers an educational component to be shared with the community, in which specific algorithms and techniques can be shared and new applications of such are inspired. Today, we are making a call for presentations for our MLconf San Francisco conference to be held on October 19, 2023 at the Hotel Nikko in SF. The conference will feature presentations from across the machine learning landscape. If you, your team, organization, or colleague has done something innovative related to ML algorithms, Tools and Platforms, or Building and Managing Teams to solve hard problems, let us help you share your story. In your abstract, we encourage you to mention where you feel your techniques will transfer over into other Machine Learning applications, showing where it's relevant to the MLconf audience. Prior submissions have included presentations related to: Algorithms that have graduated from an academic/theory state and have proven to be effective, robust and scalable in production within industry application; Machine Learning/AI examples of specific challenges faced within current industry and how teams have found success by applying new algorithms and techniques or by applying modifications to existing practices for optimal outcomes; New platforms, tools for machine learning; New business practices for managing and growing data science teams; and Expanding machine learning to new domains. Abstracts should be 150-500 words in length and should illustrate the level of technicality in the proposed presentation. At the time of the event, presentations will be generally limited to 25-30 minutes in length in order to allow you to provide depth while also allowing for presentations from colleagues and Q&A. 
Emphasis should be given to the technical challenges, benchmarks, innovations and motivation for the development of models, algorithms and statistical models to analyze and draw inferences from patterns in data. Your presentation should definitely not be a product or sales pitch.

ABSTRACT DEADLINE: June 30, 2023

Topics we are looking for include but are not limited to: AI/ML Ops; Natural Language Processing; Deep Learning; Reinforcement Learning; Data Science for Social Good; Kernel Methods; Causality; Embeddings; Recommendation Systems; Quantum Computing and AI/ML; Chemistry & AI/ML; Pandemic Data & ML; Model Interpretability; Fraud Detection; DeepFake Detection; Generative Teaching Networks; Facial Recognition/Biometric Identification; Genetics & ML; Experimental Reproducibility Best Practices; Model Uncertainty and Data Drift; Generative Adversarial Networks; Transfer Learning; Adversarial Machine Learning; IoT and edge computing applications; Genetic Algorithms; Tensor Algebra; Probabilistic Programming and Logic; Machine Learning for Music and Art; Bayesian Methods; Markov Logic Networks; Synthetic Art, Biology; Ethics in Machine Learning; Data / Algorithm Ethics; Sketching Randomized Algorithms; AI Education; Game Theory; Diversity in AI; Community Detection; Time Series; Image Analysis; Structured Learning using Neural Networks; Healthcare & ML (Clinical Decision Support Systems, Record Keeping, Medical Imaging, etc.); FinTech & ML (Algorithmic Trading, Predictive Analytics, Fraud Detection & Prevention, Payments, etc.).

In the spirit of sharing knowledge, presentation slides are shared with attendees and photographs and/or video footage of presentations are shared as well.


A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

arXiv.org Artificial Intelligence

We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided.
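For orientation, a generic form of the primal problem, its Lagrangian, and the dual that a dual-descent method would minimize can be written as follows. The notation ($\rho$, $\tau$, $g_i$, $b_i$, $\mathcal{H}$) is assumed for this sketch and may differ from the paper's, in particular in whether the utility value functions are themselves regularized.

```latex
% Assumed notation (not taken from the paper):
%   \rho = initial state distribution, \tau > 0 = regularization weight,
%   g_i, b_i = utility functions and thresholds, \mathcal{H} = discounted policy entropy.
\begin{align*}
\text{(primal)}\quad & \max_{\pi}\; V_{r}^{\pi}(\rho) + \tau\,\mathcal{H}(\pi)
  \quad \text{s.t.}\quad V_{g_i}^{\pi}(\rho) \ge b_i,\; i = 1,\dots,m, \\
\text{(Lagrangian)}\quad & L(\pi,\lambda) = V_{r}^{\pi}(\rho) + \tau\,\mathcal{H}(\pi)
  + \sum_{i=1}^{m} \lambda_i \bigl(V_{g_i}^{\pi}(\rho) - b_i\bigr), \qquad \lambda \ge 0, \\
\text{(dual)}\quad & \min_{\lambda \ge 0}\; D(\lambda), \qquad D(\lambda) = \max_{\pi} L(\pi,\lambda).
\end{align*}
```

The abstract's smoothness claim concerns $D(\lambda)$, which is what makes an accelerated gradient method on the dual variables applicable.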


Interval Markov Decision Processes with Continuous Action-Spaces

arXiv.org Artificial Intelligence

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
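As background for the max-min structure mentioned above, the following sketch shows pessimistic value iteration for the discrete-action case only, which is the setting the abstract describes as the usual assumption. The inner minimization over an interval of transition probabilities is solved greedily by pushing free probability mass toward the lowest-value successors. The function names, the array layout, and the discount factor `gamma` are assumptions for this illustration; the paper's continuous-action algorithm is not reproduced here.

```python
import numpy as np

def worst_case_expectation(lo, hi, V):
    """Minimise sum_j p[j] * V[j] subject to lo <= p <= hi and sum(p) == 1.

    Assumes the intervals are consistent: sum(lo) <= 1 <= sum(hi).
    """
    p = lo.astype(float).copy()           # start every successor at its lower bound
    budget = 1.0 - p.sum()                # probability mass still to distribute
    for j in np.argsort(V):               # lowest-value successors receive mass first
        add = min(hi[j] - p[j], budget)
        p[j] += add
        budget -= add
    return float(p @ V)

def robust_value_iteration(lo, hi, R, gamma=0.9, iters=200):
    """Max over actions of the worst-case backup (discounting is an assumption here).

    lo, hi: arrays (S, A, S) with interval bounds; R: array (S, A) of rewards.
    """
    S, A, _ = lo.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_expectation(lo[s, a], hi[s, a], V)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V
```

When `lo` equals `hi`, the inner problem has a single feasible distribution and the routine reduces to ordinary value iteration.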


Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

arXiv.org Artificial Intelligence

We extend the provably convergent Full Gradient DQN algorithm for discounted-reward Markov decision processes from Avrachenkov et al. (2021) to average-reward problems. We experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
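The two average-reward baselines named above differ mainly in how they estimate the reward rate. A minimal tabular sketch of one update step for each is given below; the paper works with neural function approximation, and this is not the Full Gradient DQN algorithm. The function names, the choice of `Q[ref]` as the reference value in RVI Q-learning, and the step-size parameters are assumptions for the illustration.

```python
import numpy as np

def rvi_q_update(Q, s, a, r, s_next, alpha, ref=(0, 0)):
    """One tabular RVI Q-learning step.

    The value at a fixed reference state-action pair, Q[ref], plays the role of
    the reward-rate offset f(Q). Q is modified in place; the TD error is returned.
    """
    td = r + Q[s_next].max() - Q[ref] - Q[s, a]
    Q[s, a] += alpha * td
    return td

def differential_q_update(Q, avg_reward, s, a, r, s_next, alpha, eta=1.0):
    """One tabular Differential Q-learning step.

    The reward rate is tracked as a separate scalar driven by the same TD error.
    Q is modified in place; the updated reward-rate estimate is returned.
    """
    td = r - avg_reward + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td
    return avg_reward + eta * alpha * td
```

In both cases the discount factor is absent; subtracting a reward-rate estimate keeps the action values bounded.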