AITopics

2112.14657

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.15)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceDec-28-2021

Learning Based Task Offloading in Digital Twin Empowered Internet of Vehicles

Zheng, Jinkai, Luan, Tom H., Gao, Longxiang, Zhang, Yao, Wu, Yuan

Mobile edge computing has become an effective and fundamental paradigm for futuristic autonomous vehicles to offload computing tasks. However, due to the high mobility of vehicles, the dynamics of the wireless conditions, and the uncertainty of the arrival computing tasks, it is difficult for a single vehicle to determine the optimal offloading strategy. In this paper, we propose a Digital Twin (DT) empowered task offloading framework for Internet of Vehicles. As a software agent residing in the cloud, a DT can obtain both global network information by using communications among DTs, and historical information of a vehicle by using the communications within the twin. The global network information and historical vehicular information can significantly facilitate the offloading. In specific, to preserve the precious computing resource at different levels for most appropriate computing tasks, we integrate a learning scheme based on the prediction of futuristic computing tasks in DT. Accordingly, we model the offloading scheduling process as a Markov Decision Process (MDP) to minimize the long-term cost in terms of a trade off between task latency, energy consumption, and renting cost of clouds. Simulation results demonstrate that our algorithm can effectively find the optimal offloading strategy, as well as achieve the fast convergence speed and high performance, compared with other existing approaches.

algorithm, computing resource, vehicle, (14 more...)

2201.09076

Country:

North America > United States (0.14)
Asia > China (0.04)
Asia > Macao (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

arXiv.org Machine LearningDec-28-2021

Exponential Family Model-Based Reinforcement Learning via Score Matching

Li, Gene, Li, Junbo, Srebro, Nathan, Wang, Zhaoran, Yang, Zhuoran

This paper studies the regret minimization problem for finite horizon, episodic reinforcement learning (RL) with infinitely large state and action spaces. Empirically, RL has achieved success in diverse domains, even when the problem size (measured in the number of states and actions) explodes [35, 44, 28]. The key to developing sample-efficient algorithms is to leverage function approximation, enabling us to generalize across different state-action pairs. Much theoretical progress has been made towards understanding function approximation in RL. Existing theory typically requires strong linearity assumptions on transition dynamics [e.g., 55, 26, 10, 36] or action-value functions [e.g., 30, 57] of the Markov Decision Process (MDP). However, most real world problems are nonlinear, and our theoretical understanding of these settings remains limited. Thus, we ask the question: Can we design provably efficient RL algorithms in nonlinear environments? Recently, Chowdhury et al. [13] introduced a nonlinear setting where the state-transition measures are finitely parameterized exponential family models, and they proposed to estimate model parameters via maximum likelihood estimation (MLE). The exponential family is a well-studied and powerful statistical framework, so it is a natural model class to consider beyond linear models.

algorithm, estimator, vec, (15 more...)

2112.14195

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Kamoutsi, Angeliki, Banjac, Goran, Lygeros, John

Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

arXiv.org Artificial IntelligenceDec-28-2021

We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success, but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simplicity of implementation, low memory requirements, and computational and sample complexities independent of the number of states. We further present an equivalent no-regret online-learning interpretation.

efficient performance bound, learning, primal-dual reinforcement learning, (13 more...)

2112.14004

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Foster, Dylan J., Kakade, Sham M., Qian, Jian, Rakhlin, Alexander

The Statistical Complexity of Interactive Decision Making

arXiv.org Machine LearningDec-26-2021

A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound matching our lower bound, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.

algorithm, decision-estimation coefficient, theorem 3, (15 more...)

2112.13487

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

arXiv.org Artificial IntelligenceDec-26-2021

Abstractions of General Reinforcement Learning

Majeed, Sultan J.

The field of artificial intelligence (AI) is devoted to the creation of artificial decision-makers that can perform (at least) on par with the human counterparts on a domain of interest. Unlike the agents in traditional AI, the agents in artificial general intelligence (AGI) are required to replicate human intelligence in almost every domain of interest. Moreover, an AGI agent should be able to achieve this without (virtually any) further changes, retraining, or fine-tuning of the parameters. The real world is non-stationary, non-ergodic, and non-Markovian: we, humans, can neither revisit our past nor are the most recent observations sufficient statistics. Yet, we excel at a variety of complex tasks. Many of these tasks require longterm planning. We can associate this success to our natural faculty to abstract away task-irrelevant information from our overwhelming sensory experience. We make task-specific mental models of the world without much effort. Due to this ability to abstract, we can plan on a significantly compact representation of a task without much loss of performance. Not only this, we also abstract our actions to produce high-level plans: the level of action-abstraction can be anywhere between small muscle movements to a mental notion of "doing an action". It is natural to assume that any AGI agent competing with humans (at every plausible domain) should also have these abilities to abstract its experiences and actions. This thesis is an inquiry into the existence of such abstractions which aid efficient planing for a wide range of domains, and most importantly, these abstractions come with some optimality guarantees.

algorithm, homomorphism, optimal policy, (15 more...)

2112.13404

Country:

North America > United States (0.13)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Summary/Review (0.92)
Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.92)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
(3 more...)

Majeed, Sultan J., Hutter, Marcus

Reducing Planning Complexity of General Reinforcement Learning with Non-Markovian Abstractions

arXiv.org Artificial IntelligenceDec-26-2021

The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every domain possible, e.g.\ Bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, in general, the near-optimal policies in GRL are functions of complete history, which hinders not only learning but also planning in GRL. The usual way around for the planning part is that the agent is given a Markovian abstraction of the underlying process. So, it can use any MDP planning algorithm to find a near-optimal policy. The Extreme State Aggregation (ESA) framework has extended this idea to non-Markovian abstractions without compromising on the possibility of planning through a (surrogate) MDP. A distinguishing feature of ESA is that it proves an upper bound of $O\left(\varepsilon^{-A} \cdot (1-\gamma)^{-2A}\right)$ on the number of states required for the surrogate MDP (where $A$ is the number of actions, $\gamma$ is the discount-factor, and $\varepsilon$ is the optimality-gap) which holds \emph{uniformly} for \emph{all} domains. While the possibility of a universal bound is quite remarkable, we show that this bound is very loose. We propose a novel non-MDP abstraction which allows for a much better upper bound of $O\left(\varepsilon^{-1} \cdot (1-\gamma)^{-2} \cdot A \cdot 2^{A}\right)$. Furthermore, we show that this bound can be improved further to $O\left(\varepsilon^{-1} \cdot (1-\gamma)^{-2} \cdot \log^3 A \right)$ by using an action-sequentialization method.

abstraction, reinforcement learning, surrogate mdp, (13 more...)

2112.13386

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel

Application of Markov Structure of Genomes to Outlier Identification and Read Classification

arXiv.org Machine LearningDec-24-2021

That the sequential structure of genomes is important has been known since the discovery of DNA. In this paper we employ a statistics and stochastic process perspective on triplets of successive bases to address two important applications: identifying outliers in genome databases, and classifying reads in the metagenomic context of reference-guided assembly. From this stochastic process perspective, triplets are a second-order Markov chain specified by the distribution of each base conditional on its two immediate predecessors. To be sure, studying genomes via base sequence distributions is not novel. Previous papers have addressed genome signatures (Karlin et al., 1997; Campbell et al., 1999; Takashi et al., 2003), as well as frequentist (Rosen et al., 2008) and Bayesian (Wang et al., 2007) approaches to classification problems.

coronavirus genome, genome, probability, (12 more...)

2112.13117

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Mou, Wenlong, Pananjady, Ashwin, Wainwright, Martin J., Bartlett, Peter L.

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

arXiv.org Machine LearningDec-23-2021

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).

equation, inequality, markov chain, (17 more...)

2112.1277

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Molloy, Timothy L., Nair, Girish N.

Entropy-Regularized Partially Observed Markov Decision Processes

arXiv.org Artificial IntelligenceDec-22-2021

We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty. Standard POMDP techniques are shown to offer bounded-error solutions to these entropy-regularized POMDPs, with exact solutions when the regularization involves the joint entropy of the state, observation, and control trajectories. Our joint-entropy result is particularly surprising since it constitutes a novel, tractable formulation of active state estimation. Partially observed Markov decision processes (POMDPs) and Markov decision processes (MDPs) with information-theoretic costs have attracted widespread attention across the technical disciplines of systems and control [2]-[5], computer science [6]-[8], signal processing [9]-[12], and robotics [13]-[15]. Interest in such POMDPs has been driven, in large part, by active state estimation problems in which informationtheoretic costs describing the uncertainty about latent states are minimized in order to aid or enhance the performance of state estimation algorithms [5], [6], [9], [10].

entropy, input-output entropy, pomdp, (13 more...)

2112.12255

Country:

Oceania > Australia (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)