AITopics

1903.08772

Country: North America > United States (0.45)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Machine Learning, Mar-18-2019

Variance reduction for MCMC methods via martingale representations

Belomestny, D., Moulines, E., Shagadatov, N., Urusov, M.

In this paper we propose an efficient variance reduction approach for MCMC algorithms relying on a novel discrete time martingale representation for Markov chains. Our approach is fully non-asymptotic and does not require any type of ergodicity or special product structure of the underlying density. By rigorously analyzing the convergence of the proposed algorithm, we show that its complexity is indeed significantly smaller than that of the original MCMC algorithm. The numerical performance of the new method is illustrated in the case of Gaussian mixtures and binary regression.

artificial intelligence, machine learning, variance reduction

1903.07373

Country:

North America > United States > New York (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Cordonnier, Jean-Baptiste, Loukas, Andreas

Extrapolating paths with graph neural networks

arXiv.org Machine Learning, Mar-18-2019

We consider the problem of path inference: given a path prefix, i.e., a partially observed sequence of nodes in a graph, we want to predict which nodes are in the missing suffix. In particular, we focus on natural paths occurring as a by-product of the interaction of an agent with a network: a driver on the transportation network, an information seeker in Wikipedia, or a client in an online shop. Our interest is sparked by the realization that, in contrast to shortest-path problems, natural paths are usually not optimal in any graph-theoretic sense, but might still follow predictable patterns. Our main contribution is a graph neural network called Gretel. Conditioned on a path prefix, this network can efficiently extrapolate path suffixes, evaluate path likelihood, and sample from the future path distribution. Our experiments with GPS traces on a road network and user-navigation paths in Wikipedia confirm that Gretel is able to adapt to graphs with very different properties, while also comparing favorably to previous solutions.

graph, gretel, node

1903.07518

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Russia (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (0.69)
Transportation > Ground > Rail (0.46)
Transportation > Ground > Road (0.35)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zheng, Jiaxiao, de Veciana, Gustavo

Modeling and Optimization of Human-machine Interaction Processes via the Maximum Entropy Principle

arXiv.org Artificial Intelligence, Mar-17-2019

We propose a data-driven framework to enable the modeling and optimization of human-machine interaction processes, e.g., systems aimed at assisting humans in decision-making or learning, work-load allocation, and interactive advertising. This is a challenging problem for several reasons. First, humans' behavior is hard to model or infer, as it may reflect biases, long term memory, and sensitivity to sequencing, i.e., transience and exponential complexity in the length of the interaction. Second, due to the interactive nature of such processes, the machine policy used to engage with a human may bias possible data-driven inferences. Finally, in choosing machine policies that optimize interaction rewards, one must, on the one hand, avoid being overly sensitive to error/variability in the estimated human model, and on the other, avoid being overly deterministic/predictable, which may result in poor human 'engagement' in the interaction. To meet these challenges, we propose a robust approach based on the maximum entropy principle, the Alternating Entropy-Reward Ascent (AREA) algorithm, which iteratively estimates human behavior and optimizes the machine policy. We characterize AREA in terms of its space and time complexity and convergence. We also provide an initial validation based on synthetic data generated by an established noisy nonlinear model for human decision-making.

artificial intelligence, machine learning, machine policy

1903.07157

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)

Al-Aradi, Ali, Jaimungal, Sebastian

Active and Passive Portfolio Management with Latent Factors

arXiv.org Machine Learning, Mar-16-2019

We address a portfolio selection problem that combines active (outperformance) and passive (tracking) objectives using techniques from convex analysis. We assume a general semimartingale market model where the assets' growth rate processes are driven by a latent factor. Using techniques from convex analysis we obtain a closed-form solution for the optimal portfolio and provide a theorem establishing its uniqueness. The motivation for incorporating latent factors is to achieve improved growth rate estimation, an otherwise notoriously difficult task. To this end, we focus on a model where growth rates are driven by an unobservable Markov chain. The solution in this case requires a filtering step to obtain posterior probabilities for the state of the Markov chain from asset price information, which are subsequently used to find the optimal allocation. We show the optimal strategy is the posterior average of the optimal strategies the investor would have held in each state assuming the Markov chain remains in that state. Finally, we implement a number of historical backtests to demonstrate the performance of the optimal portfolio.
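The filtering and posterior-averaging steps described in this abstract can be illustrated with a minimal discrete-time sketch. This is not the authors' continuous-time semimartingale model: the two-state chain, the Gaussian return likelihood, and every number below (`A`, `mu`, `sigma`, `per_state_weights`, `returns`) are hypothetical values chosen only for illustration.

```python
import numpy as np

def filter_step(prior, A, mu, sigma, ret):
    """One forward-filter update of the hidden-state posterior.

    prior : posterior over hidden states before seeing `ret`
    A     : transition matrix, A[i, j] = P(next state = j | current state = i)
    mu    : per-state mean return (assumed Gaussian emissions)
    sigma : per-state return standard deviation
    ret   : newly observed asset return
    """
    predicted = prior @ A  # propagate the belief one step through the chain
    likelihood = np.exp(-0.5 * ((ret - mu) / sigma) ** 2) / sigma
    posterior = predicted * likelihood  # Bayes rule, up to normalisation
    return posterior / posterior.sum()

def blended_allocation(posterior, per_state_weights):
    """Posterior average of the allocations that would be held in each state."""
    return posterior @ per_state_weights

# Hypothetical two-state market: state 0 = high growth, state 1 = low growth.
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
mu = np.array([0.01, -0.02])
sigma = np.array([0.02, 0.02])

# Hypothetical per-state allocations over (risky asset, cash).
per_state_weights = np.array([[0.8, 0.2],
                              [0.2, 0.8]])

posterior = np.array([0.5, 0.5])
returns = [-0.03, -0.025, -0.04]  # made-up observations
for r in returns:
    posterior = filter_step(posterior, A, mu, sigma, r)

allocation = blended_allocation(posterior, per_state_weights)
print(posterior, allocation)
```

After a run of negative returns the posterior mass shifts toward the low-growth state, so the blended allocation tilts toward cash. The per-state allocations are simply given here, whereas the paper derives them from its optimization problem.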

artificial intelligence, machine learning, portfolio

1903.06928

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Mason, Karl, Grijalva, Santiago

A Review of Reinforcement Learning for Autonomous Building Energy Management

arXiv.org Machine Learning, Mar-15-2019

The area of building energy management has received a significant amount of interest in recent years. This area is concerned with combining advancements in sensor technologies, communications and advanced control algorithms to optimize energy utilization. Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management. This research gives a comprehensive review of the literature relating to the application of reinforcement learning to developing autonomous building energy management systems. The main direction for future research and challenges in reinforcement learning are also outlined.

artificial intelligence, machine learning, reinforcement learning

1903.05196

Country:

North America > United States (0.68)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.34)

Industry:

Transportation > Ground > Road (1.00)
Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Kamanchi, Chandramouli, Diddigi, Raghuram Bharadwaj, Bhatnagar, Shalabh

Successive Over Relaxation Q-Learning

arXiv.org Machine Learning, Mar-15-2019

In a discounted reward Markov Decision Process (MDP) the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed point iteration scheme known as value iteration is used to obtain the solution. In [1], a successive over-relaxation based value iteration scheme is proposed to speed up the computation of the optimal value function. The authors of [1] propose a modified Bellman equation and prove faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain an optimal policy and value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for optimal Q-values based on [1] and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the convergence of SOR Q-Learning to the optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster than the standard Q-Learning algorithm.
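To make the over-relaxation idea concrete, here is a minimal model-based sketch of value iteration with a relaxation parameter `w`. It illustrates the kind of modified Bellman update the abstract attributes to [1], not the SOR Q-Learning algorithm itself. The update form, the bound `w_star = 1 / (1 - gamma * min self-loop probability)`, and the toy MDP are assumptions drawn from the general SOR literature rather than from the abstract.

```python
import numpy as np

def sor_value_iteration(P, R, gamma, w=1.0, tol=1e-10, max_iter=100_000):
    """Iterate V(s) <- max_a [ w * Q_V(s, a) + (1 - w) * V(s) ].

    P[a, s, t] : probability of moving from state s to state t under action a
    R[s, a]    : expected one-step reward
    w          : relaxation parameter; w = 1 gives standard value iteration
    Returns the final value estimate and the number of sweeps performed.
    """
    V = np.zeros(R.shape[0])
    for sweep in range(1, max_iter + 1):
        # Standard one-step lookahead: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        # Relaxed update mixes the lookahead with the current value of the same state
        V_new = np.max(w * Q + (1.0 - w) * V[:, None], axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, sweep
        V = V_new
    return V, max_iter

# Hypothetical 2-state, 2-action MDP in which every (s, a) has a self-loop.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Assumed admissible upper bound on w, based on the smallest self-loop probability.
self_loops = np.array([[P[a, s, s] for a in range(2)] for s in range(2)])
w_star = 1.0 / (1.0 - gamma * self_loops.min())

V_plain, sweeps_plain = sor_value_iteration(P, R, gamma, w=1.0)
V_sor, sweeps_sor = sor_value_iteration(P, R, gamma, w=w_star)
print(w_star, sweeps_plain, sweeps_sor)
```

Both runs target the same fixed point, because any `V` satisfying the relaxed equation with `w > 0` also satisfies the ordinary Bellman equation. On this toy problem the relaxed iteration should need fewer sweeps, since its contraction modulus `1 - w + w * gamma` is smaller than `gamma` when `w > 1`.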

artificial intelligence, machine learning, reinforcement learning

1903.03812

Country: Asia > India (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Modi, Aditya, Tewari, Ambuj

Contextual Markov Decision Processes using Generalized Linear Models

arXiv.org Artificial Intelligence, Mar-14-2019

We consider the recently proposed reinforcement learning (RL) framework of Contextual Markov Decision Processes (CMDP), where the agent has a sequence of episodic interactions with tabular environments chosen from a possibly infinite set. The parameters of these environments depend on a context vector that is available to the agent at the start of each episode. In this paper, we propose a no-regret online RL algorithm in the setting where the MDP parameters are obtained from the context using generalized linear models (GLMs). The proposed algorithm GL-ORL relies on efficient online updates and is also memory efficient. Our analysis of the algorithm gives new results in the logit link case and improves previous bounds in the linear case. Our algorithm uses efficient Online Newton Step updates to build confidence sets. Moreover, for any strongly convex link function, we also show a generic conversion from any online no-regret algorithm to confidence sets.
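As a rough illustration of how a context vector could determine tabular transition probabilities through a logit link, consider the sketch below. The abstract does not state the exact parameterization, so the per-(state, action) weight tensor `W`, its shape, and the multinomial-logit form are assumptions made for this example only. It does not implement GL-ORL or its confidence sets.

```python
import numpy as np

def transition_probs(W, context):
    """Map a context vector to transition probabilities with a softmax (logit) link.

    W       : hypothetical weights, shape (n_states, n_actions, n_next_states, d)
    context : context vector of shape (d,), observed at the start of an episode
    Returns an array P with P[s, a, t] = P(next state = t | state = s, action = a).
    """
    logits = np.einsum("satd,d->sat", W, context)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    unnormalised = np.exp(logits)
    return unnormalised / unnormalised.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_states, n_actions, d = 3, 2, 4
W = rng.normal(size=(n_states, n_actions, n_states, d))  # made-up parameters

context = rng.normal(size=d)  # a new context arrives with each episode
P = transition_probs(W, context)
P_uniform = transition_probs(W, np.zeros(d))  # zero context gives uniform rows
print(P.shape)
```

Each episode's context yields a different tabular MDP while the weights stay fixed, which is the structure an online learner in this setting can exploit across episodes.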

algorithm, probability, sequence

1903.06187

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Serafini, Luciano, Traverso, Paolo

Incremental Learning of Discrete Planning Domains from Continuous Perceptions

arXiv.org Artificial Intelligence, Mar-14-2019

We propose a framework for learning discrete deterministic planning domains. In this framework, an agent learns the domain by observing the action effects through continuous features that describe the state of the environment after the execution of each action. In addition, the agent learns its perception function, i.e., a probabilistic mapping between state variables and sensor data represented as a vector of continuous random variables called perception variables. We define an algorithm that updates the planning domain and the perception function by (i) introducing new states, either by extending the possible values of state variables or by weakening their constraints; (ii) adapting the perception function to fit the observed data; and (iii) adapting the transition function on the basis of the executed actions and the effects observed via the perception function. The framework is able to deal with exogenous events that happen in the environment.

locprq, perception function, planning domain

1903.05937

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zito, Claudio, Ortenzi, Valerio, Adjigble, Maxime, Kopicki, Marek, Stolkin, Rustam, Wyatt, Jeremy L.

Hypothesis-based Belief Planning for Dexterous Grasping

arXiv.org Artificial Intelligence, Mar-13-2019

Abstract: Belief space planning is a viable alternative to formalise partially observable control problems and, in recent years, its application to robot manipulation problems has grown. However, this planning approach was tried successfully only on simplified control problems. In this paper, we apply belief space planning to the problem of planning dexterous reach-to-grasp trajectories under object pose uncertainty. In our framework, the robot perceives the object to be grasped on-the-fly as a point cloud and computes a full 6D, non-Gaussian distribution over the object's pose (our belief space). The system has no limitations on the geometry of the object, i.e., non-convex objects can be represented, nor assumes that the point cloud is a complete representation of the object. A plan in the belief space is then created to reach and grasp the object, such that the information value of expected contacts along the trajectory is maximised to compensate for the pose uncertainty. If an unexpected contact occurs when performing the action, such information is used to refine the pose distribution and triggers a re-planning. Experimental results show that our planner (IR3ne) improves grasp reliability and compensates for the pose uncertainty

Figure 1: Boris: half-humanoid robot platform developed at the University of Birmingham.

1 Introduction: Imagine that you are reaching into the fridge to grasp an object you can only partially see. Rather than relying solely on vision, you must use touch in order to localise it and securely grasp it.

artificial intelligence, machine learning, trajectory

1903.05517

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > Iowa (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)