AITopics | Undirected Networks

Collaborating Authors

Undirected Networks

News Overviews Instructional Materials AI-Alerts Classics

Successive Over Relaxation Q-Learning

Kamanchi, Chandramouli, Diddigi, Raghuram Bharadwaj, Bhatnagar, Shalabh

arXiv.org Machine LearningMar-15-2019

In a discounted reward Markov Decision Process (MDP) the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation and a fixed point iteration scheme known as the value iteration is utilized to obtain the solution. In [1], a successive over-relaxation based value iteration scheme is proposed to speed up the computation of the optimal value function. They propose a modified Bellman equation and prove faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain optimal policy and value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for optimal Q-values based on [1] and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the convergence of the SOR Q-Learning to optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster compared to the standard Q-Learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1903.03812

Country: Asia > India (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Contextual Markov Decision Processes using Generalized Linear Models

Modi, Aditya, Tewari, Ambuj

arXiv.org Artificial IntelligenceMar-14-2019

We consider the recently proposed reinforcement learning (RL) framework of Contextual Markov Decision Processes (CMDP), where the agent has a sequence of episodic interactions with tabular environments chosen from a possibly infinite set. The parameters of these environments depend on a context vector that is available to the agent at the start of each episode. In this paper, we propose a no-regret online RL algorithm in the setting where the MDP parameters are obtained from the context using generalized linear models (GLMs). The proposed algorithm \texttt{GL-ORL} relies on efficient online updates and is also memory efficient. Our analysis of the algorithm gives new results in the logit link case and improves previous bounds in the linear case. Our algorithm uses efficient Online Newton Step updates to build confidence sets. Moreover, for any strongly convex link function, we also show a generic conversion from any online no-regret algorithm to confidence sets.

algorithm, probability, sequence, (12 more...)

arXiv.org Artificial Intelligence

1903.06187

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Incremental Learning of Discrete Planning Domains from Continuous Perceptions

Serafini, Luciano, Traverso, Paolo

arXiv.org Artificial IntelligenceMar-14-2019

We propose a framework for learning discrete deterministic planning domains. In this framework, an agent learns the domain by observing the action effects through continuous features that describe the state of the environment after the execution of each action. Besides, the agent learns its perception function, i.e., a probabilistic mapping between state variables and sensor data represented as a vector of continuous random variables called perception variables. We define an algorithm that updates the planning domain and the perception function by (i) introducing new states, either by extending the possible values of state variables, or by weakening their constraints; (ii) adapts the perception function to fit the observed data (iii) adapts the transition function on the basis of the executed actions and the effects observed via the perception function. The framework is able to deal with exogenous events that happen in the environment.

locprq, perception function, planning domain, (17 more...)

arXiv.org Artificial Intelligence

1903.05937

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Hypothesis-based Belief Planning for Dexterous Grasping

Zito, Claudio, Ortenzi, Valerio, Adjigble, Maxime, Kopicki, Marek, Stolkin, Rustam, Wyatt, Jeremy L.

arXiv.org Artificial IntelligenceMar-13-2019

Noname manuscript No. (will be inserted by the editor) Abstract Belief space planning is a viable alternative to formalise partially observable control problems and, in the recent years, its application to robot manipulation problems has grown. However, this planning approach was tried successfully only on simplified control problems. In this paper, we apply belief space planning to the problem of planning dexterous reach-tograsp trajectories under object pose uncertainty. In our framework, the robot perceives the object to be grasped on-the-fly as a point cloud and compute a full 6D, non-Gaussian distribution over the object's pose (our belief space). The system has no limitations on the geometry of the object, i.e., non-convex objects can be represented, nor assumes that the point cloud is a complete Figure 1: Boris: half-humanoid robot platform developed representation of the object. A plan in the belief space at the University of Birmingham. is then created to reach and grasp the object, such that the information value of expected contacts along the trajectory is maximised to compensate for the pose uncertainty. 1 Introduction If an unexpected contact occurs when performing the action, such information is used to refine Imagine that you are reaching into the fridge to grasp the pose distribution and triggers a re-planning. Experimental an object you can only partially see. Rather than relying results show that our planner (IR3ne) improves solely on vision, you must use touch in order to grasp reliability and compensates for the pose uncertainty localise it and securely grasp it.

artificial intelligence, machine learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

1903.05517

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > Iowa (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Online Human Activity Recognition Employing Hierarchical Hidden Markov Models

Asghari, Parviz, Soelimani, Elnaz, Nazerfard, Ehsan

arXiv.org Machine LearningMar-12-2019

In the last few years there has been a growing interest in Human Activity Recognition~(HAR) topic. Sensor-based HAR approaches, in particular, has been gaining more popularity owing to their privacy preserving nature. Furthermore, due to the widespread accessibility of the internet, a broad range of streaming-based applications such as online HAR, has emerged over the past decades. However, proposing sufficiently robust online activity recognition approach in smart environment setting is still considered as a remarkable challenge. This paper presents a novel online application of Hierarchical Hidden Markov Model in order to detect the current activity on the live streaming of sensor events. Our method consists of two phases. In the first phase, data stream is segmented based on the beginning and ending of the activity patterns. Also, on-going activity is reported with every receiving observation. This phase is implemented using Hierarchical Hidden Markov models. The second phase is devoted to the correction of the provided label for the segmented data stream based on statistical features. The proposed model can also discover the activities that happen during another activity - so-called interrupted activities. After detecting the activity pane, the predicted label will be corrected utilizing statistical features such as time of day at which the activity happened and the duration of the activity. We validated our proposed method by testing it against two different smart home datasets and demonstrated its effectiveness, which is competing with the state-of-the-art methods.

artificial intelligence, machine learning, recognition, (15 more...)

arXiv.org Machine Learning

1903.0482

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

On the Pitfalls of Measuring Emergent Communication

Lowe, Ryan, Foerster, Jakob, Boureau, Y-Lan, Pineau, Joelle, Dauphin, Yann

arXiv.org Artificial IntelligenceMar-12-2019

How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agent's learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents' behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1903.05168

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Africa > Ghana > Greater Accra > Accra (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Markov Networks: Undirected Graphical Models

#artificialintelligenceMar-11-2019, 01:02:04 GMT

This article briefs you about Markov Networks which falls under the family of Undirected Graphical Models (UGM). This article is a follow-up to Bayesian Network, which is a type of Directed Graphical Models. Key Motivation behind these networks is to parameterize the Joint Probability Distribution based on Local Independencies between Random Variables. Generally, Bayesian Network requires to pre-define a directionality to assert an influence of random variable. But there might be cases where interaction between nodes ( or random variables) are symmetric in nature, and we would like to have a model which can represent this symmetricity without directional influence.

bayesian network, markov network, undirected graphical model, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.83)

Add feedback

Financial Trading Model with Stock Bar Chart Image Time Series with Deep Convolutional Neural Networks

Sezer, Omer Berat, Ozbayoglu, Ahmet Murat

arXiv.org Machine LearningMar-11-2019

Even though computational intelligence techniques have been extensively utilized in financial trading systems, almost all developed models use the time series data for price prediction or identifying buy-sell points. However, in this study we decided to use 2-D stock bar chart images directly without introducing any additional time series associated with the underlying stock. We propose a novel algorithmic trading model CNN-BI (Convolutional Neural Network with Bar Images) using a 2-D Convolutional Neural Network. We generated 2-D images of sliding windows of 30-day bar charts for Dow 30 stocks and trained a deep Convolutional Neural Network (CNN) model for our algorithmic trading model. We tested our model separately between 2007-2012 and 2012-2017 for representing different market conditions. The results indicate that the model was able to outperform Buy and Hold strategy, especially in trendless or bear markets. Since this is a preliminary study and probably one of the first attempts using such an unconventional approach, there is always potential for improvement. Overall, the results are promising and the model might be integrated as part of an ensemble trading model combined with different strategies.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

1903.0461

Country: Asia > Middle East (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Rethinking System Health Management

Balaban, Edward, Johnson, Stephen B., Kochenderfer, Mykel J.

arXiv.org Artificial IntelligenceMar-10-2019

Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system's operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1903.03948

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

The Promise of Hierarchical Reinforcement Learning

#artificialintelligenceMar-9-2019, 23:58:49 GMT

This top-down planning approach decides what a good subgoal is before planning to achieve it." "For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is mis- specified whenever, the representation cannot express any policy with acceptable performance.

machine learning, reinforcement, reinforcement learning, (16 more...)

#artificialintelligence

Country: Europe > United Kingdom > England (0.28)

Genre:

Research Report (0.68)
Overview (0.46)

Industry:

Education (0.94)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback