 Undirected Networks


Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

arXiv.org Artificial Intelligence

This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted to a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty of applying LDGBA is to construct an embedded LDGBA (E-LDGBA) by designing a synchronous tracking-frontier function, which makes it possible to record non-visited accepting sets without increasing dimensional and computational complexity. With appropriate dependent reward and discount functions, rigorous analysis shows that any method that optimizes the expected discounted return of the RL-based approach is guaranteed to find the optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed to generate the optimal policy in this paper. The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.


An Introduction to Machine Learning - Notes on New Technologies

#artificialintelligence

Humans learn from past experience, while machines follow the instructions humans give them. But what if humans could train machines to learn from past experience (data) and act much faster? This is where the concept of machine learning comes in. Machine learning is the field of study that gives computers the capability to learn without being explicitly programmed. Machine learning algorithms build a mathematical model based on data, known as training data, in order to make predictions or decisions. Machine learning is not only about learning, but also about understanding and reasoning. A machine learning model is not programmed; it is taught with data.


Logically Synthesized, Hardware-Accelerated, Restricted Boltzmann Machines for Combinatorial Optimization and Integer Factorization

arXiv.org Machine Learning

The Restricted Boltzmann Machine (RBM) is a stochastic neural network capable of solving a variety of difficult tasks such as NP-hard combinatorial optimization problems and integer factorization. The RBM architecture is also very compact, requiring very few weights and biases. This, along with its simple, parallelizable sampling algorithm for finding the ground state of such problems, makes the RBM amenable to hardware acceleration. However, training the RBM on these problems can pose a significant challenge, as the training algorithm tends to fail for large problem sizes and efficient mappings can be hard to find. Here, we propose a method of combining RBMs together that avoids the need to train large problems in their full form. We also propose methods for making the RBM more hardware amenable, allowing the algorithm to be efficiently mapped to an FPGA-based accelerator. Using this accelerator, we are able to show hardware-accelerated factorization of 16-bit numbers with high accuracy, with a speed improvement of 10000x and a power improvement of 32x.
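The sampling step mentioned in this abstract can be illustrated with textbook block Gibbs sampling for a binary RBM. This is a minimal sketch under that assumption, not the paper's hardware algorithm; the names `energy`, `gibbs_step`, and the tiny weight matrix are hypothetical. Each layer is sampled in one pass given the other, which is what makes the procedure easy to parallelize.

```python
import math
import random


def sigmoid(x):
    """Logistic function; fine for the small weights used in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    """RBM energy E(v, h) = -a.v - b.h - v^T W h for binary vectors v and h."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    coupling = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + coupling)


def gibbs_step(v, W, a, b, rng):
    """One block Gibbs step: sample all hidden units given v, then all visible units given h."""
    h = [1 if rng.random() < sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))) else 0
         for j in range(len(b))]
    v_new = [1 if rng.random() < sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) else 0
             for i in range(len(a))]
    return v_new, h


# Hypothetical toy RBM: 2 visible units, 1 hidden unit.
W = [[2.0], [-1.0]]
a = [0.5, 0.0]
b = [-1.0]
rng = random.Random(0)
v, h = gibbs_step([1, 0], W, a, b, rng)
```

Low-energy configurations are sampled more often, so repeating `gibbs_step` and keeping the lowest-energy state seen is one simple way to search for a ground state.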


Optimal Assistance for Object-Rearrangement Tasks in Augmented Reality

arXiv.org Artificial Intelligence

Augmented-reality (AR) glasses that will have access to onboard sensors and an ability to display relevant information to the user present an opportunity to provide user assistance in quotidian tasks. Many such tasks can be characterized as object-rearrangement tasks. We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display. The embodied agent comprises a "hybrid" between the AR system and the user, with the AR system's observation space (i.e., sensors) and the user's action space (i.e., task-execution actions); its policy is learned by minimizing the task-completion time. In this initial study, we assume that the AR system's observations include the environment's map and localization of the objects and the user. These choices allow us to formalize the problem of computing AR assistance for any object-rearrangement task as a planning problem, specifically as a capacitated vehicle-routing problem. Further, we introduce a novel AR simulator that can enable web-based evaluation of AR-like assistance and associated at-scale data collection via the Habitat simulator for embodied artificial intelligence. Finally, we perform a study that evaluates user response to the proposed form of AR assistance on a specific quotidian object-rearrangement task, house cleaning, using our proposed AR simulator on Mechanical Turk. In particular, we study the effect of the proposed AR assistance on users' task performance and sense of agency over a range of task difficulties. Our results indicate that providing users with such assistance improves their overall performance and that, while users report a negative impact on their agency, they may still prefer the proposed assistance to having no assistance at all.


Facebook's Open Source Framework For Training Graph-Based ML Models

#artificialintelligence

In this case, GTN is used for automatic differentiation of weighted finite-state transducers (WFSTs), an expressive and powerful class of graphs. The framework separates graphs from the operations on them, which helps in exploring new structured loss functions and in turn makes it easier to encode prior knowledge into learning algorithms. Further, in a related paper, Awni Hannun, Vineel Pratap, Jacob Kahn and Wei-Ning Hsu of Facebook AI Research proposed a convolutional WFST layer to be used in the interior of a deep neural network for mapping lower-level to higher-level representations. GTN is written in C++ and has bindings to Python. GTN can be used to express and design sequence-level loss functions.
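To make the idea of a sequence-level score over a weighted graph concrete, here is a minimal pure-Python sketch of a forward score in the log semiring: the log of the summed exponentiated weights of all paths from a start node to an accepting node. It deliberately does not use the GTN API; `logadd`, `forward_score`, and the arc format are hypothetical, and the graph is assumed to be acyclic with every arc going from a lower to a higher node index.

```python
import math


def logadd(x, y):
    """Numerically stable log(exp(x) + exp(y)), treating -inf as log(0)."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def forward_score(num_nodes, arcs, start, accept):
    """Log-sum-exp of path weights from `start` to `accept`.

    arcs: list of (src, dst, weight) with src < dst (assumed acyclic).
    """
    alpha = [-math.inf] * num_nodes
    alpha[start] = 0.0
    # Sorting by source node guarantees alpha[src] is final before it is used,
    # because every arc entering src starts at a lower-indexed node.
    for src, dst, weight in sorted(arcs):
        alpha[dst] = logadd(alpha[dst], alpha[src] + weight)
    return alpha[accept]


# Two parallel zero-weight arcs give two paths, so the score is log(2).
two_paths = forward_score(2, [(0, 1, 0.0), (0, 1, 0.0)], start=0, accept=1)
# A single path of weights 1.0 and 2.0 scores 3.0.
one_path = forward_score(3, [(0, 1, 1.0), (1, 2, 2.0)], start=0, accept=2)
```

A framework with automatic differentiation would additionally return the gradient of this score with respect to each arc weight, which is what allows such scores to serve as loss functions.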


Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

arXiv.org Machine Learning

Reinforcement learning (RL) [5] studies the problem of how to make sequential decisions to learn and act in unknown environments (which are usually modeled by a Markov Decision Process (MDP)) and maximize the collected rewards. There are mainly two types of algorithms to approach RL problems: model-based algorithms and model-free algorithms. Model-based RL algorithms keep an explicit description of the learned model and make decisions based on this model. In contrast, model-free algorithms only maintain a group of value functions instead of the complete model of the system dynamics. Due to their space- and time-efficiency, model-free RL algorithms have been getting popular in a wide range of practical tasks (e.g., DQN [16], TRPO [18], and A3C [15]). In RL theory, model-free algorithms are explicitly defined to be the ones whose space complexity is always sublinear relative to the space required to store the MDP parameters [12]. For tabular MDPs (i.e., MDPs with a finite number of states and actions, usually denoted by S and A respectively), this requires the space complexity to be o(S…
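The space distinction drawn here can be made concrete with a small sketch. It uses tabular Q-learning as a standard example of a model-free method (an assumption for illustration, not the algorithm of this paper) and compares the number of stored values with an explicit tabular model. The function names are hypothetical.

```python
def model_table_size(num_states, num_actions):
    """Entries needed for an explicit tabular model: P(s' | s, a) plus r(s, a)."""
    return num_states * num_states * num_actions + num_states * num_actions


def q_table_size(num_states, num_actions):
    """Entries needed for a tabular Q-function Q(s, a)."""
    return num_states * num_actions


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q[s][a] toward r + gamma * max_a' Q[s_next][a']."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# With 10 states and 4 actions, the model needs 440 entries and the Q-table 40.
sizes = (model_table_size(10, 4), q_table_size(10, 4))

# A single update on a 2-state, 2-action table initialised to zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

The update touches only the Q-table, so no transition probabilities are ever stored; the table grows with S times A, while the model grows with S squared times A.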


Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks

arXiv.org Artificial Intelligence

Wireless networks used for the Internet of Things (IoT) are expected to largely involve cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enable flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic features of the Central Processing Unit (CPU) load variation, as well as the possibly complex parallelisation situations of the cloud processes, make dynamic resource allocation an interesting research challenge. This paper models this dynamic computational resource allocation problem as a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the dynamic resource allocation of the CPU usage. The value iteration method is used by the reinforcement-learning agent to find the optimal policy for the MDP. To evaluate performance, we analyse two types of processes that can be used in cloud-based IoT networks with different levels of parallelisation capability, i.e., Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent rapidly converges to the optimal policy, performs stably under different parameter settings, and outperforms, or at least matches, a baseline algorithm in energy savings across different scenarios.
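Value iteration, the method named in this abstract, can be sketched for a small tabular MDP as follows. This is a generic illustration, not the paper's CPU-allocation model: the two-state MDP, its rewards, and the discount factor are hypothetical.

```python
def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s and then following V."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    num_states = len(P)
    V = [0.0] * num_states
    while True:
        delta = 0.0
        for s in range(num_states):
            # Bellman optimality backup, applied in place.
            best = max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
              for s in range(num_states)]
    return V, policy


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Only staying in state 1 earns a reward.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy case the values converge to roughly 9 for state 0 and 10 for state 1, and the greedy policy switches out of state 0 and stays in state 1. Note that value iteration needs the transition probabilities `P`, which is why the agent in the abstract is described as model-based.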


Deep Imitation Learning for Bimanual Robotic Manipulation

arXiv.org Artificial Intelligence

We present a deep imitation learning framework for robotic bimanual manipulation in a continuous state-action space. Imitation learning has been effectively utilized in mimicking bimanual manipulation movements, but generalizing the movement to objects in different locations has not been explored. We hypothesize that precisely generalizing the learned behavior relative to an object's location requires modeling relational information in the environment. To achieve this, we designed a method that (i) uses a multi-model framework to decompose complex dynamics into elemental movement primitives, and (ii) parameterizes each primitive using a recurrent graph neural network to capture interactions. Our model is a deep, hierarchical, modular architecture with a high-level planner that learns to compose primitives sequentially and a low-level controller that integrates primitive dynamics modules and inverse kinematics control. We demonstrate its effectiveness using several simulated bimanual robotic manipulation tasks. Compared to models based on previous imitation learning studies, our model generalizes better and achieves higher success rates in the simulated tasks.


Unsupervised Joint $k$-node Graph Representations with Compositional Energy-Based Models

arXiv.org Artificial Intelligence

Existing Graph Neural Network (GNN) methods that learn inductive unsupervised graph representations focus on learning node and edge representations by predicting observed edges in the graph. Although such approaches have shown advances in downstream node classification tasks, they are ineffective in jointly representing larger $k$-node sets, $k > 2$. We propose MHM-GNN, an inductive unsupervised graph representation approach that combines joint $k$-node representations with energy-based models (hypergraph Markov networks) and GNNs. To address the intractability of the loss that arises from this combination, we endow our optimization with a loss upper bound using a finite-sample unbiased Markov Chain Monte Carlo estimator. Our experiments show that MHM-GNN produces better unsupervised representations than existing approaches from the literature.


Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations

arXiv.org Artificial Intelligence

Dialogue-level quality estimation is vital for optimizing data-driven dialogue management. Current automated methods to estimate turn- and dialogue-level user satisfaction employ hand-crafted features and rely on complex annotation schemes, which reduce the generalizability of the trained models. We propose a novel user satisfaction estimation approach which minimizes an adaptive multi-task loss function in order to jointly predict turn-level Response Quality labels provided by experts and explicit dialogue-level ratings provided by end users. The proposed BiLSTM-based deep neural net model automatically weighs each turn's contribution towards the estimated dialogue-level rating, implicitly encodes temporal dependencies, and removes the need to hand-craft features. On dialogues sampled from 28 Alexa domains, two dialogue systems and three user groups, the joint dialogue-level satisfaction estimation model achieved up to an absolute 27% (0.43->0.70) and 7% (0.63->0.70) improvement in linear correlation performance over baseline deep neural net and benchmark gradient boosting regression models, respectively.