Specification Inference from Demonstrations

arXiv.org Artificial Intelligence

Learning from expert demonstrations has received a lot of attention in artificial intelligence and machine learning. The goal is to infer the underlying reward function that an agent is optimizing given a set of observations of the agent's behavior over time in a variety of circumstances, the system state trajectories, and a plant model specifying the evolution of the system state for different agent actions. The system is often modeled as a Markov decision process, that is, the next state depends only on the current state and the agent's action, and the agent's choice of action depends only on the current state. While the former is a Markovian assumption on the evolution of the system state, the latter assumes that the target reward function is itself Markovian. In this work, we explore learning a class of non-Markovian reward functions, known in the formal methods literature as specifications. These specifications offer better composition, transferability, and interpretability. We then show that inferring the specification can be done efficiently without unrolling the transition system. We demonstrate the approach on a 2-D grid world example.
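
To illustrate the distinction this abstract draws, the following minimal Python sketch contrasts a Markovian reward, which scores each state independently, with a trajectory-level specification, which can only be evaluated on the whole history. The grid cells, the helper names, and the particular specification ("never enter an unsafe cell and eventually reach the goal") are hypothetical choices for illustration; this is not the paper's inference procedure.

```python
from typing import Iterable, List, Set, Tuple

State = Tuple[int, int]  # (row, column) cell in a 2-D grid world


def markovian_return(trajectory: Iterable[State], goal: State, unsafe: Set[State]) -> int:
    """Sum a per-state reward: +1 at the goal, -1 in an unsafe cell, 0 elsewhere."""
    total = 0
    for state in trajectory:
        if state == goal:
            total += 1
        elif state in unsafe:
            total -= 1
    return total


def satisfies_spec(trajectory: Iterable[State], goal: State, unsafe: Set[State]) -> bool:
    """Hypothetical specification: never enter an unsafe cell AND eventually reach the goal.

    The verdict depends on the whole history, not on any single state.
    """
    reached_goal = False
    for state in trajectory:
        if state in unsafe:
            return False  # one violation falsifies the whole trajectory
        if state == goal:
            reached_goal = True
    return reached_goal


GOAL: State = (2, 2)
UNSAFE: Set[State] = {(1, 1)}

# Cuts through the unsafe cell, then visits the goal twice.
traj_a: List[State] = [(0, 0), (1, 1), (2, 2), (2, 1), (2, 2)]
# Walks around the unsafe cell and reaches the goal once.
traj_b: List[State] = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]

if __name__ == "__main__":
    for name, traj in (("a", traj_a), ("b", traj_b)):
        print(name, markovian_return(traj, GOAL, UNSAFE), satisfies_spec(traj, GOAL, UNSAFE))
```

Under these assumed rewards the two trajectories receive the same summed reward, yet only the second satisfies the specification, which is the kind of behavior a purely state-based reward cannot separate without extra tuning.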


Memory Augmented Control Networks

arXiv.org Artificial Intelligence

Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. But even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily due to the complexity of simultaneously learning to access memory and plan. To mitigate these challenges, we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features, and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.


Express delivery: use drones not trucks to cut carbon emissions, experts say

The Guardian - Business

Tue 13 Feb 2018 11.00 EST. Drones invoke varying perceptions, from fun gadget to fly in the park to deadly military weapons. In the future, they may even be viewed as a handy tool in the battle to fight climate change. Greenhouse gas emissions from the tra...


Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner

arXiv.org Artificial Intelligence

During their first years of life, infants learn the language(s) of their environment at an amazing speed despite large cross-cultural variations in the amount and complexity of the available language input. Understanding this simple fact still escapes current cognitive and linguistic theories. Recently, spectacular progress in the engineering sciences, notably machine learning and wearable technology, has offered the promise of revolutionizing the study of cognitive development. Machine learning offers powerful learning algorithms that can achieve human-like performance on many linguistic tasks. Wearable sensors can capture vast amounts of data, which enables the reconstruction of the sensory experience of infants in their natural environment. The project of 'reverse engineering' language development, i.e., of building an effective system that mimics infants' achievements, therefore appears to be within reach. Here, we analyze the conditions under which such a project can contribute to our scientific understanding of early language development. We argue that instead of defining a sub-problem or simplifying the data, computational models should address the full complexity of the learning situation and take as input the raw sensory signals available to infants. This implies that (1) accessible but privacy-preserving repositories of home data be set up and widely shared, (2) models be evaluated at different linguistic levels through a benchmark of psycholinguistic tests that can be passed by machines and humans alike, and (3) linguistically and psychologically plausible learning architectures be scaled up to real data using probabilistic/optimization principles from machine learning. We discuss the feasibility of this approach and present preliminary results.


Stop Being Rude To Amazon Alexa, Carol

Forbes Europe

Don't be rude to your home devices or there will be retribution. Google Home, Amazon Alexa, Apple Siri and HomePod, Microsoft Cortana and any other voice-operated smart device have something in common. It's not operating systems and it's certainly not the ecosystem in which each resides. What they...


Machine Learning And Business Problem-Solving

#artificialintelligence

Our lab began digging into machine learning in 2014, exploring applications ranging from supply chain optimization to factory automation and retail, as well as predicting terrorist attacks. Where we can apply knowledge of a given domain and weave it into a learning algorithm for non-deterministic pattern recognition, machine learning grounded only in statistics (not symbolic, logic-based, or evolutionary methods) can readily improve upon guessing. Given a productive data set, and where overfitting is sufficiently avoided or mitigated, a learning algorithm can recognize patterns and generalize to cases not yet encountered. For us, such explorations started more than two years ago with SAP NS2 and ConvergentAI (formerly AxxonAI), where we find the project team's proof-of-concept (POC) results remain relevant today and applicable in the same way to problem-solving in other domains. While conceptually different, machine learning and analytics are strongly related: machine learning uses data and learning algorithms (supervised and unsupervised) to optimize a model based on performance and prior experience.


Isolating Sources of Disentanglement in Variational Autoencoders

arXiv.org Artificial Intelligence

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $\beta$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art $\beta$-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement when the latent variable model is trained using our framework.
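
The decomposition mentioned in this abstract can be sketched as follows. The notation is an assumption, since the abstract defines none: $n$ indexes training examples, $q(z \mid n)$ is the encoder, $q(z) = \mathbb{E}_{p(n)}[q(z \mid n)]$ is the aggregate posterior, and the prior is assumed to factorize as $p(z) = \prod_j p(z_j)$. Under those assumptions, the averaged KL term of the evidence lower bound splits into three parts, the middle one being the total correlation.

```latex
\mathbb{E}_{p(n)}\!\left[ \mathrm{KL}\!\left( q(z \mid n) \,\|\, p(z) \right) \right]
  = \underbrace{\mathrm{KL}\!\left( q(z, n) \,\|\, q(z)\, p(n) \right)}_{\text{mutual information between } z \text{ and } n}
  + \underbrace{\mathrm{KL}\!\Big( q(z) \,\Big\|\, \prod_j q(z_j) \Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\!\left( q(z_j) \,\|\, p(z_j) \right)}_{\text{dimension-wise KL}}
```

The identity follows by inserting $q(z)$ and $\prod_j q(z_j)$ into the log-ratio $\log \frac{q(z \mid n)}{p(z)}$ and grouping terms. Read this way, a $\beta$-VAE weights all three terms by $\beta$, whereas penalizing only the total correlation term is one plausible reading of the refinement the abstract describes.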


Probabilistic Warnings in National Security Crises: Pearl Harbor Revisited

arXiv.org Artificial Intelligence

Imagine a situation where a group of adversaries is preparing an attack on the United States or U.S. interests. An intelligence analyst has observed some signals, but the situation is rapidly changing. The analyst faces the decision to alert a principal decision maker that an attack is imminent, or to wait until more is known about the situation. This warning decision is based on the analyst's observation and evaluation of signals, independent or correlated, and on her updating of the prior probabilities of possible scenarios and their outcomes. The warning decision also depends on the analyst's assessment of the crisis' dynamics and perception of the preferences of the principal decision maker, as well as the lead time needed for an appropriate response. This article presents a model to support this analyst's dynamic warning decision. As with most problems involving warning, the key is to manage the tradeoffs between false positives and false negatives given the probabilities and the consequences of intelligence failures of both types. The model is illustrated by revisiting the case of the attack on Pearl Harbor in December 1941. It shows that the radio silence of the Japanese fleet carried considerable information (Sir Arthur Conan Doyle's "dog in the night" problem), which was misinterpreted at the time. Even though the probabilities of different attacks were relatively low, their consequences were such that the Bayesian dynamic reasoning described here may have provided valuable information to key decision makers.
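
The two ingredients named in this abstract, Bayesian updating on an observed signal and the tradeoff between false positives and false negatives, can be sketched in a few lines of Python. This is a deliberately static, single-signal simplification: the article's model is dynamic and also accounts for lead time and the decision maker's preferences. The function names and every number used with them are hypothetical.

```python
def posterior_attack(prior: float,
                     p_signal_given_attack: float,
                     p_signal_given_no_attack: float) -> float:
    """Bayes' rule: probability of an attack after observing one signal.

    Assumes the signal has nonzero probability under at least one hypothesis.
    """
    numerator = prior * p_signal_given_attack
    evidence = numerator + (1.0 - prior) * p_signal_given_no_attack
    return numerator / evidence


def should_warn(p_attack: float,
                cost_false_alarm: float,
                cost_missed_attack: float) -> bool:
    """Warn when the expected loss of waiting exceeds the expected loss of warning.

    Expected loss of warning: (1 - p_attack) * cost_false_alarm  (false positive)
    Expected loss of waiting: p_attack * cost_missed_attack      (false negative)
    """
    return p_attack * cost_missed_attack > (1.0 - p_attack) * cost_false_alarm


if __name__ == "__main__":
    # Hypothetical numbers: a low prior, and a signal four times more likely under attack.
    p = posterior_attack(prior=0.1, p_signal_given_attack=0.8, p_signal_given_no_attack=0.2)
    print(p, should_warn(p, cost_false_alarm=1.0, cost_missed_attack=100.0))
```

The decision rule shows why a low posterior probability can still justify a warning: when a missed attack is far costlier than a false alarm, the threshold on the probability drops accordingly.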


Challenging Images For Minds and Machines

arXiv.org Artificial Intelligence

There is no denying the tremendous leap in the performance of machine learning methods in the past half-decade. Some might even say that specific sub-fields in pattern recognition, such as machine vision, are as good as solved, reaching human and super-human levels. Arguably, lack of training data and computation power are all that stand between us and solving the remaining ones. In this position paper we underline cases in vision which are challenging to machines and even to human observers. This is to show limitations of contemporary models that are hard to ameliorate by following the current trend of increasing training data, network capacity or computational power. Moreover, we claim that attempting to do so is in principle a suboptimal approach. We provide a taster of such examples in the hope of encouraging and challenging the machine learning community to develop new directions to solve the said difficulties.


Evolved Policy Gradients

arXiv.org Artificial Intelligence

We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. Moreover, at test time, our learner optimizes only its learned loss function, and requires no explicit reward signal. In effect, the agent internalizes the reward structure, suggesting a direction toward agents that learn to solve new tasks simply from intrinsic motivation.