Goto

Collaborating Authors

 Markov Models


Modelling Bahdanau Attention using Election methods aided by Q-Learning

arXiv.org Machine Learning

Neural Machine Translation has lately gained a lot of "attention" with the advent of more and more sophisticated but drastically improved models. Attention mechanism has proved to be a boon in this direction by providing weights to the input words, making it easy for the decoder to identify words representing the present context. But by and by, as newer attention models with more complexity came into development, they involved large computation, making inference slow. In this paper, we have modelled the attention network using techniques resonating with social choice theory. Along with that, the attention mechanism, being a Markov Decision Process, has been represented by reinforcement learning techniques. Thus, we propose to use an election method ( k -Borda), fine-tuned using Q-learning, as a replacement for attention networks. The inference time for this network is less than a standard Bahdanau translator, and the results of the translation are comparable. This not only experimentally verifies the claims stated above but also helped provide a faster inference.


Investigation of Error Simulation Techniques for Learning Dialog Policies for Conversational Error Recovery

arXiv.org Artificial Intelligence

Training dialog policies for speech-based virtual assistants requires a plethora of conversational data. The data collection phase is often expensive and time consuming due to human involvement. To address this issue, a common solution is to build user simulators for data generation. For the successful deployment of the trained policies into real world domains, it is vital that the user simulator mimics realistic conditions. In particular, speech-based assistants are heavily affected by automatic speech recognition and language understanding errors, hence the user simulator should be able to simulate similar errors. In this paper, we review the existing error simulation methods that induce errors at audio, phoneme, text, or semantic level; and conduct detailed comparisons between the audio-level and text-level methods. In the process, we improve the existing text-level method by introducing confidence score prediction and out-of-vocabulary word mapping. We also explore the impact of audio-level and text-level methods on learning a simple clarification dialog policy to recover from errors to provide insight on future improvement for both approaches.


Sparsity through evolutionary pruning prevents neuronal networks from overfitting

arXiv.org Artificial Intelligence

Modern Machine learning techniques take advantage of the exponentially rising calculation power in new generation processor units. Thus, the number of parameters which are trained to resolve complex tasks was highly increased over the last decades. However, still the networks fail - in contrast to our brain - to develop general intelligence in the sense of being able to solve several complex tasks with only one network architecture. This could be the case because the brain is not a randomly initialized neural network, which has to be trained by simply investing a lot of calculation power, but has from birth some fixed hierarchical structure. To make progress in decoding the structural basis of biological neural networks we here chose a bottom-up approach, where we evolutionarily trained small neural networks in performing a maze task. This simple maze task requires dynamical decision making with delayed rewards. We were able to show that during the evolutionary optimization random severance of connections lead to better generalization performance of the networks compared to fully connected networks. We conclude that sparsity is a central property of neural networks and should be considered for modern Machine learning approaches.


Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

arXiv.org Artificial Intelligence

Today, most large-scale conversational AI agents (e.g. Alexa, Siri, or Google Assistant) are built using manually annotated data to train the different components of the system. Typically, the accuracy of the ML models in these components are improved by manually transcribing and annotating data. As the scope of these systems increase to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time consuming. In this paper, we propose a system that leverages user-system interaction feedback signals to automate learning without any manual annotation. Users here tend to modify a previous query in hopes of fixing an error in the previous turn to get the right results. These reformulations, which are often preceded by defective experiences caused by errors in ASR, NLU, ER or the application. In some cases, users may not properly formulate their requests (e.g. providing partial title of a song), but gleaning across a wider pool of users and sessions reveals the underlying recurrent patterns. Our proposed self-learning system automatically detects the errors, generate reformulations and deploys fixes to the runtime system to correct different types of errors occurring in different components of the system. In particular, we propose leveraging an absorbing Markov Chain model as a collaborative filtering mechanism in a novel attempt to mine these patterns. We show that our approach is highly scalable, and able to learn reformulations that reduce Alexa-user errors by pooling anonymized data across millions of customers. The proposed self-learning system achieves a win/loss ratio of 11.8 and effectively reduces the defect rate by more than 30% on utterance level reformulations in our production A/B tests. To the best of our knowledge, this is the first self-learning large-scale conversational AI system in production.


Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems

arXiv.org Machine Learning

This manuscript contributes a general and practical framework for casting a Markov process model of a system at equilibrium as a structural causal model, and carrying out counterfactual inference. Markov processes mathematically describe the mechanisms in the system, and predict the system's equilibrium behavior upon intervention, but do not support counterfactual inference. In contrast, structural causal models support counterfactual inference, but do not identify the mechanisms. This manuscript leverages the benefits of both approaches. We define the structural causal models in terms of the parameters and the equilibrium dynamics of the Markov process models, and counterfactual inference flows from these settings. The proposed approach alleviates the identifiability drawback of the structural causal models, in that the counterfactual inference is consistent with the counterfactual trajectories simulated from the Markov process model. We showcase the benefits of this framework in case studies of complex biomolecular systems with nonlinear dynamics. We illustrate that, in presence of Markov process model misspecification, counterfactual inference leverages prior data, and therefore estimates the outcome of an intervention more accurately than a direct simulation.


How Smart Are AI Chips, Really?

#artificialintelligence

The best part about the term "Artificial Intelligence" is that nobody can really tell you what it exactly means. The main reason for this stems from the term "intelligence", with definitions ranging from the ability to practice logical reasoning to the ability to perform cognitive tasks or dream up symphonies. When it comes to human intelligence, properties such as self-awareness, complex cognitive feats, and the ability to plan and motivate oneself are generally considered to be defining features. But frankly, what is and isn't "intelligence" is open to debate. What isn't open to debate is that AI is a marketing goldmine.


How Smart Are AI Chips, Really?

#artificialintelligence

The best part about the term "Artificial Intelligence" is that nobody can really tell you what it exactly means. The main reason for this stems from the term "intelligence", with definitions ranging from the ability to practice logical reasoning to the ability to perform cognitive tasks or dream up symphonies. When it comes to human intelligence, properties such as self-awareness, complex cognitive feats, and the ability to plan and motivate oneself are generally considered to be defining features. But frankly, what is and isn't "intelligence" is open to debate. What isn't open to debate is that AI is a marketing goldmine.


On Online Learning in Kernelized Markov Decision Processes

arXiv.org Machine Learning

Abstract-- We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS). I. INTRODUCTION The goal of reinforcement learning (RL) is to learn optimal behavior by repeated interaction with an unknown environment, usually modeled as a Markov Decision Process (MDP). Performance is typically measured by the amount of interaction, in terms of episodes or rounds, needed to arriv e at an optimal (or near-optimal) policy; this is also known as the sample complexity of RL [1]. The sample complexity objective encourages efficient exploration across states a nd actions, but, at the same time, is indifferent to the reward earned during the learning phase.


Gradient-based Adaptive Markov Chain Monte Carlo

arXiv.org Machine Learning

We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.


Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

arXiv.org Artificial Intelligence

Being Optimistic to Be Conservative: Quickly Learning a CV aR Policy Ramtin Keramati 1, Christoph Dann 2, Alex T amkin 3, Emma Brunskill 3 1 Institute of Computational and Mathematical Engineering (ICME), Stanford University, California, USA 2 Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA 3 Department of Computer Science, Stanford University, California, USA {keramati,atamkin,ebrun } @cs.stanford.edu Abstract While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CV aR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CV aR. In this paper, we present the first algorithm for sample-efficient learning of CV aR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CV aR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces. Introduction A key goal in reinforcement learning (RL) is to quickly learn to make good decisions by interacting with an environment. In most cases the quality of the decision policy is evaluated with respect to its expected (discounted) sum of rewards. However, in many interesting cases, it is important to consider the full distributions over the potential sum of rewards, and the desired objective may be a risk-sensitive measure of this distribution. For example, a patient undergoing a surgery for a knee replacement will (hopefully) only experience that procedure once or twice, and may will be interested in the distribution of potential results for a single procedure, rather than what may happen on average if he or she were to undertake that procedure hundreds of time. Finance and (machine) control are other cases where interest in risk-sensitive outcomes are common. A popular risk-sensitive measure of a distribution of outcomes is the Conditional V alue at Risk (CV aR) (Artzner et al. 1999). Intuitively, CV aR is the expected reward in the worst α -fraction of outcomes, and has seen extensive use in financial portfolio optimization (Zhu and Fukushima 2009), often under the name "expected shortfall".