AITopics | Undirected Networks

Collaborating Authors

Undirected Networks

News Overviews Instructional Materials AI-Alerts Classics

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

arXiv.org Artificial IntelligenceNov-4-2017

Conventional reinforcement learning methods for Markov decision processes rely on weakly-guided, stochastic searches to drive the learning process. It can therefore be difficult to predict what agent behaviors might emerge. In this paper, we consider an information-theoretic cost function for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. This cost function is the value of information, which provides the optimal trade-off between the expected return of a policy and the policy's complexity; policy complexity is measured by number of bits and controlled by a single hyperparameter on the cost function. As the policy complexity is reduced, the agents will increasingly eschew risky actions. This reduces the potential for high accrued rewards. As the policy complexity increases, the agents will take actions, regardless of the risk, that can raise the long-term rewards. The obtainable reward depends on a single, tunable hyperparameter that regulates the degree of policy complexity. We evaluate the performance of value-of-information-based policies on a stochastic version of Ms. Pac-Man. A major component of this paper is the demonstration that ranges of policy complexity values yield different game-play styles and explaining why this occurs. We also show that our reinforcement-learning search mechanism is more efficient than the others we utilize. This result implies that the value of information theory is appropriate for framing the exploitation-exploration trade-off in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

1702.08628

Country:

Europe (1.00)
North America > United States > New York (0.46)

Genre: Research Report > New Finding (0.92)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Factoring Exogenous State for Model-Free Monte Carlo

McGregor, Sean, Houtman, Rachel, Montgomery, Claire, Metoyer, Ronald, Dietterich, Thomas G.

arXiv.org Machine LearningNov-3-2017

Policy analysts wish to visualize a range of policies for large simulator-defined Markov Decision Processes (MDPs). One visualization approach is to invoke the simulator to generate on-policy trajectories and then visualize those trajectories. When the simulator is expensive, this is not practical, and some method is required for generating trajectories for new policies without invoking the simulator. The method of Model-Free Monte Carlo (MFMC) can do this by stitching together state transitions for a new policy based on previously-sampled trajectories from other policies. This "off-policy Monte Carlo simulation" method works well when the state space has low dimension but fails as the dimension grows. This paper describes a method for factoring out some of the state and action variables so that MFMC can work in high-dimensional MDPs. The new method, MFMCi, is evaluated on a very challenging wildfire management MDP.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Machine Learning

1703.0939

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.47)
Law Enforcement & Public Safety > Fire & Emergency Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

QMDP-Net: Deep Learning for Planning under Partial Observability

Karkus, Peter, Hsu, David, Lee, Wee Sun

arXiv.org Artificial IntelligenceNov-2-2017

This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and "transfer" to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.

artificial intelligence, machine learning, qmdp-net, (18 more...)

arXiv.org Artificial Intelligence

1703.06692

Genre:

Workflow (0.68)
Research Report > New Finding (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Add feedback

Binary Bouncy Particle Sampler

Pakman, Ari

arXiv.org Machine LearningNov-2-2017

The Bouncy Particle Sampler is a novel rejection-free non-reversible sampler for differentiable probability distributions over continuous variables. We generalize the algorithm to piecewise differentiable distributions and apply it to generic binary distributions using a piecewise differentiable augmentation. We illustrate the new algorithm in a binary Markov Random Field example, and compare it to binary Hamiltonian Monte Carlo. Our results suggest that binary BPS samplers are better for easy to mix distributions.

artificial intelligence, machine learning, sampler, (13 more...)

arXiv.org Machine Learning

1711.00922

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Add feedback

Fast mixing for Latent Dirichlet allocation

Jonasson, Johan

arXiv.org Machine LearningNov-1-2017

Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in probability theory in general and in machine learning in particular. A Markov chain is devised so that its stationary distribution is some probability distribution of interest. Then one samples from the given distribution by running the Markov chain for a "long time" until it appears to be stationary and then collects the sample. However these chains are often very complex and there are no theoretical guarantees that stationarity is actually reached. In this paper we study the Gibbs sampler of the posterior distribution of a very simple case of Latent Dirichlet Allocation, the arguably most well known Bayesian unsupervised learning model for text generation and text classification. It is shown that when the corpus consists of two long documents of equal length $m$ and the vocabulary consists of only two different words, the mixing time is at most of order $m^2\log m$ (which corresponds to $m\log m$ rounds over the corpus). It will be apparent from our analysis that it seems very likely that the mixing time is not much worse in the more relevant case when the number of documents and the size of the vocabulary are also large as long as each word is represented a large number in each document, even though the computations involved may be intractable.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1701.0296

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

A Goal-Based Movement Model for Continuous Multi-Agent Tasks

Iqbal, Shariq, Pearson, John

arXiv.org Machine LearningOct-31-2017

Despite increasing attention paid to the need for fast, scalable methods to analyze next-generation neuroscience data, comparatively little attention has been paid to the development of similar methods for behavioral analysis. Just as the volume and complexity of brain data have grown, behavioral paradigms in systems neuroscience have likewise become more naturalistic and less constrained, necessitating an increase in the flexibility and scalability of the models used to study them. In particular, key assumptions made in the analysis of typical decision paradigms --- optimality; analytic tractability; discrete, low-dimensional action spaces --- may be untenable in richer tasks. Here, using the case of a two-player, real-time, continuous strategic game as an example, we show how the use of modern machine learning methods allows us to relax each of these assumptions. Following an inverse reinforcement learning approach, we are able to succinctly characterize the joint distribution over players' actions via a generative model that allows us to simulate realistic game play. We compare simulated play from a number of generative time series models and show that ours successfully resists mode collapse while generating trajectories with the rich variability of real behavior. Together, these methods offer a rich class of models for the analysis of continuous action tasks at the single-trial level.

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Machine Learning

1702.07319

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Tensor network language model

Pestun, Vasily, Vlassopoulos, Yiannis

arXiv.org Machine LearningOct-30-2017

We propose a new statistical model suitable for machine learning of systems with long distance correlations such as natural languages. The model is based on directed acyclic graph decorated by multi-linear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebro-geometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation.

machine learning, natural language, tensor network, (18 more...)

arXiv.org Machine Learning

1710.10248

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics

Wehmeyer, Christoph, Noé, Frank

arXiv.org Machine LearningOct-30-2017

Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder type deep neural network to the task of dimension reduction of molecular dynamics data. We can show that our time-lagged autoencoder reliably finds low-dimensional embeddings for highdimensional feature spaces which capture the slow dynamics of the underlying stochastic processes-beyond the capabilities of linear dimension reduction techniques. Molecular dynamics (MD) simulation allows us to probe the full spatiotemporal detail of molecular processes, but its usefulness has long been limited by the sampling problem. If we do not want to choose the library of feature functions by hand, but instead want to optimize the nonlinear mapping E by employing a neural network, we have again two options: (1) employ the variational approach. In this paper we investigate option (2), which naturally leads to using a time-lagged autoencoder (TAE).

artificial intelligence, chem, machine learning, (18 more...)

arXiv.org Machine Learning

1710.11239

Country:

North America > United States (0.47)
Europe (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Robust and Efficient Transfer Learning with Hidden-Parameter Markov Decision Processes

Killian, Taylor, Daulton, Samuel, Konidaris, George, Doshi-Velez, Finale

arXiv.org Machine LearningOct-30-2017

We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings. Our new framework correctly models the joint uncertainty in the latent parameters and the state space. We also replace the original Gaussian Process-based model with a Bayesian Neural Network, enabling more scalable inference. Thus, we expand the scope of the HiP-MDP to applications with higher dimensions and more complex dynamics.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1706.06544

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Add feedback

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

Jain, Prateek, Kakade, Sham M., Kidambi, Rahul, Netrapalli, Praneeth, Pillutla, Venkata Krishna, Sidford, Aaron

arXiv.org Machine LearningOct-25-2017

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.

artificial intelligence, machine learning, matrix, (10 more...)

arXiv.org Machine Learning

1710.0943

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback