AITopics | Undirected Networks

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

arXiv.org Artificial Intelligence Feb-10-2016

Given a Markov Decision Process (MDP) with $n$ states and a total number $m$ of actions, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy. We consider two variations of PI: Howard's PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advantage. We show that Howard's PI terminates after at most $O\left(\frac{m}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Hansen et al., while Simplex-PI terminates after at most $O\left(\frac{nm}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Ye. Under some structural properties of the MDP, we then consider bounds that are independent of the discount factor $\gamma$: quantities of interest are bounds $\tau_t$ and $\tau_r$ (uniform on all states and policies) respectively on the expected time spent in transient states and the inverse of the frequency of visits in recurrent states, given that the process starts from the uniform distribution. Indeed, we show that Simplex-PI terminates after at most $\tilde O\left(n^3 m^2 \tau_t \tau_r \right)$ iterations. This extends a recent result for deterministic MDPs by Post & Ye, in which $\tau_t \le 1$ and $\tau_r \le n$; in particular, it shows that Simplex-PI is strongly polynomial for a much larger class of MDPs. We explain why similar results seem hard to derive for Howard's PI. Finally, under the additional (restrictive) assumption that the state space is partitioned in two sets, respectively states that are transient and recurrent for all policies, we show that both Howard's PI and Simplex-PI terminate after at most $\tilde O(m(n^2\tau_t+n\tau_r))$ iterations.
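The difference between the two update rules described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the MDP encoding (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as a reward), the iterative policy evaluation, the tolerance values, and the two-state toy MDP are all assumptions made for illustration.

```python
def evaluate(P, R, policy, gamma, tol=1e-12):
    """Approximate the value of `policy` by iterating its Bellman operator."""
    n = len(P)
    v = [0.0] * n
    while True:
        new_v = [
            R[s][policy[s]] + gamma * sum(p * v[t] for p, t in P[s][policy[s]])
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def advantages(P, R, policy, gamma):
    """Per state, return (largest advantage, action achieving it)."""
    v = evaluate(P, R, policy, gamma)
    result = []
    for s in range(len(P)):
        q = [
            R[s][a] + gamma * sum(p * v[t] for p, t in P[s][a])
            for a in range(len(P[s]))
        ]
        best = max(range(len(q)), key=q.__getitem__)
        result.append((q[best] - v[s], best))
    return result


def policy_iteration(P, R, gamma, variant="howard", eps=1e-9):
    """Run PI from the all-zeros policy; return (policy, number of iterations)."""
    policy = [0] * len(P)
    iterations = 0
    while True:
        adv = advantages(P, R, policy, gamma)
        improvable = [s for s, (a, _) in enumerate(adv) if a > eps]
        if not improvable:
            return policy, iterations
        if variant == "simplex":
            # Simplex-PI: switch only the state with maximal advantage.
            improvable = [max(improvable, key=lambda s: adv[s][0])]
        # Howard's PI: switch every state with a positive advantage.
        for s in improvable:
            policy[s] = adv[s][1]
        iterations += 1


# Hypothetical toy MDP: 2 states, 2 actions each, deterministic transitions.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, or move to state 1
    [[(1.0, 1)], [(1.0, 1)]],  # state 1: both actions stay in state 1
]
R = [[0.0, 0.5], [0.0, 1.0]]

howard_policy, howard_iters = policy_iteration(P, R, 0.9, "howard")
simplex_policy, simplex_iters = policy_iteration(P, R, 0.9, "simplex")
```

On this toy problem both variants reach the same policy, while Simplex-PI needs more iterations because it changes one state at a time. This matches the extra factor $n$ in its bound only in spirit; the sketch says nothing about worst-case behaviour.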

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1306.0386

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

McAllister, Rowan, Rasmussen, Carl Edward

arXiv.org Machine Learning Feb-8-2016

We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distribution of possible system trajectories. We additionally predict trajectories w.r.t. a filtering process, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework. Our test setup is the cartpole swing-up task with sensor noise, which involves nonlinear dynamics and requires nonlinear control.

artificial intelligence, machine learning, prediction, (14 more...)

arXiv.org Machine Learning

1602.02523

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Collaborative filtering via sparse Markov random fields

Tran, Truyen, Phung, Dinh, Venkatesh, Svetha

arXiv.org Machine Learning Feb-8-2016

Recommender systems play a central role in providing individualized access to information and services. This paper focuses on collaborative filtering, an approach that exploits the shared structure among like-minded users and similar items. In particular, we focus on a formal probabilistic framework known as Markov random fields (MRF). We address the open problem of structure learning and introduce a sparsity-inducing algorithm to automatically estimate the interaction structures between users and between items. Item-item and user-user correlation networks are obtained as a by-product. Large-scale experiments on movie recommendation and date matching datasets demonstrate the power of the proposed method.

artificial intelligence, machine learning, parameterization, (18 more...)

arXiv.org Machine Learning

1602.02842

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Word Representations, Tree Models and Syntactic Functions

Šuster, Simon, van Noord, Gertjan, Titov, Ivan

arXiv.org Machine Learning Feb-5-2016

Word representations induced from models with discrete latent variables (e.g. HMMs) have been shown to be beneficial in many NLP applications. In this work, we exploit labeled syntactic dependency trees and formalize the induction problem as unsupervised learning of tree-structured hidden Markov models. Syntactic functions are used as additional observed variables in the model, influencing both transition and emission components. Such syntactic information can potentially lead to capturing more fine-grained and functional distinctions between words, which, in turn, may be desirable in many NLP applications. We evaluate the word representations on two tasks: named entity recognition and semantic frame identification. We observe improvements from exploiting syntactic function information in both cases, with results rivaling those of state-of-the-art representation learning methods. Additionally, we revisit the relationship between sequential and unlabeled-tree models and find that the advantage of the latter is not self-evident.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1508.07709

Country:

North America > United States (0.93)
Europe (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(2 more...)

Time-Varying Gaussian Process Bandit Optimization

Bogunovic, Ilija, Scarlett, Jonathan, Cevher, Volkan

arXiv.org Machine Learning Jan-25-2016

We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1601.0665

Country:

Europe (0.67)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Online Event Recognition from Moving Vessel Trajectories

Patroumpas, Kostas, Alevizos, Elias, Artikis, Alexander, Vodas, Marios, Pelekis, Nikos, Theodoridis, Yannis

arXiv.org Artificial Intelligence Jan-22-2016

We present a system for online monitoring of maritime activity over streaming positions from numerous vessels sailing at sea. It employs an online tracking module for detecting important changes in the evolving trajectory of each vessel across time, and thus can incrementally retain concise, yet reliable summaries of its recent movement. In addition, thanks to its complex event recognition module, this system can also offer instant notification to marine authorities regarding emergency situations, such as risk of collisions, suspicious moves in protected zones, or package picking at open sea. Not only did our extensive tests validate the performance, efficiency, and robustness of the system against scalable volumes of real-world and synthetically enlarged datasets, but its deployment against online feeds from vessels has also confirmed its capabilities for effective, real-time maritime surveillance.

data mining, logic & formal reasoning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10707-016-0266-x

1601.06041

Country: Europe > Greece (0.14)

Genre: Research Report (0.81)

Industry:

Transportation (0.46)
Government > Military (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.93)
(3 more...)

Regret bounds for Narendra-Shapiro bandit algorithms

Gadat, Sébastien, Panloup, Fabien, Saadane, Sofiane

arXiv.org Machine Learning Jan-16-2016

Narendra-Shapiro (NS) algorithms are bandit-type algorithms that were introduced in the sixties (with a view to applications in psychology or learning automata) and whose convergence has been intensively studied in the stochastic algorithm literature. In this paper, we address the following question: are the Narendra-Shapiro (NS) bandit algorithms competitive from a regret point of view? In our main result, we show that some competitive bounds can be obtained for such algorithms in their penalized version (introduced by Lamberton and Pagès). More precisely, up to an over-penalization modification, the pseudo-regret $\bar{R}_n$ related to the penalized two-armed bandit algorithm is uniformly bounded by $C \sqrt{n}$ (where $C$ is made explicit in the paper). We also generalize existing convergence and rates of convergence results to the multi-armed case of the over-penalized bandit algorithm, including the convergence toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP) after a suitable renormalization. Finally, ergodic properties of this PDMP are given in the multi-armed case.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1502.04874

Country: Europe > France (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

Sedghi, Hanie, Janzamin, Majid, Anandkumar, Anima

arXiv.org Machine Learning Jan-12-2016

A generalized linear model (GLM) is a flexible extension of linear regression which allows the response or the output to be a nonlinear function of the input via an activation function. In other words, in a GLM, the linear regression of the input is passed through an activation function to generate the response. GLMs unify popular frameworks such as logistic regression and Poisson regression with linear regression. At the same time, they can be learnt with guarantees using simple iterative methods (Kakade et al., 2011). In many scenarios, however, GLMs may be too simplistic, and mixtures of GLMs can be much more effective since they combine the expressive power of latent variables with the predictive capabilities of the GLM. Mixtures of GLMs have widespread applicability including object recognition (Quattoni et al., 2004), human action recognition (Wang and Mori, 2009), syntactic parsing (Petrov and Klein, 2007), and machine translation (Liang et al., 2006). Traditionally, mixture models are learnt through heuristics such as expectation maximization (EM) (Jordan and Jacobs, 1994; Xu et al., 1995) or variational Bayes (Bishop and Svensen, 2003). However, these methods can converge to spurious local optima and have slow convergence rates for high dimensional models. In contrast, we employ a method-of-moments approach for guaranteed learning of mixtures of GLMs.
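As a rough illustration of the statement that a GLM passes a linear function of the input through an activation function, the following Python sketch computes the expected response for three common choices, and for a mixture in which a latent component selects the GLM. The function names, the `ACTIVATIONS` table, and the example weights are hypothetical; the sketch does not implement the method-of-moments learning procedure the abstract refers to.

```python
import math

# Inverse link ("activation") functions for three familiar GLM families.
ACTIVATIONS = {
    "linear": lambda z: z,                            # linear regression
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)), # logistic regression
    "poisson": math.exp,                              # Poisson regression
}


def glm_mean(weights, bias, x, family):
    """Expected response of a GLM: activation applied to a linear function of x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return ACTIVATIONS[family](z)


def mixture_mean(components, mixing, x, family):
    """Expected response of a mixture of GLMs.

    `components` is a list of (weights, bias) pairs and `mixing` holds the
    probability of each latent component; the result averages the component
    means under those probabilities.
    """
    return sum(
        pi * glm_mean(w, b, x, family)
        for pi, (w, b) in zip(mixing, components)
    )


# Hypothetical example: two components with opposite slopes.
components = [([1.0], 0.0), ([-1.0], 0.0)]
mixing = [0.5, 0.5]
mean_response = mixture_mean(components, mixing, [3.0], "linear")
```

A single GLM could not represent the two opposite slopes in this example, which is the kind of added expressive power the abstract attributes to latent variables.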

artificial intelligence, machine learning, score function, (18 more...)

arXiv.org Machine Learning

1412.3046

Country:

North America > United States > California (0.28)
Asia > Middle East > Jordan (0.25)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Probabilistic Programming with Gaussian Process Memoization

Schaechtle, Ulrich, Zinberg, Ben, Radul, Alexey, Stathis, Kostas, Mansinghka, Vikash K.

arXiv.org Machine Learning Jan-5-2016

Gaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improves as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1512.05665

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

Watter, Manuel, Springenberg, Jost, Boedecker, Joschka, Riedmiller, Martin

Neural Information Processing Systems Dec-31-2015

We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal control formulation in latent space, supports long-term prediction of image sequences and exhibits strong performance on a variety of complex control problems.

latent space, optimal control, trajectory, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
