Undirected Networks
Reasoning about Discrete and Continuous Noisy Sensors and Effectors in Dynamical Systems
Belle, Vaishak, Levesque, Hector J.
Among the many approaches for reasoning about degrees of belief in the presence of noisy sensing and acting, the logical account proposed by Bacchus, Halpern, and Levesque is perhaps the most expressive. While their formalism is quite general, it is restricted to fluents whose values are drawn from discrete finite domains, as opposed to the continuous domains seen in many robotic applications. In this work, we show how this limitation in that approach can be lifted. By dealing seamlessly with both discrete distributions and continuous densities within a rich theory of action, we provide a very general logical specification of how belief should change after acting and sensing in complex noisy domains.
Minimax Learning of Ergodic Markov Chains
Wolfer, Geoffrey, Kontorovich, Aryeh
We compute the finite-sample minimax (modulo logarithmic factors) sample complexity of learning the parameters of a finite Markov chain from a single long sequence of states. Our error metric is a natural variant of total variation. The sample complexity necessarily depends on the spectral gap and minimal stationary probability of the unknown chain - for which, at least in the reversible case, there are known finite-sample estimators with fully empirical confidence intervals. To our knowledge, this is the first PAC-type result with nearly matching (up to logs) upper and lower bounds for learning, in any metric in the context of Markov chains.
Sequential Coordination of Deep Models for Learning Visual Arithmetic
Crawford, Eric, Rabusseau, Guillaume, Pineau, Joelle
Achieving machine intelligence requires a smooth integration of perception and reasoning, yet models developed to date tend to specialize in one or the other; sophisticated manipulation of symbols acquired from rich perceptual spaces has so far proved elusive. Consider a visual arithmetic task, where the goal is to carry out simple arithmetical algorithms on digits presented under natural conditions (e.g. hand-written, placed randomly). We propose a two-tiered architecture for tackling this problem. The lower tier consists of a heterogeneous collection of information processing modules, which can include pre-trained deep neural networks for locating and extracting characters from the image, as well as modules performing symbolic transformations on the representations extracted by perception. The higher tier consists of a controller, trained using reinforcement learning, which coordinates the modules in order to solve the high-level task. For instance, the controller may learn in what contexts to execute the perceptual networks and what symbolic transformations to apply to their outputs. The resulting model is able to solve a variety of tasks in the visual arithmetic domain, and has several advantages over standard, architecturally homogeneous feedforward networks including improved sample efficiency.
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation
Huang, Cheng-Zhi Anna, Vaswani, Ashish, Uszkoreit, Jakob, Shazeer, Noam, Hawthorne, Curtis, Dai, Andrew M., Hoffman, Matthew D., Eck, Douglas
Music relies heavily on self-reference to build structure and meaning. We explore the Transformer architecture (Vaswani et al., 2017) as a generative model for music, as self-attention has shown compelling results on tasks that require long-term structure such as Wikipedia summary generation (Liu et al, 2018). However, timing information is critical for polyphonic music, and Transformer does not explicitly model absolute or relative timing in its structure. To address this challenge, Shaw et al. (2018) introduced relative position representations to self-attention to improve machine translation. However, the formulation was not scalable to longer sequences. We propose an improved formulation which reduces the memory requirements of the relative position computation from $O(l^2d)$ to $O(ld)$, making it possible to train much longer sequences and achieve faster convergence. In experiments on symbolic music we find that relative self-attention substantially improves sample quality for unconditioned generation and is able to generate sequences of lengths longer than those from the training set. When primed with an initial sequence, the model generates continuations that develop the prime coherently and exhibit long-term structure. Relative self-attention can be instrumental in capturing richer relationships within a musical piece.
Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes
Bedathur, Srikanta, Bhattacharya, Indrajit, Choudhari, Jayesh, Dasgupta, Anirban
Abstract--Social media conversations unfold based on complex interactions between users, topics and time. While recent models have been proposed to capture network strengths between users, users' topical preferences and temporal patterns between posting and response times, interaction patterns between topics has not been studied. We argue that social media conversations naturally involve interacting rather than independent topics. Modeling such topical interaction patterns can additionally help in inference of latent variables in the data such as diffusion parents and topics of events. We propose the Hidden Markov Hawkes Process (HMHP) that incorporates topical Markov Chains within Hawkes processes to jointly model topical interactions along with useruser and user-topic patterns. We propose a Gibbs sampling algorithm for HMHP that jointly infers the network strengths, diffusion paths, the topics of the posts as well as the topictopic interactions. We show using experiments on real and semisynthetic data that HMHP is able to generalize better and recover the network strengths, topics and diffusion paths more accurately that state-of-the-art baselines. More interestingly, HMHP finds insightful interactions between topics in real tweets which no existing model is able to do. This can potentially lead to actionable insights enabling, e.g., user targeting for influence maximization. A popular area of recent research has been the study of information diffusion cascades, where information spreads over a social network when a'parent' event from one infected node influences a'child' event at neighboring node [5], [11], [19], [6], [10]. The action of propagating information between two neighboring nodes depends on various factors, such as the strength of influence between the nodes, the topical content of the parent event and the extent of interest of the child node towards that topic. Explosion of social media data has made it possible to analyze and evaluate different models that seek to explain such information cascades. However, many relevant variables such as the network influence strengths, the identity of influencing or parent event for any event, and the actual topics are typically unobserved for most social network data.
Neural Melody Composition from Lyrics
Bao, Hangbo, Huang, Shaohan, Wei, Furu, Cui, Lei, Wu, Yu, Tan, Chuanqi, Piao, Songhao, Zhou, Ming
In this paper, we study a novel task that learns to compose music from natural language. Given the lyrics as input, we propose a melody composition model that generates lyrics-conditional melody as well as the exact alignment between the generated melody and the given lyrics simultaneously. More specifically, we develop the melody composition model based on the sequence-to-sequence framework. It consists of two neural encoders to encode the current lyrics and the context melody respectively, and a hierarchical decoder to jointly produce musical notes and the corresponding alignment. Experimental results on lyrics-melody pairs of 18,451 pop songs demonstrate the effectiveness of our proposed methods. In addition, we apply a singing voice synthesizer software to synthesize the "singing" of the lyrics and melodies for human evaluation. Results indicate that our generated melodies are more melodious and tuneful compared with the baseline method.
On Markov Chain Gradient Descent
Sun, Tao, Sun, Yuejiao, Yin, Wotao
Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering. This paper studies Markov chain gradient descent, a variant of stochastic gradient descent where the random samples are taken on the trajectory of a Markov chain. Existing results of this method assume convex objectives and a reversible Markov chain and thus have their limitations. We establish new non-ergodic convergence under wider step sizes, for nonconvex problems, and for non-reversible finite-state Markov chains. Nonconvexity makes our method applicable to broader problem classes. Non-reversible finite-state Markov chains, on the other hand, can mix substatially faster. To obtain these results, we introduce a new technique that varies the mixing levels of the Markov chains. The reported numerical results validate our contributions.
Smooth Structured Prediction Using Quantum and Classical Gibbs Samplers
Sepehry, Behrooz, Iranmanesh, Ehsan, Friedlander, Michael P., Ronagh, Pooya
We introduce a quantum algorithm for solving structured-prediction problems with a runtime that scales with the square root of the size of the label space, but scales in $\widetilde O\left(\frac{1}{\epsilon^5}\right)$ with respect to the precision of the solution. In doing so, we analyze a stochastic gradient algorithm for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order $\widetilde O(\epsilon)$. Our algorithm uses quantum Gibbs sampling at temperature $O (\epsilon)$ as a subroutine. Numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
Solving Non-identifiable Latent Feature Models
Suzuki, Ryota, Takahashi, Shingo, Petradwala, Murtuza, Kohmoto, Shigeru
Latent feature models (LFM)s are widely employed for extracting latent structures of data. While offering high, parameter estimation is difficult with LFMs because of the combinational nature of latent features, and non-identifiability is a particularly difficult problem when parameter estimation is not unique and there exists equivalent solutions. In this paper, a necessary and sufficient condition for non-identifiability is shown. The condition is significantly related to dependency of features, and this implies that non-identifiability may often occur in real-world applications. A novel method for parameter estimation that solves the non-identifiability problem is also proposed. This method can be combined as a post-process with existing methods and can find an appropriate solution by hopping efficiently through equivalent solutions. We have evaluated the effectiveness of the method on both synthetic and real-world datasets.
Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process
Wachi, Akifumi, Kajino, Hiroshi, Munawar, Asim
In many real-world applications (e.g., planetary exploration, robot navigation), an autonomous agent must be able to explore a space with guaranteed safety. Most safe exploration algorithms in the field of reinforcement learning and robotics have been based on the assumption that the safety features are a priori known and time-invariant. This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) that is based on the assumption that the safety features are a priori unknown and time-variant. In this setting, the agent explores MDPs while constraining the probability of entering unsafe states defined by a safety function being below a threshold. The unknown and time-variant safety values are modeled using a spatio-temporal Gaussian process. However, there remains an issue that an agent may have no viable action in a shrinking true safe space. To address this issue, we formulate a problem maximizing the cumulative number of safe states in the worst case scenario with respect to future observations. The effectiveness of this approach was demonstrated in two simulation settings, including one using real lunar terrain data.