Markov Models
Saccader: Improving Accuracy of Hard Attention Models for Vision
Elsayed, Gamaleldin F., Kornblith, Simon, Le, Quoc V.
Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, they are often regarded as black boxes. Because they compute a nonlinear function of the entire input image, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is \textit{hard attention}, which selects only relevant portions of the image. However, training hard attention models with only class label supervision is challenging, and hard attention has proved difficult to scale to complex datasets. Here, we propose a novel hard attention model, which we term Saccader, as well as a self-supervised pretraining procedure for this model that does not suffer from optimization challenges. Through pretraining and policy gradient optimization, the Saccader model estimates the relevance of different image patches to the downstream task, and uses a novel cell to select patches to classify at different times. Our approach achieves high accuracy on ImageNet while providing more interpretable predictions.
Learning to Advertise for Organic Traffic Maximization in E-Commerce Product Feeds
Chen, Dagui, Jin, Junqi, Zhang, Weinan, Pan, Fei, Niu, Lvyin, Yu, Chuan, Wang, Jun, Li, Han, Xu, Jian, Gai, Kun
Most e-commerce product feeds provide blended results of advertised products and recommended products to consumers. The underlying advertising and recommendation platforms share similar if not exactly the same set of candidate products. Consumers' behaviors on the advertised results constitute part of the recommendation model's training data and therefore can influence the recommended results. We refer to this process as Leverage. Considering this mechanism, we propose a novel perspective that advertisers can strategically bid through the advertising platform to optimize their recommended organic traffic. By analyzing the real-world data, we first explain the principles of Leverage mechanism, i.e., the dynamic models of Leverage. Then we introduce a novel Leverage optimization problem and formulate it with a Markov Decision Process. To deal with the sample complexity challenge in model-free reinforcement learning, we propose a novel Hybrid Training Leverage Bidding (HTLB) algorithm which combines the real-world samples and the emulator-generated samples to boost the learning speed and stability. Our offline experiments as well as the results from the online deployment demonstrate the superior performance of our approach.
Evaluating Hierarchies through A Partially Observable Markov Decision Processes Methodology
Huang, Weipeng, Piao, Guangyuan, Moreno, Raul, Hurley, Neil
Hierarchical clustering has been shown to be valuable in many scenarios, e.g. catalogues, biology research, image processing, and so on. Despite its usefulness to many situations, there is no agreed methodology on how to properly evaluate the hierarchies produced from different techniques, particularly in the case where ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations which covers the case of no ground-truth information. Such a quality measurement is useful, for example, to assess the hierarchical structures used by online retailer websites to display their product catalogues. Differently to all the previous measures and metrics, our framework tackles the evaluation from a decision theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure representing the degree to which the hierarchy supports this search. We employ the concept of Partially Observable Markov Decision Processes (POMDP) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario. In this paper, we fully discuss the modeling details and demonstrate its application on some datasets.
A reaction network scheme which implements inference and learning for Hidden Markov Models
Singh, Abhinav, Wiuf, Carsten, Behera, Abhishek, Gopalkrishnan, Manoj
With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the "Expectation" step and the "Maximization" step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.
Transfer in Deep Reinforcement Learning using Knowledge Graphs
Ammanabrolu, Prithviraj, Riedl, Mark O.
Text adventure games, in which players must make sense of the world through text descriptions and declare actions through text descriptions, provide a stepping stone toward grounding action in language. Prior work has demonstrated that using a knowledge graph as a state representation and question-answering to pre-train a deep Q-network facilitates faster control policy transfer. In this paper, we explore the use of knowledge graphs as a representation for domain knowledge transfer for training text-adventure playing reinforcement learning agents. Our methods are tested across multiple computer generated and human authored games, varying in domain and complexity, and demonstrate that our transfer learning methods let us learn a higher-quality control policy faster.
Computing Multi-Modal Journey Plans under Uncertainty
Botea, Adi, Kishimoto, Akihiro, Nikolova, Evdokia, Braghin, Stefano, Berlingerio, Michele, Daly, Elizabeth
Multi-modal journey planning, which allows multiple types of transport within a single trip, is becoming increasingly popular, due to a strong practical interest and an increasing availability of data. In real life, transport networks feature uncertainty. Yet, most approaches assume a deterministic environment, making plans more prone to failures such as missed connections and major delays in the arrival. This paper presents an approach to computing optimal contingent plans in multi-modal journey planning. The problem is modeled as a search in an and/or state space. We describe search enhancements used on top of the AO* algorithm. Enhancements include admissible heuristics, multiple types of pruning that preserve the completeness and the optimality, and a hybrid search approach with a deterministic and a nondeterministic search. We demonstrate an NP-hardness result, with the hardness stemming from the dynamically changing distributions of the travel time random variables. We perform a detailed empirical analysis on realistic transport networks from cities such as Montpellier, Rome and Dublin. The results demonstrate the effectiveness of our algorithmic contributions, and the benefits of contingent plans as compared to standard sequential plans, when the arrival and departure times of buses are characterized by uncertainty.
Survey on Deep Neural Networks in Speech and Vision Systems
Alam, Mahbubul, Samad, Manar D., Vidyaratne, Lasitha, Glandon, Alexander, Iftekharuddin, Khan M.
This survey presents a review of state-of-the-art deep neural network architectures, algorithms, and systems in vision and speech applications. Recent advances in deep artificial neural network algorithms and architectures have spurred rapid innovation and development of intelligent vision and speech systems. With availability of vast amounts of sensor data and cloud computing for processing and training of deep neural networks, and with increased sophistication in mobile and embedded technology, the next-generation intelligent systems are poised to revolutionize personal and commercial computing. This survey begins by providing background and evolution of some of the most successful deep learning models for intelligent vision and speech systems to date. An overview of large-scale industrial research and development efforts is provided to emphasize future trends and prospects of intelligent vision and speech systems. Robust and efficient intelligent systems demand low-latency and high fidelity in resource constrained hardware platforms such as mobile devices, robots, and automobiles. Therefore, this survey also provides a summary of key challenges and recent successes in running deep neural networks on hardware-restricted platforms, i.e. within limited memory, battery life, and processing capabilities. Finally, emerging applications of vision and speech across disciplines such as affective computing, intelligent transportation, and precision medicine are discussed. To our knowledge, this paper provides one of the most comprehensive surveys on the latest developments in intelligent vision and speech applications from the perspectives of both software and hardware systems. Many of these emerging technologies using deep neural networks show tremendous promise to revolutionize research and development for future vision and speech systems.
Iterative Update and Unified Representation for Multi-Agent Reinforcement Learning
Long, Jiancheng, Zhang, Hongming, Yu, Tianyang, Xu, Bo
Multi-agent systems have a wide range of applications in cooperative and competitive tasks. As the number of agents increases, nonstationarity gets more serious in multi-agent reinforcement learning (MARL), which brings great difficulties to the learning process. Besides, current mainstream algorithms configure each agent an independent network,so that the memory usage increases linearly with the number of agents which greatly slows down the interaction with the environment. Inspired by Generative Adversarial Networks (GAN), this paper proposes an iterative update method (IU) to stabilize the nonstationary environment. Further, we add first-person perspective and represent all agents by only one network which can change agents' policies from sequential compute to batch compute. Similar to continual lifelong learning, we realize the iterative update method in this unified representative network (IUUR). In this method, iterative update can greatly alleviate the nonstationarity of the environment, unified representation can speed up the interaction with environment and avoid the linear growth of memory usage. Besides, this method does not bother decentralized execution and distributed deployment. Experiments show that compared with MADDPG, our algorithm achieves state-of-the-art performance and saves wall-clock time by a large margin especially with more agents.
"Conservatives Overfit, Liberals Underfit": The Social-Psychological Control of Affect and Uncertainty
Hoey, Jesse, MacKinnon, Neil J.
The presence of artificial agents in human social networks is growing. From chatbots to robots, human experience in the developed world is moving towards a socio-technical system in which agents can be technological or biological, with increasingly blurred distinctions between. Given that emotion is a key element of human interaction, enabling artificial agents with the ability to reason about affect is a key stepping stone towards a future in which technological agents and humans can work together. This paper presents work on building intelligent computational agents that integrate both emotion and cognition. These agents are grounded in the well-established social-psychological Bayesian Affect Control Theory (BayesAct). The core idea of BayesAct is that humans are motivated in their social interactions by affective alignment: they strive for their social experiences to be coherent at a deep, emotional level with their sense of identity and general world views as constructed through culturally shared symbols. This affective alignment creates cohesive bonds between group members, and is instrumental for collaborations to solidify as relational group commitments. BayesAct agents are motivated in their social interactions by a combination of affective alignment and decision theoretic reasoning, trading the two off as a function of the uncertainty or unpredictability of the situation. This paper provides a high-level view of dual process theories and advances BayesAct as a plausible, computationally tractable model based in social-psychological theory. We introduce a revised BayesAct model that more deeply integrates social-psychological theorising, and we demonstrate a component of the model as being sufficient to account for cognitive biases about fairness, dissonance and conformity. We show how the model can unify different exploration strategies in reinforcement learning.
Conditional LSTM-GAN for Melody Generation from Lyrics
--Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables to learn and discover latent relationship between interesting lyrics and accompanying melody. Unfortunately, the limited availability of paired lyrics-melody dataset with alignment information has hindered the research progress. T o address this problem, we create a large dataset consisting of 12,197 MIDI songs each with paired lyrics and melody alignment through leveraging different music sources where alignment relationship between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, conditional Long Short-T erm Memory - Generative Adversarial Network (LSTM-GAN) for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator both conditioned on lyrics. In particular, lyrics-conditioned melody and alignment relationship between syllables of given lyrics and notes of predicted melody are generated simultaneously. Experimental results have proved the effectiveness of our proposed lyrics-to-melody generative model, where plausible and tuneful sequences can be inferred from lyrics. I NTRODUCTION Music generation is also referred to as music composition with the process of creating or writing an original piece of music, which is one of human creative activities [1]. Without understanding music rules and concepts well, creating pleasing sounds is impossible. To learn these kinds of rules and concepts such as mathematical relationships between notes, timing, and melody, the earliest study of various music computational techniques related to Artificial Intelligence (AI) has emerged for music composition in the middle of 1950s [2]. Markov models as a representative method of machine learning have been applied to algorithmic composition [3]. However, due to the limited availability of paired lyrics-melody dataset with alignment information, research progress of lyrics-conditioned music generation has been obstructed.