Deep Learning
VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem
Clark, Ronald (University of Oxford) | Wang, Sen (University of Oxford) | Wen, Hongkai (University of Oxford) | Markham, Andrew (University of Oxford) | Trigoni, Niki (University of Oxford)
In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is, to the best of our knowledge, the first end-to-end trainable method for visual-inertial odometry which performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU, as well as the need for manual calibration between the IMU and camera. A further advantage is that our model naturally and elegantly incorporates domain-specific information which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available and can be trained to outperform them in the presence of calibration and synchronization errors.
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Wang, Jinzhuo (Peking University) | Wang, Wenmin (Peking University) | Wang, Ronggang (Peking University) | Gao, Wen (Peking University)
Monte Carlo tree search (MCTS) is extremely popular in computer Go which determines each action by enormous simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search of millions of future interactions. In this paper, we propose a computer Go system that follows experts' way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates of next move. Compared with existing deep convolutional neural network (DCNN), DANN inserts recurrent layer after each convolutional layer and stacks them in an alternative manner. We show such setting can preserve more contexts of local features and its evolutions which are beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from move predictor. This is consistent with human experts' nature of playing since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253,233 professional records. Experiments on GoGoD and PGD datasets show the DANN can substantially improve performance of move prediction over pure DCNN. When combining LTE, our system outperforms most relevant approaches and open engines based on MCTS.
The Unusual Suspects: Deep Learning Based Mining of Interesting Entity Trivia from Knowledge Graphs
Fatma, Nausheen (International Institute of Information Technology, Hyderabad) | Chinnakotla, Manoj K. (Microsoft, India) | Shrivastava, Manish (International Institute of Information Technology, Hyderabad)
Trivia is any fact about an entity which is interesting due to its unusualness, uniqueness or unexpectedness. Trivia could be successfully employed to promote user engagement in various product experiences featuring the given entity. A Knowledge Graph (KG) is a semantic network which encodes various facts about entities and their relationships. In this paper, we propose a novel approach called DBpedia Trivia Miner (DTM) to automatically mine trivia for entities of a given domain in KGs. The essence of DTM lies in learning an Interestingness Model (IM), for a given domain, from human annotated training data provided in the form of interesting facts from the KG. The IM thus learnt is applied to extract trivia for other entities of the same domain in the KG. We propose two different approaches for learning the IM - a) A Convolutional Neural Network (CNN) based approach and b) Fusion Based CNN (F-CNN) approach which combines both hand-crafted and CNN features. Experiments across two different domains - Bollywood Actors and Music Artists reveal that CNN automatically learns features which are relevant to the task and shows competitive performance relative to hand-crafted feature based baselines whereas F-CNN significantly improves the performance over the baseline approaches which use hand-crafted features alone. Overall, DTM achieves an F1 score of 0.81 and 0.65 in Bollywood Actors and Music Artists domains respectively.
Arnold: An Autonomous Agent to Play FPS Games
Chaplot, Devendra Singh (Carnegie Mellon University) | Lample, Guillaume (Carnegie Mellon University)
Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present Arnold, a completely autonomous agent that plays first-person shooter games using only screen pixel data, and demonstrate its effectiveness on Doom, a classical first-person shooter game. Arnold is trained with deep reinforcement learning using a recent Action-Navigation architecture, which uses separate deep neural networks for exploring the map and fighting enemies. Furthermore, it uses several techniques, such as augmenting high-level game features, reward shaping and sequential updates, for efficient training and effective performance. Arnold outperforms average humans as well as in-built game bots on different variations of the deathmatch. It also obtained the highest kill-to-death ratio in both tracks of the Visual Doom AI Competition and placed second in terms of the number of frags.
Improving Deep Reinforcement Learning with Knowledge Transfer
Glatt, Ruben (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Recent successes in applying Deep Learning techniques on Reinforcement Learning algorithms have led to a wave of breakthrough developments in agent theory and established the field of Deep Reinforcement Learning (DRL). While DRL has shown great results for single task learning, the multi-task case is still underrepresented in the available literature. This D.Sc. research proposal aims at extending DRL to the multi-task case by leveraging the power of Transfer Learning algorithms to improve the training time and results for multi-task learning. Our focus lies on defining a novel framework for scalable DRL agents that detects similarities between tasks and balances various TL techniques, like parameter initialization, policy or skill transfer.
User Modeling Using LSTM Networks
Żołna, Konrad (Jagiellonian University) | Romański, Bartłomiej (RTB House)
The LSTM model presented is capable of describing a user of a particular website without human expert supervision. In other words, the model is able to automatically craft features which depict the attitude, intention and overall state of a user. This effect is achieved by projecting the complex history of the user (sequence data corresponding to the user's actions on the website) into fixed-size vectors of real numbers. The representation obtained may be used to enrich typical models used in e-commerce: click-through rate, conversion rate, recommender systems etc. The goal of this paper is to demonstrate a way of creating this projection, which we call user2vec, and to present possible benefits of incorporating this solution to enhance a conversion rate model. The enriched model's superiority is due not only to its increased internal complexity but also to its capability of learning from wider data: it indirectly analyzes actions of all website users, rather than being limited to the users who clicked on an ad.
Natural Language Person Retrieval
Zhou, Tao (University of California, Los Angeles) | Yu, Jie (SAIC Innovation Center)
Following the recent progress in image classification and image captioning using deep learning, we developed a novel person retrieval system using natural language, which to our knowledge is the first of its kind. Our system employs a state-of-the-art deep learning based natural language object retrieval framework to detect and retrieve people in images. Quantitative experimental results show significant improvement over state-of-the-art methods for generic object retrieval. This line of research provides great advantages for searching large amounts of video surveillance footage and it can also be utilized in other domains, such as human-robot interaction.
Attention Based LSTM for Target Dependent Sentiment Classification
Yang, Min (The University of Hong Kong) | Tu, Wenting (The University of Hong Kong) | Wang, Jingxuan (The University of Hong Kong) | Xu, Fei (Chinese Academy of Sciences) | Chen, Xiaojun (Shenzhen University)
We present an attention-based bidirectional LSTM approach to improve the target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.
Predicting User Roles from Computer Logs Using Recurrent Neural Networks
Tuor, Aaron (Western Washington University) | Kaplan, Samuel (Western Washington University) | Hutchinson, Brian (Western Washington University) | Nichols, Nicole (Pacific Northwest National Laboratory) | Robinson, Sean (Pacific Northwest National Laboratory)
Network and other computer administrators typically have access to a rich set of logs tracking actions by users. However, they often lack metadata such as user role, age, and gender that can provide valuable context for users' actions. Inferring user attributes automatically has wide ranging implications; among others, for customization (anticipating user needs and priorities), for managing resources (anticipating demand) and for security (interpreting anomalous behavior).
A Systematic Practice of Judging the Success of a Robotic Grasp Using Convolutional Neural Network
Liu, Hengshuang (Central China Normal University) | Ai, Pengcheng (Central China Normal University) | Chen, Junling (Central China Normal University)
In this abstract, we present a novel method using the deep convolutional neural network combined with traditional mechanical control techniques to solve the problem of determining whether a robotic grasp is successful or not. To finish the task, we construct a data acquisition platform capable of robot arm grasping and photo capturing, and collect a diversity of pictures by adjusting the shape and posture of the objects and controlling the robot arm to move randomly. For the purpose of validating the generalization capability, we adopt a stochastic sampling method based on cross validation to test our model. The experiment shows that, with an increasing number of shapes of objects involved in training, the network can identify new samples in a more accurate and steadier way. The accuracy rises from 89.2% when we use only one category of shape for training to above 99.7% when we use 17 categories for training.