 South America


Word Embeddings: A Survey

arXiv.org Machine Learning

This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in addition to encoding surprisingly good syntactic and semantic information, have been proven useful as extra features in many downstream NLP tasks.


Drones Drop Poison Bombs to Fight One Island's Rat Invasion

WIRED

I get the feeling you don't dislike rats enough. Because your struggles with the rodents chewing through your house pale in comparison to the problems wrought by rodents chewing through entire island ecosystems. Release just one pregnant rat on an island and soon enough the invasive predators will have decimated that pristine environment like an atom bomb. Sure, rats on their own are pretty neat, but we've got a nasty habit of transporting them where they don't belong, at which point they transform into menaces. Such is the plight of the Galapagos Island of Seymour Norte, a speck of 455 acres off the coast of Ecuador. In 2007, conservationists succeeded in ridding the island of invasive rats, but a decade later, the fiends had returned, likely by swimming from the neighboring island of Baltra.


Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

arXiv.org Machine Learning

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information. Neural-linear bandits leverage the representation power of deep neural networks and combine it with efficient exploration mechanisms, designed for linear contextual bandits, on top of the last hidden layer. Since the representation is being optimized during learning, information regarding exploration with "old" features is lost. Here, we propose the first limited memory neural-linear bandit that is resilient to this phenomenon, which we term catastrophic forgetting. We evaluate our method on a variety of real-world data sets, including regression, classification, and sentiment analysis, and observe that our algorithm is resilient to catastrophic forgetting and achieves superior performance.


Graph heat mixture model learning

arXiv.org Machine Learning

Graph inference methods have recently attracted great interest from the scientific community because of the value they bring to data interpretation and analysis. However, most of the available state-of-the-art methods focus on scenarios where all available data can be explained through the same graph, or where the groups corresponding to each graph are known a priori. In this paper, we argue that this is not always realistic and we introduce a generative model for mixed signals following a heat diffusion process on multiple graphs. We propose an expectation-maximisation algorithm that can successfully separate signals into corresponding groups, and infer multiple graphs that govern their behaviour. We demonstrate the benefits of our method on both synthetic and real data.
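As a rough illustration of what a heat diffusion process on a graph looks like, the sketch below diffuses a signal over a single small graph by taking explicit Euler steps of dx/dt = -Lx, where L is the combinatorial graph Laplacian. This is only a minimal sketch under stated assumptions: the function names, the 3-node path graph, and the Euler approximation of exp(-tau L) are illustrative choices, not the paper's generative model or its expectation-maximisation algorithm.

```python
def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph (hypothetical helper)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def heat_diffuse(adj, signal, tau, steps=1000):
    """Approximate exp(-tau * L) @ signal with explicit Euler steps of dx/dt = -L x.

    Assumption: steps is large enough that (tau / steps) times the largest
    Laplacian eigenvalue is well below 2, so the iteration stays stable.
    """
    L = laplacian(adj)
    n = len(signal)
    dt = tau / steps
    x = [float(v) for v in signal]
    for _ in range(steps):
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - dt * Lx[i] for i in range(n)]
    return x


# Path graph 0 - 1 - 2 with all the initial "heat" on node 0.
path_adj = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
short_run = heat_diffuse(path_adj, [1.0, 0.0, 0.0], tau=0.5)
long_run = heat_diffuse(path_adj, [1.0, 0.0, 0.0], tau=50.0, steps=5000)
```

After a short diffusion time the signal is still concentrated near node 0; after a long one it flattens towards the average value on this connected graph, while the total amount of heat stays the same.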


General Supervision via Probabilistic Transformations

arXiv.org Machine Learning

Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training data. This paper presents a unifying framework for supervised classification with general ensembles of training data, and proposes the learning methodology of generalized robust risk minimization (GRRM). The paper shows how current and novel supervision schemes can be addressed under the proposed framework by representing the relationship between examples at test and training via probabilistic transformations. The results show that GRRM can handle different types of training data in a unified manner, and enable new supervision schemes that aggregate general ensembles of training data.


Location reference identification from tweets during emergencies: A deep learning approach

arXiv.org Machine Learning

Twitter has recently been used during crises to communicate with officials and to support rescue and relief operations in real time. The geographical location of the event, as well as of the users, is vitally important in such scenarios. Identifying the geographic location is challenging because the location information fields, such as the user location and the place name of tweets, are not reliable. Extracting location information from tweet text is difficult because the text contains a lot of nonstandard English, grammatical errors, spelling mistakes, nonstandard abbreviations, and so on. This research aims to extract location words used in the tweet using a Convolutional Neural Network (CNN) based model. We achieved an exact matching score of 0.929, a Hamming loss of 0.002, and F [...]. Our model was able to extract even three- to four-word long location references, which is also evident from the exact matching score of over 92%. The findings of this paper can help in early event localization, emergency situations, real-time road traffic management, localized advertisement, and in various location-based services.

Keywords: Location references, Tweets, Geo-locations, Named entity recognition, Gazetteer, Convolutional Neural Network

Preprint submitted to Elsevier January 25, 2019

1. Introduction

Tweets are very responsive to real-world events, and are sometimes even more immediate than traditional news channels. Therefore, it is possible to keep track of the latest information by following tweets. In several cases the news was first reported on Twitter, such as an airplane crash over the Hudson River in New York in 2009 (Sakaki et al., 2013) and the death of former British Prime Minister Margaret Thatcher in April 2013 (Sakaki et al., 2013; Singh et al., 2017; Yuan & Liu, 2018). In an American Red Cross survey, individuals were asked whom they contacted in an emergency.
The estimation and detection of location information of events and users from tweets are a major concern in relation to the above-mentioned tasks. Twitter provides three location information fields for sharing a user's location: (1) User location; (2) Place name; and (3) Geo-coordinate. The user location field has 140 character spaces (previously it was limited to 30 characters) in which the user can write his/her home location information while creating their profile. This field is optional to the user and the user can write any arbitrary words or leave it blank.
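The exact matching score and Hamming loss quoted in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each tweet is represented as a list of per-word binary labels (1 for a location word, 0 otherwise); the function names and the toy data are hypothetical and are not taken from the paper.

```python
def exact_match_ratio(y_true, y_pred):
    """Fraction of tweets whose whole predicted label sequence equals the reference."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual word labels that are wrong, pooled over all tweets."""
    wrong = 0
    total = 0
    for t, p in zip(y_true, y_pred):
        wrong += sum(1 for a, b in zip(t, p) if a != b)
        total += len(t)
    return wrong / total


# Toy data (assumption): three tweets of four words each.
reference = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]]
predicted = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]

em = exact_match_ratio(reference, predicted)  # 2 of 3 tweets match exactly
hl = hamming_loss(reference, predicted)       # 1 of 12 word labels is wrong
```

The two metrics answer different questions: exact match penalises a tweet fully for a single wrong word, whereas Hamming loss counts each word label separately, which is why a model can have a very low Hamming loss and still miss some exact matches.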


Deep Learning for Anomaly Detection: A Survey

arXiv.org Machine Learning

Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is twofold. First, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Second, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art deep anomaly detection research techniques into different categories based on the underlying assumptions and approach adopted. Within each category, we outline the basic anomaly detection technique, along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting deep anomaly detection techniques for real-world problems.


ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU

arXiv.org Machine Learning

To improve the performance of Intensive Care Units (ICUs), the field of bio-statistics has developed scores which try to predict the likelihood of negative outcomes. These help evaluate the effectiveness of treatments and clinical practice, and also help to identify patients with unexpected outcomes. However, they have been shown by several studies to offer sub-optimal performance. Alternatively, Deep Learning offers state of the art capabilities in certain prediction tasks and research suggests deep neural networks are able to outperform traditional techniques. Nevertheless, a main impediment for the adoption of Deep Learning in healthcare is its reduced interpretability, for in this field it is crucial to gain insight on the why of predictions, to assure that models are actually learning relevant features instead of spurious correlations. To address this, we propose a deep multi-scale convolutional architecture trained on the Medical Information Mart for Intensive Care III (MIMIC-III) for mortality prediction, and the use of concepts from coalitional game theory to construct visual explanations aimed to show how important these inputs are deemed by the network. Our results show our model attains state of the art performance while remaining interpretable. Supporting code can be found at https://github.com/williamcaicedo/ISeeU.


CTCModel: a Keras Model for Connectionist Temporal Classification

arXiv.org Machine Learning

We report an extension of a Keras Model, called CTCModel, to perform the Connectionist Temporal Classification (CTC) in a transparent way. Combined with Recurrent Neural Networks, the Connectionist Temporal Classification is the reference method for dealing with unsegmented input sequences, i.e., data that consist of pairs of observation and label sequences where each label is related to a subset of observation frames. CTCModel makes use of the CTC implementation in the Tensorflow backend for training and predicting sequences of labels using Keras. It consists of three branches made of Keras models: one for training, which computes the CTC loss function; one for predicting, which provides sequences of labels; and one for evaluating, which returns standard metrics for analyzing sequences of predictions.
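To make the idea of labels tied to subsets of observation frames concrete, here is a minimal sketch of the standard CTC collapsing rule used in greedy (best-path) decoding: repeated frame labels are merged, then blank symbols are dropped. The function name and the choice of 0 as the blank index are assumptions for illustration; this is not CTCModel's API and does not use Keras or Tensorflow.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label path into a label sequence.

    Consecutive duplicates are merged first, then blanks are removed, so a
    blank between two identical labels keeps them as two separate outputs.
    """
    collapsed = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


# Eight frames; the blank between the two runs of 1 keeps both occurrences.
decoded = ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])
```

Here the eight-frame path collapses to the three-label sequence 1, 1, 2, which shows how many different frame-level paths can map to the same output without any explicit segmentation of the input.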


Stochastic Gradient Trees

arXiv.org Machine Learning

We present an online algorithm that induces decision trees using gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, we do not require soft splits or construction of a new tree for every update. In experiments, our method performs comparably to standard incremental classification trees and outperforms state of the art incremental regression trees. We also show how the method can be used to construct a novel type of neural network layer suited to learning representations from tabular data and find that it increases accuracy of multiclass and multi-label classification.