South America
On Understanding Knowledge Graph Representation
Allen, Carl, Balazevic, Ivana, Hospedales, Timothy M.
Many methods have been developed to represent knowledge graph data, which implicitly exploit low-rank latent structure in the data to encode known information and enable unknown facts to be inferred. To predict whether a relationship holds between entities, their embeddings are typically compared in the latent space following a relation-specific mapping. Whilst link prediction has steadily improved, the latent structure, and hence why such models capture semantic information, remains unexplained. We build on recent theoretical interpretation of word embeddings as a basis to consider an explicit structure for representations of relations between entities. For identifiable relation types, we are able to predict properties and justify the relative performance of leading knowledge graph representation methods, including their often overlooked ability to make independent predictions.
Mining Human Mobility Data to Discover Locations and Habits
Andrade, Thiago, Cancela, Brais, Gama, João
Many aspects of life are associated with places of human mobility patterns and nowadays we are facing an increase in the pervasiveness of mobile devices these individuals carry. Positioning technologies that serve these devices such as the cellular antenna (GSM networks), global navigation satellite systems (GPS), and more recently the WiFi positioning system (WPS) provide large amounts of spatio-temporal data in a continuous way. Therefore, detecting significant places and the frequency of movements between them is fundamental to understand human behavior. In this paper, we propose a method for discovering user habits without any a priori or external knowledge by introducing a density-based clustering for spatio-temporal data to identify meaningful places and by applying a Gaussian Mixture Model (GMM) over the set of meaningful places to identify the representations of individual habits. To evaluate the proposed method we use two real-world datasets. One dataset contains high-density GPS data and the other one contains GSM mobile phone data in a coarse representation. The results show that the proposed method is suitable for this task as many unique habits were identified. This can be used for understanding users' behavior and to draw their characterizing profiles having a panorama of the mobility patterns from the data.
Meta-Learning for Few-Shot Time Series Classification
Narwariya, Jyoti, Malhotra, Pankaj, Vig, Lovekesh, Shroff, Gautam, Tv, Vishnu
Deep neural networks (DNNs) have achieved state-of-the-art results on time series classification (TSC) tasks. In this work, we focus on leveraging DNNs in the often-encountered practical scenario where access to labeled training data is difficult, and where DNNs would be prone to overfitting. We leverage recent advancements in gradient-based meta-learning, and propose an approach to train a residual neural network with convolutional layers as a meta-learning agent for few-shot TSC. The network is trained on a diverse set of few-shot tasks sampled from various domains (e.g. healthcare, activity recognition, etc.) such that it can solve a target task from another domain using only a small number of training samples from the target task. Most existing meta-learning approaches are limited in practice as they assume a fixed number of target classes across tasks. We overcome this limitation in order to train a common agent across domains with each domain having different number of target classes, we utilize a triplet-loss based learning procedure that does not require any constraints to be enforced on the number of classes for the few-shot TSC tasks. To the best of our knowledge, we are the first to use meta-learning based pre-training for TSC. Our approach sets a new benchmark for few-shot TSC, outperforming several strong baselines on few-shot tasks sampled from 41 datasets in UCR TSC Archive. We observe that pre-training under the meta-learning paradigm allows the network to quickly adapt to new unseen tasks with small number of labeled instances.
Environment Sound Classification using Multiple Feature Channels and Deep Convolutional Neural Networks
Sharma, Jivitesh, Granmo, Ole-Christoffer, Goodwin, Morten
--In this paper, we propose a model for the Environment Sound Classification T ask (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN). The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and Chromagram. Such multiple features have never been used before for signal or audio processing. Also, we employ a deeper CNN (DCNN) compared to previous models, consisting of 2D separable convolutions working on time and feature domain separately. The model also consists of max pooling layers that downsample time and feature domain separately. We use some data augmentation techniques to further boost performance. Our model is able to achieve state-of- the-art performance on all three benchmark environment sound classification datasets, i.e. the UrbanSound8K (97.35%), T o the best of our knowledge, this is the first time that a single environment sound classification model is able to achieve state-of-the-art results on all three datasets. For ESC-10 and ESC-50 datasets, the accuracy achieved by the proposed model is beyond human accuracy of 95.7% and 81.3% respectively. I NTRODUCTION T HERE are many important applications related to speech and audio processing. One of the most important application is the Environment Sound Classification (ESC) that deals with distinguishing between sounds from the real environment. It is a complex task that involves classifying a sound event into an appropriate class such as siren, dog barking, airplane, people talking etc. This task is quite different compared to Automatic Speech Recognition (ASR) [1], since environment sound features differ drastically from speech sounds. In ASR, speech is converted to text. However, in ESC, there is no such thing as speech, just sounds. So, ESC models are quite different compared to ASR models.
EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training
Cuui, Yuqi, Xu, Yifan, Wu, Dongrui
Drowsy driving is pervasive, and also a major cause of traffic accidents. Estimating a driver's drowsiness level by monitoring the electroencephalogram (EEG) signal and taking preventative actions accordingly may improve driving safety. However, individual differences among different drivers make this task very challenging. A calibration session is usually required to collect some subject-specific data and tune the model parameters before applying it to a new subject, which is very inconvenient and not user-friendly. Many approaches have been proposed to reduce the calibration effort, but few can completely eliminate it. This paper proposes a novel approach, feature weighted episodic training (FWET), to completely eliminate the calibration requirement. It integrates two techniques: feature weighting to learn the importance of different features, and episodic training for domain generalization. Experiments on EEG-based driver drowsiness estimation demonstrated that both feature weighting and episodic training are effective, and their integration can further improve the generalization performance. FWET does not need any labelled or unlabelled calibration data from the new subject, and hence could be very useful in plug-and-play brain-computer interfaces.
Four education startups that keep you learning into adulthood
Today, education doesn't stop after students graduate; many continue learning throughout their life. And in addition to adult learning courses at colleges and universities, a lot of courses are now offered online – ranging from MOOCs ("massive open online course") to many apps. At the Global Education & Skills Forum in March, two of the 10 finalists were lifelong learning startups. Here are four of the most promising EdTech startups around. Don't have time to read a book a day?
Exascale Deep Learning for Scientific Inverse Problems
Laanait, Nouamane, Romero, Joshua, Yin, Junqi, Young, M. Todd, Treichler, Sean, Starchenko, Vitalii, Borisevich, Albina, Sergeev, Alex, Matheson, Michael
We introduce novel communication strategies in synchronous distributed Deep Learning consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors. Networks (DNN) models and data sets (Dai et al., 2019), the need for efficient distributed machine learning strategies on massively parallel systems is more significant than On small to moderate-scale systems, with 10's - 100's of GPU/TPU accelerators, these scaling inefficiencies can be difficult to detect and systematically optimize due to system noise and load variability. The scaling inefficiencies of data-parallel implementations are most readily apparent on large-scale systems such as supercomputers with 1,000's-10,000's of accelerators. Extending data-parallelism to the massive scale of super-computing systems is also motivated by the latter's traditional workload consisting of scientific numerical simulations (Kent & Kotliar, 2018). NVLink interconnect, supporting a (peak) bidirectional bandwidth of 100 GB/s, where each 3 V100 GPUs are grouped in a ring topology with all-to-all connections to a POWER9 CPU.
Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Doan, Thang, Mazoure, Bogdan, Durand, Audrey, Pineau, Joelle, Hjelm, R Devon
Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).
Better Language Models and Their Implications
We've trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization--all without task-specific training. Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper. GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data. GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data.
Healthcare Artificial Intelligence Market Opportunity Analysis, Vendor Landscape, Growth, Developments & Forecast 2019-2025, DEEP GENOMICS, Next IT Corp., General Vision, Google, NVIDIA Corporation, IBM Watson Health – Market Expert24
As the application of artificial intelligence (AI) in the field of drug development increases, market growth is greatly favored. Artificial intelligence (AI) is called engineering and science adopted to design intelligent machines, such as intelligent computer programs. A system that applies multiple human intelligence-based functions, such as learning, reasoning, and problem-solving skills in areas such as computer science, biology, linguistics, mathematics, and engineering. Artificial intelligence is regarded as the next boundary of medical innovation. Healthcare's AI is implemented to align structured and unstructured data.