This thesis is a proof of concept for the potential of Variational Auto-Encoder (VAE) on representation learning of real-world Knowledge Graphs (KG). Inspired by successful approaches to the generation of molecular graphs, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE). The impact of the modular hyperparameter choices, encoding through graph convolutions, graph matching and latent space prior, is compared. The RGVAE is first evaluated on link prediction. The mean reciprocal rank (MRR) scores on the two datasets FB15K-237 and WN18RR are compared to the embedding-based model DistMult. A variational DistMult and a RGVAE without latent space prior constraint are implemented as control models. The results show that between different settings, the RGVAE with relaxed latent space, scores highest on both datasets, yet does not outperform the DistMult. Further, we investigate the latent space in a twofold experiment: first, linear interpolation between the latent representation of two triples, then the exploration of each latent dimension in a $95\%$ confidence interval. Both interpolations show that the RGVAE learns to reconstruct the adjacency matrix but fails to disentangle. For the last experiment we introduce a new validation method for the FB15K-237 data set. The relation type-constrains of generated triples are filtered and matched with entity types. The observed rate of valid generated triples is insignificantly higher than the random threshold. All generated and valid triples are unseen. A comparison between different latent space priors, using the $\delta$-VAE method, reveals a decoder collapse. Finally we analyze the limiting factors of our approach compared to molecule generation and propose solutions for the decoder collapse and successful representation learning of multi-relational KGs.
We propose a Distributional Approach to address Controlled Text Generation from pre-trained Language Models (LMs). This view permits to define, in a single formal framework, "pointwise" and "distributional" constraints over the target LM -- to our knowledge, this is the first approach with such generality -- while minimizing KL divergence with the initial LM distribution. The optimal target distribution is then uniquely determined as an explicit EBM (Energy-Based Model) representation. From that optimal representation we then train the target controlled autoregressive LM through an adaptive distributional variant of Policy Gradient. We conduct a first set of experiments over pointwise constraints showing the advantages of our approach over a set of baselines, in terms of obtaining a controlled LM balancing constraint satisfaction with divergence from the initial LM (GPT-2). We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models. Through an ablation study we show the effectiveness of our adaptive technique for obtaining faster convergence.
What if I told a story here, how would that story start?" Thus, the summarization prompt: "My second grader asked me what this passage means: …" When a given prompt isn't working and GPT-3 keeps pivoting into other modes of completion, that may mean that one hasn't constrained it enough by imitating a correct output, and one needs to go further; writing the first few words or sentence of the target output may be necessary.
The scope of a lucrative career promoted by Google through its video distribution platform YouTube has attracted a large number of users to become content creators. An important aspect of this line of work is the feedback received in the form of comments which show how well the content is being received by the audience. However, volume of comments coupled with spam and limited tools for comment classification makes it virtually impossible for a creator to go through each and every comment and gather constructive feedback. Automatic classification of comments is a challenge even for established classification models, since comments are often of variable lengths riddled with slang, symbols and abbreviations. This is a greater challenge where comments are multilingual as the messages are often rife with the respective vernacular. In this work, we have evaluated top-performing classification models for classifying comments which are a mix of different combinations of English and Malayalam (only English, only Malayalam and Mix of English and Malayalam). The statistical analysis of results indicates that Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest and Decision Trees offer similar level of accuracy in comment classification. Further, we have also evaluated 3 multilingual transformer based language models (BERT, DISTILBERT and XLM) and compared their performance to the traditional machine learning classification techniques. XLM was the top-performing BERT model with an accuracy of 67.31. Random Forest with Term Frequency Vectorizer was the best performing model out of all the traditional classification models with an accuracy of 63.59.
We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4%, and a further 2.2% after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.
Alphabet is using its dominance in the search and advertising spaces -- and its massive size -- to find its next billion-dollar business. From healthcare to smart cities to banking, here are 10 industries the tech giant is targeting. With growing threats from its big tech peers Microsoft, Apple, and Amazon, Alphabet's drive to disrupt has become more urgent than ever before. The conglomerate is leveraging the power of its first moats -- search and advertising -- and its massive scale to find its next billion-dollar businesses. To protect its current profits and grow more broadly, Alphabet is edging its way into industries adjacent to the ones where it has already found success and entering new spaces entirely to find opportunities for disruption. Evidence of Alphabet's efforts is showing up in several major industries. For example, the company is using artificial intelligence to understand the causes of diseases like diabetes and cancer and how to treat them. Those learnings feed into community health projects that serve the public, and also help Alphabet's effort to build smart cities. Elsewhere, Alphabet is using its scale to build a better virtual assistant and own the consumer electronics software layer. It's also leveraging that scale to build a new kind of Google Pay-operated checking account. In this report, we examine how Alphabet and its subsidiaries are currently working to disrupt 10 major industries -- from electronics to healthcare to transportation to banking -- and what else might be on the horizon. Within the world of consumer electronics, Alphabet has already found dominance with one product: Android. Mobile operating system market share globally is controlled by the Linux-based OS that Google acquired in 2005 to fend off Microsoft and Windows Mobile. Today, however, Alphabet's consumer electronics strategy is being driven by its work in artificial intelligence. Google is building some of its own hardware under the Made by Google line -- including the Pixel smartphone, the Chromebook, and the Google Home -- but the company is doing more important work on hardware-agnostic software products like Google Assistant (which is even available on iOS).
Artificial neural networks have exceeded human-level performance in accomplishing several individual tasks (e.g. voice recognition, object recognition, and video games). However, such success remains modest compared to human intelligence that can learn and perform an unlimited number of tasks. Humans' ability of learning and accumulating knowledge over their lifetime is an essential aspect of their intelligence. Continual machine learning aims at a higher level of machine intelligence through providing the artificial agents with the ability to learn online from a non-stationary and never-ending stream of data. A key component of such a never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The work described in this thesis has been dedicated to the investigation of continual learning and solutions to mitigate the forgetting phenomena in neural networks. To approach the continual learning problem, we first assume a task incremental setting where tasks are received one at a time and data from previous tasks are not stored. Since the task incremental setting can't be assumed in all continual learning scenarios, we also study the more general online continual setting. We consider an infinite stream of data drawn from a non-stationary distribution with a supervisory or self-supervisory training signal. The proposed methods in this thesis have tackled important aspects of continual learning. They were evaluated on different benchmarks and over various learning sequences. Advances in the state of the art of continual learning have been shown and challenges for bringing continual learning into application were critically identified.
We propose a novel neural topic model in the Wasserstein autoencoders (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We discover that MMD performs much better than the Generative Adversarial Network (GAN) in matching high dimensional Dirichlet distribution. We further discover that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, we offer a more wholistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more and more challenging, and thus requiring more interaction episodes for the meta-learner, it needs the agent to reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents are executed in the same environment and each of these "rollout" agents are given the means to communicate with each other. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned or assign itself a particular section of the current task's state space. This framework is in contrast to standard RL methods that assume that each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods in a variety of challenging tasks.