Goto

Collaborating Authors

Results


Knowledge Generation -- Variational Bayes on Knowledge Graphs

arXiv.org Artificial Intelligence

This thesis is a proof of concept for the potential of Variational Auto-Encoder (VAE) on representation learning of real-world Knowledge Graphs (KG). Inspired by successful approaches to the generation of molecular graphs, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE). The impact of the modular hyperparameter choices, encoding through graph convolutions, graph matching and latent space prior, is compared. The RGVAE is first evaluated on link prediction. The mean reciprocal rank (MRR) scores on the two datasets FB15K-237 and WN18RR are compared to the embedding-based model DistMult. A variational DistMult and a RGVAE without latent space prior constraint are implemented as control models. The results show that between different settings, the RGVAE with relaxed latent space, scores highest on both datasets, yet does not outperform the DistMult. Further, we investigate the latent space in a twofold experiment: first, linear interpolation between the latent representation of two triples, then the exploration of each latent dimension in a $95\%$ confidence interval. Both interpolations show that the RGVAE learns to reconstruct the adjacency matrix but fails to disentangle. For the last experiment we introduce a new validation method for the FB15K-237 data set. The relation type-constrains of generated triples are filtered and matched with entity types. The observed rate of valid generated triples is insignificantly higher than the random threshold. All generated and valid triples are unseen. A comparison between different latent space priors, using the $\delta$-VAE method, reveals a decoder collapse. Finally we analyze the limiting factors of our approach compared to molecule generation and propose solutions for the decoder collapse and successful representation learning of multi-relational KGs.


Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

arXiv.org Artificial Intelligence

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents -- or short passages -- in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms -- such as a person's name or a product model number -- not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections -- such as the document index of a commercial Web search engine -- containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.


A Distributional Approach to Controlled Text Generation

arXiv.org Artificial Intelligence

We propose a Distributional Approach to address Controlled Text Generation from pre-trained Language Models (LMs). This view permits to define, in a single formal framework, "pointwise" and "distributional" constraints over the target LM -- to our knowledge, this is the first approach with such generality -- while minimizing KL divergence with the initial LM distribution. The optimal target distribution is then uniquely determined as an explicit EBM (Energy-Based Model) representation. From that optimal representation we then train the target controlled autoregressive LM through an adaptive distributional variant of Policy Gradient. We conduct a first set of experiments over pointwise constraints showing the advantages of our approach over a set of baselines, in terms of obtaining a controlled LM balancing constraint satisfaction with divergence from the initial LM (GPT-2). We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models. Through an ablation study we show the effectiveness of our adaptive technique for obtaining faster convergence.


Language Models are Open Knowledge Graphs

arXiv.org Artificial Intelligence

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3), without human supervision. Popular KGs (e.g, Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled the language models to improve downstream NLP tasks, e.g., answering questions, and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing to two KGs (Wikidata, TAC KBP) created by humans. Our KGs also provide open factual knowledge that is new in the existing KGs. Our code and KGs will be made publicly available.


Sensitive Information Detection: Recursive Neural Networks for Encoding Context

arXiv.org Machine Learning

The amount of data for processing and categorization grows at an ever increasing rate. At the same time the demand for collaboration and transparency in organizations, government and businesses, drives the release of data from internal repositories to the public or 3rd party domain. This in turn increase the potential of sharing sensitive information. The leak of sensitive information can potentially be very costly, both financially for organizations, but also for individuals. In this work we address the important problem of sensitive information detection. Specially we focus on detection in unstructured text documents. We show that simplistic, brittle rule sets for detecting sensitive information only find a small fraction of the actual sensitive information. Furthermore we show that previous state-of-the-art approaches have been implicitly tailored to such simplistic scenarios and thus fail to detect actual sensitive content. We develop a novel family of sensitive information detection approaches which only assumes access to labeled examples, rather than unrealistic assumptions such as access to a set of generating rules or descriptive topical seed words. Our approaches are inspired by the current state-of-the-art for paraphrase detection and we adapt deep learning approaches over recursive neural networks to the problem of sensitive information detection. We show that our context-based approaches significantly outperforms the family of previous state-of-the-art approaches for sensitive information detection, so-called keyword-based approaches, on real-world data and with human labeled examples of sensitive and non-sensitive documents.


GPT-3 Creative Fiction

#artificialintelligence

What if I told a story here, how would that story start?" Thus, the summarization prompt: "My second grader asked me what this passage means: …" When a given prompt isn't working and GPT-3 keeps pivoting into other modes of completion, that may mean that one hasn't constrained it enough by imitating a correct output, and one needs to go further; writing the first few words or sentence of the target output may be necessary.


AI Taking A Knee: Action To Improve Equal Treatment Under The Law

#artificialintelligence

In the wake of the George Floyd tragedy and so many other appalling cases like it, there is a growing question if a solution lies with robot police powered by artificial intelligence (AI.) In theory, AI cops could reduce biased and discriminatory practices and improve access to justice. Pop culture is filled with heroes like this such as Robocop and CHAPPiE. However, reality maybe a little stranger than fiction in this case as there are already some robots already in action for law enforcement. Let's start with Robo-Guard, which works in the South Korean prison system.


Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability Using Clustering Techniques on Replicated Runs

arXiv.org Artificial Intelligence

For organizing large text corpora topic modeling provides useful tools. A widely used method is Latent Dirichlet Allocation (LDA), a generative probabilistic model which models single texts in a collection of texts as mixtures of latent topics. The assignments of words to topics rely on initial values such that generally the outcome of LDA is not fully reproducible. In addition, the reassignment via Gibbs Sampling is based on conditional distributions, leading to different results in replicated runs on the same text data. This fact is often neglected in everyday practice. We aim to improve the reliability of LDA results. Therefore, we study the stability of LDA by comparing assignments from replicated runs. We propose to quantify the similarity of two generated topics by a modified Jaccard coefficient. Using such similarities, topics can be clustered. A new pruning algorithm for hierarchical clustering results based on the idea that two LDA runs create pairs of similar topics is proposed. This approach leads to the new measure S-CLOP ({\bf S}imilarity of multiple sets by {\bf C}lustering with {\bf LO}cal {\bf P}runing) for quantifying the stability of LDA models. We discuss some characteristics of this measure and illustrate it with an application to real data consisting of newspaper articles from \textit{USA Today}. Our results show that the measure S-CLOP is useful for assessing the stability of LDA models or any other topic modeling procedure that characterize its topics by word distributions. Based on the newly proposed measure for LDA stability, we propose a method to increase the reliability and hence to improve the reproducibility of empirical findings based on topic modeling. This increase in reliability is obtained by running the LDA several times and taking as prototype the most representative run, that is the LDA run with highest average similarity to all other runs.


2019 in Review: 10 AI Failures

#artificialintelligence

This is the third Synced year-end compilation of "Artificial Intelligence Failures." Despite AI's rapid growth and remarkable achievements, a review of AI failures remains necessary and meaningful. Our aim is not to downplay or mock research and development results, but rather to take a look at what went wrong with the hope we can do better next time. A leading facial-recognition system identified three-time Super Bowl champion Duron Harmon of the New England Patriots, Boston Bruins forward Brad Marchand, and 25 other New England professional athletes as criminals. Amazon's Rekognition software incorrectly matched the athletes to a database of mugshots in a test organized by the Massachusetts chapter of the American Civil Liberties Union (ACLU).


2019 in Review: 10 AI Failures

#artificialintelligence

This is the third Synced year-end compilation of "Artificial Intelligence Failures." Despite AI's rapid growth and remarkable achievements, a review of AI failures remains necessary and meaningful. Our aim is not to downplay or mock research and development results, but rather to take a look at what went wrong with the hope we can do better next time. A leading facial-recognition system identified three-time Super Bowl champion Duron Harmon of the New England Patriots, Boston Bruins forward Brad Marchand, and 25 other New England professional athletes as criminals. Amazon's Rekognition software incorrectly matched the athletes to a database of mugshots in a test organized by the Massachusetts chapter of the American Civil Liberties Union (ACLU).