Large Language Model
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search
Galatolo, Federico A., Cimino, Mario G. C. A., Vaglini, Gigliola
In this research work we present GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Conversely, GLaSS takes a caption (or an image) as input and generates the image (or the caption) whose CLIP embedding is most similar to that of the input. This optimal image (or caption) is produced by a generative network after an exploration of its latent space by a genetic algorithm. Promising results are shown, based on experiments with the image generators BigGAN and StyleGAN2 and the text generator GPT2.
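The search loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generate` and `toy_embed` are hypothetical stand-ins for a real generator (such as BigGAN or StyleGAN2) and for the CLIP encoders, and the genetic operators (elitism, uniform crossover, Gaussian mutation) and all hyperparameters are illustrative choices.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_generate(latent):
    # Hypothetical stand-in for a generative network: maps a latent
    # vector to a "sample" (here just a squashed copy of the latent).
    return [math.tanh(z) for z in latent]


def toy_embed(sample):
    # Hypothetical stand-in for a CLIP encoder (here the identity).
    return list(sample)


def latent_search(target_embedding, dim, pop_size=40, generations=60,
                  sigma=0.3, seed=0):
    """Genetic search for the latent whose generated sample has the
    embedding most similar to target_embedding."""
    rng = random.Random(seed)

    def fitness(latent):
        return cosine(toy_embed(toy_generate(latent)), target_embedding)

    population = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # survivors kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, sigma)
                     for pair in zip(p1, p2)]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Assumed target: the embedding of the input caption (or image).
target = [0.5, -0.2, 0.8, 0.1]
best_latent, score = latent_search(target, dim=len(target))
print("best similarity:", round(score, 3))
```

Because the elite individuals are carried over unchanged, the best similarity never decreases between generations. In a real setting the fitness call would dominate the cost, since each evaluation runs both the generator and the encoder.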
"Is depression related to cannabis?": A knowledge-infused model for Entity and Relation Extraction with Limited Supervision
Roy, Kaushik, Lokala, Usha, Khandelwal, Vedant, Sheth, Amit
With strong marketing advocacy of the benefits of cannabis use for improved mental health, cannabis legalization is a priority among legislators. However, preliminary scientific research does not conclusively associate cannabis with improved mental health. In this study, we explore the relationship between depression and consumption of cannabis in a targeted social media corpus involving personal use of cannabis with the intent to derive its potential mental health benefit. We use tweets that contain an association among three categories annotated by domain experts: Reason, Effect, and Addiction. State-of-the-art Natural Language Processing techniques fall short in extracting these relationships between cannabis phrases and the depression indicators. We seek to address this limitation by using domain knowledge; specifically, the Drug Abuse Ontology for addiction augmented with Diagnostic and Statistical Manual of Mental Disorders lexicons for mental health. Because annotations are scarce, owing to the limited availability of the domain experts' time, we use supervised contrastive learning in conjunction with GPT-3 trained on a vast corpus to achieve improved performance even with limited supervision. Experimental results show that our method extracts cannabis-depression relationships significantly better than the state-of-the-art relation extractor. The learned representations can also supply high-quality annotations through a nearest-neighbor approach, which the scientific community can use to better understand the association between cannabis and depression.
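For readers unfamiliar with supervised contrastive learning, the sketch below computes a commonly used form of the supervised contrastive loss over a small batch. It is an assumption that the study uses this exact formulation; the toy two-dimensional embeddings and the temperature value are made up for illustration, and only the label names (Reason, Effect) come from the abstract.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have at
    least one other example with the same label in the batch."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n)
                     if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # no positive pair for this anchor
        # Temperature-scaled similarities to every other example.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m)
                                     for l in logits.values()))
        total += -sum(logits[p] - log_denom
                      for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = ["Reason", "Reason", "Effect", "Effect"]
# Same-label examples close together: low loss.
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same-label examples far apart: high loss.
scattered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(supcon_loss(clustered, labels) < supcon_loss(scattered, labels))
```

Minimizing this loss pulls examples with the same label together and pushes the others apart, which is what makes a nearest-neighbor lookup in the learned space usable for proposing annotations.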
Semantic Borrowing for Generalized Zero-Shot Learning
Generalized zero-shot learning (GZSL) is one of the most realistic but also one of the most challenging problems, owing to the classifier's partiality toward supervised classes. Instance-borrowing methods and synthesizing methods solve this problem to some extent with the help of testing semantics; as a result, neither can be used under the class-inductive instance-inductive (CIII) training setting, where testing data are not available, and the latter additionally require training a classifier after generating examples. In contrast, this paper proposes a novel method called Semantic Borrowing for improving GZSL methods with compatibility metric learning under CIII. It borrows similar semantics in the training set, so that the classifier can model the relationship between the semantics of zero-shot and supervised classes more accurately during training. In practice, semantic information about unseen or unknown classes is not available for training, and this approach does not need any such information. The experimental results on representative GZSL benchmark datasets show that it can reduce the partiality of the classifier toward supervised classes and improve the performance of generalized zero-shot classification.
Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks
Choi, Hyunjin, Kim, Judong, Joe, Seongho, Min, Seungjai, Gwon, Youngjune
In zero-shot cross-lingual transfer, a model trained for a supervised NLP task on a corpus in one language is directly applicable to another language without any additional training. A source of cross-lingual transfer can be as straightforward as lexical overlap between languages (e.g., use of the same scripts, shared subwords) that naturally forces text embeddings to occupy a similar representation space. Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks as the most important factor for the transfer. In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining. In particular, we use XLM-RoBERTa (XLMR) in experiments that cover semantic textual similarity (STS), machine reading comprehension (MRC) on SQuAD and KorQuAD, sentiment analysis, and alignment of sentence embeddings under various cross-lingual settings. Our results indicate that cross-lingual transfer is most pronounced in STS, followed by sentiment analysis, and least pronounced in MRC. That is, the complexity of a downstream task softens the degree of cross-lingual transfer. All of our results are empirically observed and measured, and we make our code and data publicly available.
These virtual robot arms get smarter by training each other
A virtual robot arm has learned to solve a wide range of different puzzles--stacking blocks, setting the table, arranging chess pieces--without having to be retrained for each task. It did this by playing against a second robot arm that was trained to give it harder and harder challenges. Self play: Developed by researchers at OpenAI, the identical robot arms--Alice and Bob--learn by playing a game against each other in a simulation, without human input. The robots use reinforcement learning, a technique in which AIs learn by trial and error which actions to take in different situations to achieve certain goals. The game involves moving objects around on a virtual tabletop.
The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
Pande, Madhura, Budhraja, Aakriti, Nema, Preksha, Kumar, Pratyush, Khapra, Mitesh M.
Multi-headed attention is a mainstay of transformer-based models. Different methods have been proposed to classify the role of each attention head based on the relations between tokens which have high pair-wise attention. These roles include syntactic (tokens with some syntactic relation), local (nearby tokens), block (tokens in the same sentence) and delimiter (the special [CLS], [SEP] tokens). There are two main challenges with existing methods for classification: (a) there are no standard scores across studies or across functional roles, and (b) these scores are often average quantities measured across sentences without capturing statistical significance. In this work, we formalize a simple yet effective score that generalizes to all the roles of attention heads and employ hypothesis testing on this score for robust inference. This provides us the right lens to systematically analyze attention heads and confidently comment on many commonly posed questions on analyzing the BERT model. In particular, we comment on the co-location of multiple functional roles in the same attention head, the distribution of attention heads across layers, and the effect of fine-tuning for specific NLP tasks on these functional roles.
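The idea of scoring a head per sentence and then testing for significance, rather than reporting an average, can be illustrated as follows. The abstract does not spell out the score or the test, so both are assumptions here: `role_score` is a hypothetical score (the fraction of a head's attention mass that falls on token pairs matching a role), and the test is a simple one-sided sign test, swapped in because it needs only the standard library.

```python
from math import comb


def local_pairs(n_tokens, window=1):
    """Token index pairs (i, j) that count as 'local' (nearby tokens)."""
    return {(i, j) for i in range(n_tokens) for j in range(n_tokens)
            if abs(i - j) <= window}


def role_score(attention, role_pairs):
    """Hypothetical per-sentence score: share of the head's total
    attention mass that lands on pairs belonging to the role."""
    total = sum(sum(row) for row in attention)
    mass = sum(attention[i][j] for i, j in role_pairs)
    return mass / total


def sign_test_p_value(scores, threshold):
    """One-sided sign test. H0: the median score is <= threshold.
    Scores equal to the threshold count against the role."""
    n = len(scores)
    k = sum(1 for s in scores if s > threshold)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Toy attention matrix of one head on a three-token sentence.
attention = [[0.7, 0.2, 0.1],
             [0.3, 0.4, 0.3],
             [0.1, 0.2, 0.7]]
score = role_score(attention, local_pairs(3))

# Assumed per-sentence scores for one head over ten sentences.
scores = [0.93, 0.88, 0.91, 0.75, 0.97, 0.82, 0.90, 0.86, 0.94, 0.89]
p_value = sign_test_p_value(scores, threshold=0.5)
print("assign 'local' role:", p_value < 0.05)
```

A head would be assigned a role only when the null hypothesis is rejected at the chosen significance level, which lets the same procedure be reused for every role by changing only the set of role pairs.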
This Chinese Lab Is Aiming for Big AI Breakthroughs
In a low-rise building overlooking a busy intersection in Beijing, Ji Rong Wen, a middle-aged scientist with thin-rimmed glasses and a mop of black hair, excitedly describes a project that could advance one of the hottest areas of artificial intelligence. Wen leads a team at the Beijing Academy of Artificial Intelligence (BAAI), a government-sponsored research lab that's testing a powerful new language algorithm--something similar to GPT-3, a program revealed in June by researchers at OpenAI that digests large amounts of text and can generate remarkably coherent, free-flowing language. "This is a big project," Wen says with a big grin. "It takes a lot of computing infrastructure and money." Wen, a professor at Renmin University in Beijing recruited to work part-time at BAAI, hopes to create an algorithm that is even cleverer than GPT-3. He plans to combine machine learning with databases of facts, and to feed the algorithm images and video as well as text, in hope of creating a richer understanding of the physical world--that the words cat and fur don't just often appear in the same sentence, but are associated with one another visually.
Classifying Scientific Publications with BERT -- Is Self-Attention a Feature Selection Method?
Garcia-Silva, Andres, Gomez-Perez, Jose Manuel
We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles over a taxonomy of research disciplines. We observe how self-attention focuses on words that are highly related to the domain of the article. Particularly, a small subset of vocabulary words tends to receive most of the attention. We compare and evaluate the subset of the most attended words with feature selection methods normally used for text classification in order to characterize self-attention as a possible feature selection approach. Using ConceptNet as ground truth, we also find that attended words are more related to the research fields of the articles. However, conventional feature selection methods are still a better option to learn classifiers from scratch. This result suggests that, while self-attention identifies domain-relevant terms, the discriminatory information in BERT is encoded in the contextualized outputs and the classification layer. It also raises the question whether injecting feature selection methods in the self-attention mechanism could further optimize single sequence classification using transformers.
How GPT3 Works - Visualizations and Animations
The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet completely reliable for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the possibilities of intelligent computer systems. Let’s remove the aura of mystery around GPT3 and learn how it’s trained and how it works. A trained language model generates text. We can optionally pass it some text as input, which influences its output. The output is generated from what the model “learned” during its training period where it scanned vast amounts of text.
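The train-then-generate loop described here can be made concrete with a deliberately tiny stand-in. The sketch below is not how GPT3 works internally (GPT3 is a large transformer over subword tokens and usually samples rather than always picking the top candidate); it is a count-based bigram model, with a made-up corpus, that only illustrates the two phases: scan text to learn statistics, then extend a prompt one token at a time.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """'Training': count which token follows which in the corpus."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(model, prompt, max_new_tokens=5):
    """Greedy generation: repeatedly append the most frequent follower
    of the last token, so the prompt steers the output."""
    tokens = prompt.split()
    if not tokens:
        return prompt
    for _ in range(max_new_tokens):
        followers = model.get(tokens[-1])
        if not followers:
            break  # last token was never seen with a follower
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# Made-up training text, for illustration only.
corpus = ("the robot obeys orders . the robot protects humans . "
          "the robot obeys humans .")
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=2))
```

Changing the prompt changes the continuation, which is the sense in which the input text "influences" the output; everything the model can say comes from the counts collected during training.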