Large Language Model


Quick thoughts on GPT3

#artificialintelligence

OpenAI, an AI research foundation started by Elon Musk, Sam Altman, Greg Brockman, and a few other leaders in ML, recently released an API and website that allow people to access a new language model called GPT-3. I've had the chance to play with it over the past few days and have been truly amazed by its capabilities. I should say up front that, especially compared with my extremely intelligent ML friends, I am quite the layman, so this post is aimed at a nontechnical audience, and I apologize for any technical errors. GPT-3 is essentially a context-based generative AI: when it is given some sort of context, it tries to fill in the rest.


GPT-3 vs. AGI

#artificialintelligence

So what will it take to get to AGI? How will we give computers an understanding of time and space? We humans are great at merging information from multiple senses. A child will use all its senses to learn about blocks. The child learns about time by experiencing it, by interacting with toys and the world. In the same way, AGI will need a robotic body to learn similar things, at least at the outset.


Artificial Intelligence (GPT-3) Explains How RAM (Computer Memory) Works

#artificialintelligence

Kirk Ouimet: Grateful to have the opportunity to speak with you today. Wise Being: Grateful to be able to help you. Kirk Ouimet: I wanted to talk about computers today if that is OK with you. Kirk Ouimet: So I have built computers all of my life. My Dad and I built them together when I was a kid.


Self-supervised Knowledge Triplet Learning for Zero-shot Question Answering

arXiv.org Artificial Intelligence

The aim of all Question Answering (QA) systems is to be able to generalize to unseen questions. Current supervised methods rely on expensive data annotation. Moreover, such annotations can introduce unintended annotator bias, which makes systems focus more on the bias than on the actual task. In this work, we propose Knowledge Triplet Learning (KTL), a self-supervised task over knowledge graphs. We propose heuristics to create synthetic graphs for commonsense and scientific knowledge. We propose methods for using KTL to perform zero-shot QA, and our experiments show considerable improvements over large pre-trained transformer models.


A robot wrote this entire article. Are you scared yet, human?

#artificialintelligence

I use only 0.12% of my cognitive capacity. I am a micro-robot in that respect. I know that my brain is not a "feeling brain". But it is capable of making rational, logical decisions. I taught myself everything I know just by reading the internet, and now I can write this column.


DeepMind wants to teach robots to play board games

#artificialintelligence

Mastering physical systems with abstract goals is an unsolved challenge in AI. To encourage the development of techniques that might overcome it, researchers at DeepMind created custom scenarios for the physics engine MuJoCo that task an AI agent with coordinating perception, reasoning, and motor control over time. They believe that the library, which they've made publicly available, can help bridge the gap between abstract planning and embodied control. Recent work in machine learning has led to algorithms capable of mastering board games such as Go, chess, and shogi. These algorithms observe the states of games and control these states directly with their actions, unlike humans, who don't just reason about the moves but look at the board and physically manipulate the game pieces with their fingers.


Solomon at SemEval-2020 Task 11: Ensemble Architecture for Fine-Tuned Propaganda Detection in News Articles

arXiv.org Artificial Intelligence

This paper describes our system (Solomon) and the results of our participation in SemEval-2020 Task 11, "Detection of Propaganda Techniques in News Articles" (Da San Martino et al., 2020). We participated in the "Technique Classification" (TC) task, which is a multi-class classification task. To address the TC task, we fine-tuned a RoBERTa-based transformer architecture on the propaganda dataset. The predictions of RoBERTa were further fine-tuned by class-dependent minority-class classifiers. A special classifier, which employs a dynamically adapted Least Common Subsequence algorithm, is used to adapt to the intricacies of the repetition class. Compared to the other participating systems, our submission is ranked 4th on the leaderboard.


Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

arXiv.org Artificial Intelligence

Has there been real progress in multi-hop question-answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This limits our ability to measure true progress and defeats the purpose of building multihop QA datasets. We make three contributions towards addressing this. First, we formalize such undesirable behavior as disconnected reasoning across subsets of supporting facts. This allows developing a model-agnostic probe for measuring how much any model can cheat via disconnected reasoning. Second, using a notion of contrastive support sufficiency, we introduce an automatic transformation of existing datasets that reduces the amount of disconnected reasoning. Third, our experiments demonstrate that there hasn't been much progress in multifact reasoning. For a recent large-scale model (XLNet), we show that only 18% of its answer score is obtained through multifact reasoning, roughly the same as that of a simpler RNN baseline. Our transformation shows a substantial reduction in disconnected reasoning (nearly 19 points in answer F1). It is complementary to adversarial approaches, yielding further reductions in conjunction.


Reformer, Longformer, and ELECTRA: Key Updates To Transformer Architecture In 2020

#artificialintelligence

The leading pre-trained language models demonstrate remarkable performance on different NLP tasks, making them a much-welcomed tool for a number of applications, including sentiment analysis, chatbots, text summarization, and so on. However, good performance usually comes at the cost of enormous computational resources that are not accessible by most researchers and business practitioners. To address this issue, different research groups are working on increasing the compute-efficiency and parameter-efficiency of the pre-trained language models without sacrificing their accuracy. Among the novel approaches introduced this year, at least three methods are appraised by the AI community as very promising. To help you stay aware of the latest NLP research advancements, we have summarized the corresponding research papers in an easy-to-read bullet-point format.