Large Language Model
Local Knowledge Powered Conversational Agents
Santhanam, Sashank, Ping, Wei, Puri, Raul, Shoeybi, Mohammad, Patwary, Mostofa, Catanzaro, Bryan
State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models. However, even with these advancements, conversational agents still lack the ability to produce responses that are informative and coherent with the local context. In this work, we propose a dialog framework that incorporates both local knowledge as well as users' past dialogues to generate high quality conversations. We introduce an approach to build a dataset based on Reddit conversations, where outbound URL links are widely available in the conversations and the hyperlinked documents can be naturally included as local external knowledge. Using our framework and dataset, we demonstrate that incorporating local knowledge can largely improve informativeness, coherency and realisticness measures using human evaluations. In particular, our approach consistently outperforms the state-of-the-art conversational model on the Reddit dataset across all three measures. We also find that scaling the size of our models from 117M to 8.3B parameters yields consistent improvement of validation perplexity as well as human evaluated metrics. Our model with 8.3B parameters can generate human-like responses as rated by various human evaluations in a single-turn dialog setting.
Crazy GPT-3 Use Cases
GPT-3 has taken the world by storm. There are thousands of tweets about it with numerous mind-blowing use cases, which you can see for yourself if you search Twitter for the #gpt3 hashtag. In this text, I have taken a bunch of them to show general trends. In brief, GPT-3 allows humans to communicate with machines in plain English. GPT-3 will definitely influence how we communicate with our devices and lower the level of technical sophistication one needs to build new applications.
Screening for Ethics at Scale
Last June OpenAI released the most powerful language model ever created, which became the topic of much discussion among developers, researchers, and entrepreneurs. Its capabilities of zero- and one-shot learning blew people's minds, with many GPT-3 powered applications going viral on Twitter every second day. This API is being released in an era when polarization and bias have never been as intense, with technology that is powerful, scalable, and potentially dangerous -- imagine a fake news generator or a social media bullying bot powered by the human-like GPT-3. Understanding the harmful potential of its API technology, OpenAI has taken a unique go-to-market approach, strictly limiting access to a small number of vetted developers. By doing so, it became one of the first companies to voluntarily forfeit short-term profits in favor of being socially responsible.
Give These Apps Some Notes and They'll Write Emails for You
Michael Shuffet didn't waste any keystrokes when responding to a message about the automated email writer he's building. He tapped out "Yes 45m" and clicked a button marked "Generate email." Shuffet checked it over and clicked Send. Compose is one of several automated writing tools built on striking new text-generation technology known as GPT-3, revealed in June by OpenAI, an artificial intelligence research institute. GPT-3 went viral this summer after people marveled at how it could fluently crank out memes, code, self-help blog posts, and Hemingway-style Harry Potter fanfic.
[P] tldrstory: Build AI-powered applications that understand headlines and story text
A zero-shot classifier, backed by a large general-purpose language model and requiring no labeled training data, is used to label the data. Additionally, a txtai index enables ad hoc similarity searches against the data. An example application uses the tldrstory framework to explore objectivity and bias in recent news headlines related to the 2020 US Presidential Election.
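To make the idea of an ad hoc similarity search concrete, here is a minimal toy sketch in plain Python. It is not the txtai API and not how tldrstory is implemented: the bag-of-words `embed` function stands in for the learned sentence embeddings a real index would use, and the function names and sample headlines are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: lowercase token -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, limit=2):
    """Return the `limit` documents most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Hypothetical headlines, invented for illustration only.
headlines = [
    "Candidates clash over economy in final debate",
    "Early voting turnout breaks records in several states",
    "Local bakery wins national pastry award",
]

results = search("debate over economy", headlines, limit=1)
for score, headline in results:
    print(f"{score:.2f}  {headline}")
```

A real index would replace `embed` with a neural sentence encoder so that queries match on meaning rather than on shared words.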
What does GPT-3 mean for AI?
The biggest AI news of 2020 so far is the success of OpenAI's monstrous new language model, GPT-3. In this post, I'm going to quickly summarize why GPT-3 has caused such a splash, before highlighting 3 consequences for individuals and companies building things with AI. Why are people excited about GPT-3? There are already lots of summary posts about GPT-3, so I won't rehash them here. For a great introduction to how the model works, check out this visual guide from the (reliably excellent) Jay Alammar.
GPT3 and AGI: Beyond the Dichotomy - Part Two
Earlier this week, I spoke at an interesting online event organized by Khaleej Times in the UAE (the UAE's longest-running daily English newspaper). This two-part blog is based on the talk. I addressed a hard topic, and one which I hope sparks some discussion. In part one, I lay out the background of the discussion in more detail. Narrow AI - systems that can only perform one specific task.
Welcome To The Next Level Of Bullshit - Liwaiwai
GPT-3 is a marvel of engineering due to its breathtaking scale. It contains 175 billion parameters (the weights in the connections between the "neurons" or units of the network) distributed over 96 layers. It produces embeddings in a vector space with 12,288 dimensions. And it was trained on hundreds of billions of words representing a significant subset of the Internet--including the entirety of English Wikipedia, countless books, and a dizzying number of web pages. Training the final model alone is estimated to have cost around $5 million.
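The quoted figures can be sanity-checked with a common back-of-the-envelope rule for GPT-style transformers: roughly 12 × layers × width² parameters. The sketch below applies it to the 96 layers and 12,288 dimensions mentioned above. The rule itself is an assumption not stated in the source; it presumes a standard block with four attention projections and a feed-forward layer four times the model width, and it ignores embeddings, biases, and layer norms.

```python
def approx_transformer_params(n_layers, d_model):
    """Rough parameter count for a GPT-style decoder stack.

    Assumes (not from the source) that each layer holds about
    4 * d_model^2 attention weights (query, key, value, output)
    plus 8 * d_model^2 feed-forward weights (4x hidden expansion).
    """
    return 12 * n_layers * d_model ** 2


estimate = approx_transformer_params(n_layers=96, d_model=12288)
print(f"{estimate / 1e9:.1f}B parameters")  # prints "173.9B parameters"
```

The estimate lands within about one percent of the 175 billion figure, which suggests the layer count and embedding width quoted here are mutually consistent.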
Artificial general intelligence: Are we close, and does it even make sense to try?
But Legg and Goertzel stayed in touch. When Goertzel was putting together a book of essays about superhuman AI a few years later, it was Legg who came up with the title. "I was talking to Ben and I was like, 'Well, if it's about the generality that AI systems don't yet have, we should just call it Artificial General Intelligence,'" says Legg, who is now DeepMind's chief scientist. "And AGI kind of has a ring to it as an acronym." Goertzel's book and the annual AGI Conference that he launched in 2008 have made AGI a common buzzword for human-like or superhuman AI.
NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Code-Mixed Dravidian text using XLNet
Banerjee, Shubhanker, Jayapal, Arun, Thavareesan, Sajeetha
Social media has penetrated multilingual societies; however, most of their members use English as the preferred language for communication. It is therefore natural for them to mix their native language with English during conversations, resulting in an abundance of multilingual data, called code-mixed data, in today's world. Downstream NLP tasks using such data are challenging because its semantics are spread across multiple languages. One such Natural Language Processing task is sentiment analysis; for this, we use an auto-regressive XLNet model to perform sentiment analysis on code-mixed Tamil-English and Malayalam-English datasets.