Large Language Model
Set up a text summarization project with Hugging Face Transformers: Part 1
When OpenAI released the third generation of its machine learning (ML) model that specializes in text generation in July 2020, I knew something was different. This model struck a nerve like none that came before it. Suddenly I heard friends and colleagues, who might be interested in technology but usually don't care much about the latest advancements in the AI/ML space, talk about it. Even the Guardian wrote an article about it. Or, to be precise, the model wrote the article and the Guardian edited and published it. There was no denying it -- GPT-3 was a game changer.
'No code' brings the power of AI to the masses
Sean Cusack, a software engineer at Microsoft and beekeeper on the side, wanted to know if anything besides bees was going into his hives. So he built a tiny photo booth (a sort of bee vestibule) that took pictures whenever something appeared around it. But sorting through thousands of insect portraits proved tedious. Colleagues told him about a new product that the company was working on called Lobe.ai, which allows anybody to train a computer-vision system to recognize objects. Cusack used it to identify his honeybees -- but also to keep an eye out for the dreaded murder hornet.
5 AI Tools That Can Generate Code To Help Programmers
One of the most recent advancements in natural language processing (NLP) is the emergence of large language models (LLMs), which are trained on vast datasets. Several LLMs are available, such as Google's BERT and OpenAI's GPT-2 and GPT-3. With these models, it is possible to generate everything from simple essays to actual financial models. AI startups including OpenAI, Hugging Face, Cohere, and AI21 Labs are pushing the boundaries of LLMs by training models with billions of parameters. OpenAI Codex is the model based on GPT-3 that powers GitHub Copilot, a tool from GitHub that generates code within mainstream development environments including VS Code, Neovim, JetBrains, and even in the cloud with GitHub Codespaces.
Language models that can search the web hold promise -- but also raise concerns
Language models -- AI systems that can be prompted to write essays and emails, answer questions, and more -- remain flawed in many ways. Because they "learn" to write from examples on the web, including problematic social media posts, they're prone to generating misinformation, conspiracy theories, and racist, sexist, or otherwise toxic language. Another major limitation of many of today's language models is that they're "stuck in time," in a sense. Because they're trained once on a large collection of text from the web, their knowledge of the world -- which they gain from that collection -- can quickly become outdated depending on when they were deployed.
Nvidia takes the wraps off Hopper, its latest GPU architecture
After much speculation, Nvidia today at its March 2022 GTC event announced the Hopper GPU architecture, a line of graphics cards that the company says will accelerate the types of algorithms commonly used in data science. Named for Grace Hopper, the pioneering U.S. computer scientist, the new architecture succeeds Nvidia's Ampere architecture, which launched roughly two years ago. The first card in the Hopper lineup is the H100, containing 80 billion transistors and a component called the Transformer Engine that's designed to speed up specific categories of AI models. Another architectural highlight is Nvidia's MIG technology, which allows an H100 to be partitioned into seven smaller, isolated instances to handle different types of jobs.
GPT-3, Play Chess!
GPT-3 is a 175-billion-parameter AI language model that has been trained on a large amount of data. In simple terms, a language model is an AI model that can predict the next set of words given a collection of input words (very much like the auto-complete feature in search engines). Large language models, such as GPT-3, take this a step further by being able to generate source code or stories based just on a description or suggestion. The startup behind GPT-3, OpenAI, has made its model available to developers via an API. You can sign up for API access, and you'll get a credit of $18.
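To make the "predict the next word" idea concrete, here is a minimal sketch of a toy bigram model that counts which word most often follows another. It is not how GPT-3 works internally (GPT-3 is a large neural network), and the corpus and the function names `train_bigram_model` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example.
corpus = [
    "the model writes text",
    "the model writes code",
    "the model writes text quickly",
]
model = train_bigram_model(corpus)
print(predict_next(model, "writes"))
```

A real language model replaces these raw counts with learned probabilities over a large vocabulary and conditions on far more than one preceding word, but the interface is the same: context in, likely continuation out.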
#1 AI Weekly Research News
Thank you so much for signing up for my AI Newsletter. Over the last few days, we dug into recent AI research updates and found these highlights. Researchers from Carnegie Mellon University recently published a paper that compares existing code models -- Codex, GPT-J, GPT-Neo, GPT-NeoX, and CodeParrot -- across programming languages. By comparing and contrasting various models, they want to shed more light on the landscape of code modeling design decisions, as well as fill a major gap: no big open-source language model has been trained purely on code from several programming languages.
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP
Esmaeilpour, Sepideh, Liu, Bing, Robertson, Eric, Shu, Lei
In an out-of-distribution (OOD) detection problem, samples of known classes (also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution (OOD) detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.
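The abstract does not give the exact scoring formula, so the following is only a sketch of one plausible way to turn known and candidate class names into a confidence score: apply a softmax over image-to-label similarities and measure how much probability lands on the candidate (unknown) labels. The similarity values, the `temperature` parameter, and the function name `ood_score` are assumptions made for illustration; in practice the similarities would come from a model such as CLIP.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ood_score(known_sims, candidate_sims, temperature=0.01):
    """Probability mass assigned to candidate (unknown) labels.

    known_sims / candidate_sims are image-to-label similarities.
    The temperature value is an assumption, not taken from the paper.
    """
    logits = [s / temperature for s in known_sims + candidate_sims]
    probs = softmax(logits)
    return sum(probs[len(known_sims):])


# Made-up similarities for two known labels and one generated candidate label.
looks_unknown = ood_score(known_sims=[0.21, 0.19], candidate_sims=[0.30])
looks_known = ood_score(known_sims=[0.31, 0.18], candidate_sims=[0.20])
print(looks_unknown > looks_known)
```

A score near 1 means the sample matches a generated candidate name better than any known class name, which suggests an OOD sample; a score near 0 suggests a known class.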
An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels
Sorensen, Taylor, Robinson, Joshua, Rytting, Christopher Michael, Shaw, Alexander Glenn, Rogers, Kyle Jeffrey, Delorey, Alexia Pauline, Khalil, Mahmoud, Fulda, Nancy, Wingate, David
Pre-trained language models derive substantial linguistic and factual knowledge from the massive corpora on which they are trained, and prompt engineering seeks to align these models to specific tasks. Unfortunately, existing prompt engineering methods require significant amounts of labeled data, access to model parameters, or both. We introduce a new method for selecting prompt templates without labeled examples and without direct access to the model. Specifically, over a set of candidate templates, we choose the template that maximizes the mutual information between the input and the corresponding model output. Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task. On the largest model, selecting ...
Figure 1: Performance of template selected by our maximum mutual information method (MI) compared to the worst, mean, median, and best prompt on GPT-3 Davinci (175B). Our method performs at almost oracle levels, without labels or access to model weights.
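The selection rule in the abstract above can be sketched with the standard identity I(X; Y) = H(Y) - H(Y|X), estimated from the model's output distributions alone, with no labels. This is a minimal sketch under assumptions: the probability values and the template names `template_a` and `template_b` are invented, and the paper's actual estimator may differ in its details.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(output_dists):
    """Estimate I(X; Y) = H(Y) - H(Y|X).

    output_dists holds one output distribution per input, all produced
    with the same prompt template.
    """
    n = len(output_dists)
    k = len(output_dists[0])
    # H(Y): entropy of the output distribution averaged over inputs.
    marginal = [sum(d[j] for d in output_dists) / n for j in range(k)]
    # H(Y|X): average entropy of the per-input distributions.
    conditional = sum(entropy(d) for d in output_dists) / n
    return entropy(marginal) - conditional


def select_template(dists_by_template):
    """Pick the template whose outputs share the most information with inputs."""
    return max(dists_by_template,
               key=lambda t: mutual_information(dists_by_template[t]))


# Hypothetical output distributions over two answer options for three inputs.
dists_by_template = {
    "template_a": [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]],    # confident, varied
    "template_b": [[0.5, 0.5], [0.55, 0.45], [0.5, 0.5]],  # near-uniform
}
print(select_template(dists_by_template))
```

The intuition is that a good template makes the model confident on each individual input (low H(Y|X)) while its answers still vary across inputs (high H(Y)).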
THE AGE OF AI -- BOOK REVIEW
The book The Age of AI and Our Human Future is a graduate-school-level text. The age of AI is the future, and it's coming way too fast. The human race has never been more challenged, and we are all about to make some huge decisions. The book is almost a magisterium for human life in the age of the Fourth Industrial Revolution. It is written by thought leaders of the highest level, each in his respective field. The first author is Henry Kissinger, the former Secretary of State and NSC advisor to two US presidents, a philosopher and Nobel Peace Prize laureate. At age 98 he has seen and done it all, and he remains an international counselor to politicians and business magnates. The second author, Eric Schmidt, consolidated Google into the cutting-edge technology giant that it is today. In this role he is a sought-after counselor and business mogul. The third author is Daniel Huttenlocher, the inaugural Dean of the MIT College of Computing. It is the place where AI is reinvented and recreated through self-teaching algorithm development and data aggregation from the global network platforms and the internet, which occur 24/7 at a breakneck pace. This compendium, though incomplete, has more authors, contributors and editors. Meredith Potter is a contributor who augments Kissinger's intellectual pursuits; she drafted and edited the texts and made the chapters flow clearly and seamlessly. These and other editors made this textbook intellectually rich, informative, and easy to read. The Age of AI introduces the reader to the changes occurring in our society today. You are about to encounter many topics that involve the future in its continuing evolution. Every high school student is adapting to the new intellectual reality of the classroom. Here are two points to consider. First, the technology that this text discusses is not available in your community college courses or on other educational websites.