 Generative AI


Artificial intelligence is becoming a 'force multiplier' -- for good and bad

#artificialintelligence

AI safety issues are becoming increasingly important. Google DeepMind and Faculty, both based in London, are devoting considerable resources to this area. Anthropic, a San Francisco-based research startup spun out of OpenAI, and some academic labs, including the Future of Humanity Institute in Oxford, are also building expert teams in this field. "There is so little scrutiny over building very, very powerful software systems," says Hogarth. "We can plausibly have systems that exceed human capabilities in 30 years but there are fewer than 200 people in the world working on oversight and regulation."


The Evolution of Tokenization – Byte Pair Encoding in NLP - KDnuggets

#artificialintelligence

NLP may have been a little late to the AI epiphany, but it is doing wonders, with organisations like Google and OpenAI releasing state-of-the-art (SOTA) language models like BERT and GPT-2/3, respectively. GitHub Copilot and OpenAI Codex are among a few very popular applications that are in the news. As someone who has very limited exposure to NLP, I decided to take up NLP as an area of research, and in the next few blogs/videos I will share what I learn after dissecting some important components of NLP. Top deep learning models like BERT, GPT-2, and GPT-3 all share the same components but have different architectures that distinguish one model from another. In this newsletter (and notebook), we are going to focus on the basics of the first component of an NLP pipeline, which is tokenization.
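As a rough illustration of the byte pair encoding named in the title, the sketch below learns merge rules from a toy word-frequency table. It is not taken from the article or its notebook: the function names, the `</w>` end-of-word marker, and the toy corpus are assumptions made for this example.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Return a new vocab in which every occurrence of `pair` is one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} table."""
    # Each word starts as a tuple of characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Toy corpus (made up for illustration).
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
```

Each merge adds one new symbol to the vocabulary, so frequent character sequences gradually become single tokens while rare words remain split into smaller pieces.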


An OpenAI Model Learns to Summarize Books

#artificialintelligence

Many large tech companies are competing to develop general-purpose artificial intelligence (AI): models that can approach and solve just about any problem we give them, no matter how time-consuming or challenging. Ensuring that such models act in line with human intentions is referred to as the alignment problem. To test and scale a potential solution, the OpenAI team recently trained an AI model to recursively summarize books. Using GPT-3, the model can give you the gist of a book of any length. According to the OpenAI team, the model "achieves a 6/7 rating (similar to the average human-written summary) from humans who have read the book 5% of the time and a 5/7 rating 15% of the time."
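The idea of recursive summarization can be sketched in a few lines: split the text into chunks, summarize each chunk, join the summaries, and repeat until the result is short enough. This is only a schematic and not OpenAI's method. In particular, `first_words` is a hypothetical stand-in for a real summarization model, and the chunk sizes are arbitrary assumptions.

```python
from typing import Callable, List

def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def first_words(chunk: str, keep: int = 20) -> str:
    """Stand-in summarizer (an assumption): keeps only the first `keep` words."""
    return " ".join(chunk.split()[:keep])

def recursive_summary(text: str,
                      summarize: Callable[[str], str] = first_words,
                      max_words: int = 100) -> str:
    """Summarize chunks, then summarize the joined summaries, until the text fits."""
    words = text.split()
    if len(words) <= max_words:
        return summarize(text)
    combined = " ".join(summarize(c) for c in split_into_chunks(text, max_words))
    if len(combined.split()) >= len(words):
        # The summarizer did not shrink the text; stop rather than recurse forever.
        return summarize(combined)
    return recursive_summary(combined, summarize, max_words)
```

Swapping `first_words` for a call to an actual language model would leave the control flow unchanged, which is why the summarizer is passed in as a parameter.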


The evolution of Tokenization in NLP -- Byte Pair Encoding in NLP

#artificialintelligence

NLP may have been a little late to the AI epiphany, but it is doing wonders, with organizations like Google and OpenAI releasing state-of-the-art (SOTA) language models like BERT and GPT-2/3, respectively. GitHub Copilot and OpenAI Codex are among a few very popular applications that are in the news. As someone who has very limited exposure to NLP, I decided to take up NLP as an area of research, and in the next few blogs/videos I will share what I learn after dissecting some important components of NLP. Top deep learning models like BERT, GPT-2, and GPT-3 all share the same components but have different architectures that distinguish one model from another. In this newsletter (and notebook), we are going to focus on the basics of the first component of an NLP pipeline, which is tokenization.


AI Weekly: Researchers attempt an open source alternative to GitHub's Copilot

#artificialintelligence

In June, OpenAI teamed up with GitHub to launch Copilot, a service that provides suggestions for whole lines of code inside development environments like Microsoft Visual Studio. Powered by an AI model called Codex -- which OpenAI later exposed through an API -- Copilot can translate natural language into code across more than a dozen programming languages, interpreting commands in plain English and executing them. Now, a community effort is underway to create an open source, freely available alternative to Copilot and OpenAI's Codex model.


An Experimental Evaluation on Deepfake Detection using Deep Face Recognition

arXiv.org Artificial Intelligence

Significant advances in deep learning have obtained hallmark accuracy rates for various computer vision applications. However, advances in deep generative models have also led to the generation of very realistic fake content, also known as deepfakes, causing a threat to privacy, democracy, and national security. Most of the current deepfake detection methods are deemed as a binary classification problem in distinguishing authentic images or videos from fake ones using two-class convolutional neural networks (CNNs). These methods are based on detecting visual artifacts, temporal or color inconsistencies produced by deep generative models. However, these methods require a large amount of real and fake data for model training and their performance drops significantly in cross dataset evaluation with samples generated using advanced deepfake generation techniques. In this paper, we thoroughly evaluate the efficacy of deep face recognition in identifying deepfakes, using different loss functions and deepfake generation techniques. Experimental investigations on challenging Celeb-DF and FaceForensics++ deepfake datasets suggest the efficacy of deep face recognition in identifying deepfakes over two-class CNNs and the ocular modality. Reported results suggest a maximum Area Under Curve (AUC) of 0.98 and an Equal Error Rate (EER) of 7.1% in detecting deepfakes using face recognition on the Celeb-DF dataset. This EER is lower by 16.6% compared to the EER obtained for the two-class CNN and the ocular modality on the Celeb-DF dataset. Further on the FaceForensics++ dataset, an AUC of 0.99 and EER of 2.04% were obtained. The use of biometric facial recognition technology has the advantage of bypassing the need for a large amount of fake data for model training and obtaining better generalizability to evolving deepfake creation techniques.
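For readers unfamiliar with the two metrics reported above, the following sketch shows one simple way to compute an Equal Error Rate and an AUC from two lists of scores. It is not the paper's evaluation code. It assumes that a higher score means "more likely real", and the function names and the threshold sweep over observed scores are choices made for this example.

```python
def error_rates(real_scores, fake_scores, threshold):
    """Samples scoring >= threshold are accepted as real (an assumed convention)."""
    # False acceptance rate: fakes wrongly accepted as real.
    far = sum(s >= threshold for s in fake_scores) / len(fake_scores)
    # False rejection rate: real samples wrongly rejected.
    frr = sum(s < threshold for s in real_scores) / len(real_scores)
    return far, frr

def equal_error_rate(real_scores, fake_scores):
    """Approximate the EER as the point where FAR and FRR are closest."""
    best_gap, eer = None, None
    for t in sorted(set(real_scores) | set(fake_scores)):
        far, frr = error_rates(real_scores, fake_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

def auc(real_scores, fake_scores):
    """Probability that a random real sample outscores a random fake (ties count half)."""
    wins = sum((r > f) + 0.5 * (r == f) for r in real_scores for f in fake_scores)
    return wins / (len(real_scores) * len(fake_scores))
```

A lower EER and a higher AUC both indicate better separation between real and fake samples, which is the sense in which the figures above are compared across methods.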


New OpenAI API like Algolia, Quizlet, and Reddit

#artificialintelligence

Given any text prompt, the API will return a text completion, attempting to match the pattern you gave it. You can "program" it by showing it just a few examples of what you'd like it to do; its success generally varies depending on how complex the task is. The API also allows you to hone performance on specific tasks by training on a dataset (small or large) of examples you provide, or by learning from human feedback provided by users or labelers. We've designed the API to be both simple for anyone to use but also flexible enough to make machine learning teams more productive. In fact, many of our teams are now using the API so that they can focus on machine learning research rather than distributed systems problems.
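"Programming" the model with a few examples amounts to placing those examples in the prompt and leaving the last one unfinished for the model to complete. The helper below only builds such a prompt string and makes no API call. Its name, the `Input`/`Output` labels, and the blank-line separator are assumptions for illustration, not part of the OpenAI API.

```python
def build_few_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Format (input, output) example pairs followed by an unanswered query."""
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # The final block ends at the output label so a completion model fills in the answer.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)

# Hypothetical English-to-French examples.
prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("bread", "pain")],
    "water",
)
```

The resulting string would be sent as the text prompt, and the returned completion is expected to continue the pattern set by the examples.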


How I built an AI Text-to-Art Generator

#artificialintelligence

This article is a write-up on how I built Text2Art.com in a week. Text2Art is an AI-powered art generator based on VQGAN CLIP that can generate all kinds of art such as pixel art, drawing, and painting from just text input. The article follows my thought process from experimenting with VQGAN CLIP, building a simple UI with Gradio, switching to FastAPI to serve the models, and finally to using Firebase as a queue system. Feel free to skip to the parts that you are interested in.


OpenAI Unveils A Model Capable of Summarizing Books of Any Length

#artificialintelligence

OpenAI is an artificial intelligence research and development company with a mission to ensure that AI benefits all humanity. OpenAI has come up with a new model to examine the alignment problem in machine learning. The interesting thing is that OpenAI's machine learning model summarizes books of any length by combining summaries of each chapter to obtain a higher-level overview. The research was conducted as an empirical study on scaling alignment to tasks that can be tricky for AI algorithms, as they require complex inputs, such as numbers or text, that the model was not trained on.


Breaking the OpenAI-Microsoft Monopoly

#artificialintelligence

OpenAI has become an extremely well-known AI company after the deserved popularity of GPT-3, their celebrity AI model. GPT-3 has amazing skills: it can compose poetry, write essays, or code. But none of that would have been possible without the help of Microsoft's money and computing power. GPT-3 is arguably the most advanced language model out there (at least among those that are publicly available). As such, it would be reasonable to make it accessible for research purposes at universities and non-profit institutes. Instead, OpenAI decided to limit access to a privileged few through a private API.