Large Language Model
Open-source language AI challenges big tech's models
Researchers have warned against possible harms from AI that processes and generates text. An international team of around 1,000 largely academic volunteers has tried to break big tech's stranglehold on natural-language processing and reduce its harms. Trained with US$7-million-worth of publicly funded computing time, the BLOOM language model will rival in scale those made by firms Google and OpenAI, but will be open-source. BLOOM will also be the first model of its scale to be multilingual. The collaboration, called BigScience, launched an early version of the model on 17 June, and hopes that it will ultimately help to reduce harmful outputs of artificial intelligence (AI) language systems. Models that recognize and generate language are increasingly used by big tech firms in applications from chat bots to translators, and can sound so eerily human that a Google engineer this month claimed that the firm's AI model was sentient (Google strongly denies that the AI possesses sentience).
#2,668 - AI Week: Bloom
A collaborative effort has led to the creation of a new open source AI language model that anyone can use, democratizing access to AI. "Unlike other, more famous large language models such as OpenAI's GPT-3 and Google's LaMDA, BLOOM (which stands for BigScience Large Open-science Open-access Multilingual Language Model) is designed to be as transparent as possible, with researchers sharing details about the data it was trained on, the challenges in its development, and the way they evaluated its performance. OpenAI and Google have not shared their code or made their models available to the public, and external researchers have very little understanding of how these models are trained. BLOOM was created over the last year by over 1,000 volunteer researchers in a project called BigScience, which was coordinated by AI startup Hugging Face using funding from the French government. It officially launched on July 12. The researchers hope developing an open-access LLM that performs as well as other leading models will lead to long-lasting changes in the culture of AI development and help democratize access to cutting-edge AI technology for researchers around the world. The model's ease of access is its biggest selling point. Now that it's live, anyone can download it and tinker with it free of charge on Hugging Face's website. Users can pick from a selection of languages and then type in requests for BLOOM to do tasks like writing recipes or poems, translating or summarizing texts, or writing programming code. AI developers can use the model as a foundation to build their own applications."
DL/ML Trends 2022
Here's the list of things that I think will be hot in the field of Machine Learning and Deep Learning in the near future: Self-supervised learning is a machine learning technique that can be used to train models for tasks in NLP, Computer Vision, Reinforcement Learning and robotics. This method works by learning common patterns from large amounts of unlabeled data and then applying those representations to tasks where only a small amount of labeled data is available. LLMs like BigScience's BLOOM or OpenAI's GPT-3 are getting better and better, and they can now handle a rich variety of tasks such as SQL code generation, image captioning, essay writing and many more awesome things! I think it's one of the most ambitious as well as exciting fields in all AI/Machine Learning research. Vision Transformers (ViT) are deep neural networks that use the transformer architecture to deal with computer vision tasks.
Meta AI declares war on Google.
Machine Learning researchers at Meta have released a new Large Language Model (LLM) called Sphere. With its amazing performance on search-related tasks and its ability to parse through billions of documents, combined with Meta's other work in NLP, Meta has positioned itself well to disrupt the search market. In this article, I will cover the technology behind the architecture itself. I will do another article later on the implications of Meta open-sourcing everything about their model. That requires its own attention.
Artificial Intelligence Predicts The Structure of Almost Every Protein Ever Found
A new era of biological research has been unlocked, with an artificial intelligence (AI) predicting the 3D shape of nearly every protein known to science, just one year after its first data release. Thanks to AlphaFold, an AI tool developed by the Google-owned AI company DeepMind, more than 200 million protein structures have now been shared online in a free-to-access, searchable database, called AlphaFold DB. The accomplishment paves the way for untold avenues of scientific exploration into proteins, the building blocks of life. And researchers are giddy with excitement. "Determining the 3D structure of a protein used to take many months or years, it now takes seconds," cardiologist Eric Topol from the Scripps Research Translational Institute explained in a statement about the data release.
Democratizing the hardware side of large language models
There's growing concern that artificial intelligence--namely deep learning--is becoming centralized within a few very wealthy companies. This shift does not apply to all areas of AI, but it is certainly the case for large language models, deep learning systems composed of billions of parameters and trained on terabytes of text data. Accordingly, there has been growing interest in democratizing LLMs and making them available to a broader audience. However, while there have been impressive initiatives in open-sourcing models, the hardware barriers of large language models have gone mostly unaddressed. This is one of the problems that Cerebras, a startup that specializes in AI hardware, aims to solve with its Wafer Scale processor.
How to transfer the knowledge from GPT3 to small private models
A bigger teacher model passes what it has learned to a smaller student model. The distilled knowledge can be good, sometimes even better than the details chosen by humans. What is knowledge distillation? Knowledge distillation is the process of transferring knowledge from a big model to a single, more manageable model that can be used in real-world applications. In essence, it is a kind of model compression.
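The teacher-to-student transfer described above can be illustrated with a minimal sketch. It assumes the common soft-target formulation, in which the student is trained to match the teacher's temperature-softened output distribution; the snippet itself does not say which variant the article uses. The function names, the temperature value and the toy logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is one common choice of distillation objective (an assumption here),
    not necessarily the one used in the article.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Toy logits over three classes (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

In a real training loop this loss would be minimized with respect to the student's parameters, usually alongside an ordinary loss on the labeled data.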
Pre-training, fine-tuning and in-context learning in Large Language Models (LLMs)
Since the advent of Transformers in 2017, Large Language Models (LLMs) have completely changed the process of training ML models for language tasks. Earlier, for a given task and a given dataset, we used to play around with various models like RNNs, LSTMs, Decision Trees, etc., by training each of them on a subset of the data and testing on the rest. Whichever model gave the best accuracy was chosen as the winner. Of course, a lot of model hyper-parameters also needed to be tuned and experimented with. And for many problems, feature engineering was also necessary.
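The older workflow described above (train several candidate models on a subset of the data, test on the rest, keep the most accurate) can be sketched roughly as follows. The dataset and the two toy "models" are hypothetical stand-ins for the RNNs, LSTMs and decision trees mentioned in the text, chosen only to keep the example self-contained.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_majority(train):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(train):
    """Predict 1 when x lies above the midpoint of the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    midpoint = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > midpoint else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

# Toy dataset (illustrative): the label is 1 when x >= 10.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]
train, test = train_test_split(data)

candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, scores)
```

Hyper-parameter tuning and feature engineering would sit around this loop, which is the manual effort the passage contrasts with pre-trained LLMs.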
Meta is putting its latest AI chatbot on the web for the public to talk to
Meta's AI research labs have created a new state-of-the-art chatbot and are letting members of the public talk to the system in order to collect feedback on its capabilities. The bot is called BlenderBot 3 and can be accessed on the web. BlenderBot 3 is able to engage in general chitchat, says Meta, but also answer the sort of queries you might ask a digital assistant, "from talking about healthy food recipes to finding child-friendly amenities in the city." The bot is a prototype and built on Meta's previous work with what are known as large language models or LLMs -- powerful but flawed text-generation software of which OpenAI's GPT-3 is the most widely known example. Like all LLMs, BlenderBot is initially trained on vast datasets of text, which it mines for statistical patterns in order to generate language.