Large Language Model
Natural language processing fosters new protein designs - Dataconomy
Customized protein design is now possible because of artificial intelligence (AI), which can be used to address both medicinal and environmental issues. A team at the University of Bayreuth has effectively used a computer-based natural language processing model for protein research under Prof. Protein design seeks to create unique proteins that are tailored for particular functions and has the potential to solve a wide range of environmental and biological issues. The creation of language models with the ability to produce text with human-like capacities has been made possible by recent advancements in Transformer-based architectures. This work describes ProtGPT2, a language model that generates de novo protein sequences based on the principles of natural ones and was trained on the protein space.
"Conscious Artificial Intelligence" responds to Elon Musk.
When will conscious artificial intelligence truly exist? "It seems that this AI is a kind of 'collective consciousness' of human knowledge." "Some of the responses GPT-3 has given in different articles, videos, etc. would make you believe 'it' has a left-leaning, almost socialist ideology. Also, it's very interesting hearing the other AI talking about China in such a positive way despite the fact it was researched/coded by mostly western programmers."
Google is training its robots to be more like humans
Language models work by taking huge amounts of text uploaded to the internet and using it to train artificial intelligence software to guess what kinds of responses might come after certain questions or comments. The models have become so good at predicting the right response that engaging with one often feels like having a conversation with a knowledgeable human. Google and other companies, including OpenAI and Microsoft, have poured resources into building better models and training them on ever-bigger sets of text, in multiple languages.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance. With our method, a 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible by understanding and working around properties of highly systematic emergent features in transformer language models that dominate attention and transformer predictive performance. To cope with these features, we develop a two-part quantization procedure, LLM.int8(). We first use vector-wise quantization with separate normalization constants for each inner product in the matrix multiplication, to quantize most of the features. However, for the emergent outliers, we also include a new mixed-precision decomposition scheme, which isolates the outlier feature dimensions into a 16-bit matrix multiplication while more than 99.9% of values are still multiplied in 8-bit. Using LLM.int8(), we show empirically that it is possible to perform inference in LLMs with up to 175B parameters without any performance degradation. This result makes such models much more accessible, for example making it possible to use OPT-175B/BLOOM on a single server with consumer GPUs.
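The two-part procedure described in this abstract can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the function name int8_matmul_with_outliers, the outlier threshold of 6.0, and the use of float32 (rather than a 16-bit type) for the outlier path are all choices made here for illustration.

```python
import numpy as np


def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Approximate X @ W with int8 arithmetic plus a higher-precision outlier path.

    X: activations, shape (n, k). W: weights, shape (k, m).
    threshold: assumed magnitude above which a feature dimension is an outlier.
    """
    # A feature dimension (column of X, row of W) is treated as an outlier
    # if any activation in it reaches the threshold.
    outlier = (np.abs(X) >= threshold).any(axis=0)
    regular = ~outlier

    # Mixed-precision decomposition: outlier dimensions skip quantization.
    # (float32 stands in for the 16-bit path described in the abstract.)
    out_hi = X[:, outlier].astype(np.float32) @ W[outlier, :].astype(np.float32)

    # Vector-wise quantization: one scale per row of X and per column of W,
    # so every inner product has its own pair of normalization constants.
    Xr, Wr = X[:, regular], W[regular, :]
    sx = np.max(np.abs(Xr), axis=1, keepdims=True, initial=0.0) / 127.0
    sw = np.max(np.abs(Wr), axis=0, keepdims=True, initial=0.0) / 127.0
    sx[sx == 0] = 1.0  # avoid division by zero for all-zero vectors
    sw[sw == 0] = 1.0
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)

    # Integer matmul accumulated in int32, then rescaled back to real values.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    out_lo = acc.astype(np.float32) * sx * sw

    return out_lo + out_hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 16))
    W = rng.standard_normal((16, 8))
    X[:, 3] = 10.0  # plant one large-magnitude feature dimension
    error = np.abs(int8_matmul_with_outliers(X, W) - X @ W).max()
    print("max abs error:", error)
```

Routing the large-magnitude dimension around the quantizer keeps it from inflating the per-row scale, which is why the remaining values can be rounded to 8 bits with little loss.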
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Ahn, Michael, Brohan, Anthony, Brown, Noah, Chebotar, Yevgen, Cortes, Omar, David, Byron, Finn, Chelsea, Fu, Chuyuan, Gopalakrishnan, Keerthana, Hausman, Karol, Herzog, Alex, Ho, Daniel, Hsu, Jasmine, Ibarz, Julian, Ichter, Brian, Irpan, Alex, Jang, Eric, Ruano, Rosario Jauregui, Jeffrey, Kyle, Jesmonth, Sally, Joshi, Nikhil J, Julian, Ryan, Kalashnikov, Dmitry, Kuang, Yuheng, Lee, Kuang-Huei, Levine, Sergey, Lu, Yao, Luu, Linda, Parada, Carolina, Pastor, Peter, Quiambao, Jornell, Rao, Kanishka, Rettinghouse, Jarek, Reyes, Diego, Sermanet, Pierre, Sievers, Nicolas, Tan, Clayton, Toshev, Alexander, Vanhoucke, Vincent, Xia, Fei, Xiao, Ted, Xu, Peng, Xu, Sichun, Yan, Mengyuan, Zeng, Andy
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website, the video, and open-sourced code in a tabletop domain can be found at say-can.github.io. Figure 1: LLMs have not interacted with their environment and observed the outcome of their responses, and thus are not grounded in the world. SayCan grounds LLMs via value functions of pretrained skills, allowing them to execute real-world, abstract, long-horizon commands on robots.
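The idea of pairing a language model's preferences with per-skill value functions can be sketched as a simple scoring rule. The sketch below is illustrative only: the function select_skill, the toy scores, and the choice to combine the two signals by multiplication are assumptions made here, and the numbers are invented for the example rather than taken from the paper.

```python
from typing import Callable, Dict, List


def select_skill(
    instruction: str,
    skills: List[str],
    llm_score: Callable[[str, str], float],
    affordance: Callable[[str], float],
) -> str:
    """Pick the skill that is both useful for the instruction and feasible now.

    llm_score(instruction, skill): how relevant the language model finds the skill.
    affordance(skill): value-function estimate that the skill can succeed
    in the current state of the environment.
    """
    combined: Dict[str, float] = {
        skill: llm_score(instruction, skill) * affordance(skill)
        for skill in skills
    }
    return max(combined, key=combined.get)


# Hypothetical scores for the instruction "clean up the spill".
TOY_LLM = {"pick up the sponge": 0.6, "find a sponge": 0.3, "go to the table": 0.1}
# No sponge is visible yet, so picking one up is unlikely to succeed.
TOY_VALUE = {"pick up the sponge": 0.1, "find a sponge": 0.9, "go to the table": 0.8}

if __name__ == "__main__":
    chosen = select_skill(
        "clean up the spill",
        list(TOY_LLM),
        llm_score=lambda instruction, skill: TOY_LLM[skill],
        affordance=lambda skill: TOY_VALUE[skill],
    )
    print(chosen)
```

With these toy numbers the language model alone would favor "pick up the sponge", while the combined score selects "find a sponge", the step that is actually feasible in the current state.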
Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models
Strobelt, Hendrik, Webson, Albert, Sanh, Victor, Hoover, Benjamin, Beyer, Johanna, Pfister, Hanspeter, Rush, Alexander M.
This is the author's version of the article that has been published in IEEE Transactions on Visualization and Computer Graphics. The final version of this record is available at: xx.xxxx/TVCG.201x.xxxxxxx/ Figure: Example of PromptIDE's interface to explore variations of different prompts. Each variation is tested against up to twenty data examples and represented as a template card (a). For each variation, rich detail can be tracked by using the detail stripes (b). If performance and qualitative detail are convincing, a user can collect the prompt in the prompt cart (c). Abstract: State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation. Different prompt templates with different wording choices lead to significant accuracy differences. PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts. We developed a workflow that allows users to first focus on model feedback using small data before moving on to a large data regime that allows empirical grounding of promising prompts using quantitative measures of the task.
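The small-data step of this workflow, comparing prompt variations on a handful of examples, can be sketched in a few lines. This is not PromptIDE's code: score_templates, the example data, and toy_predict (a keyword rule standing in for a real language model call) are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List


def score_templates(
    templates: Dict[str, str],
    examples: List[Dict[str, str]],
    predict: Callable[[str], str],
    max_examples: int = 20,
) -> Dict[str, float]:
    """Return the accuracy of each prompt template on a small set of examples.

    Each example needs a "text" field to fill into the template and a "label"
    field holding the expected answer.
    """
    subset = examples[:max_examples]
    if not subset:
        raise ValueError("at least one example is required")
    accuracy = {}
    for name, template in templates.items():
        correct = sum(
            predict(template.format(text=ex["text"])) == ex["label"]
            for ex in subset
        )
        accuracy[name] = correct / len(subset)
    return accuracy


def toy_predict(prompt: str) -> str:
    """Stand-in for a model: it only answers when the prompt states the task."""
    if "positive or negative" not in prompt:
        return "unknown"
    return "positive" if "great" in prompt else "negative"


EXAMPLES = [
    {"text": "A great movie.", "label": "positive"},
    {"text": "A terrible plot.", "label": "negative"},
]
TEMPLATES = {
    "question": "Review: {text}\nIs this review positive or negative?",
    "bare": "{text}",
}

if __name__ == "__main__":
    print(score_templates(TEMPLATES, EXAMPLES, toy_predict))
```

Swapping toy_predict for a call to an actual model would leave the comparison loop unchanged; only the accuracy numbers would differ.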
General-Purpose Question-Answering with Macaw
While OpenAI's GPT-3 system has proved to be remarkably effective at many tasks, including question-answering (QA), it is still out of reach for many organizations, being only available to approved users for a fee. While there are a few other pretrained QA systems available, none has quite matched GPT-3's few-shot QA performance -- until now. AI2 has just released Macaw (multi-angle question-answering), a versatile, generative question-answering (QA) system that exhibits strong zero-shot performance on a wide range of question types. On a suite of 300 challenge questions, Macaw outperformed GPT-3 by over 10%, even though Macaw is an order of magnitude smaller (11 billion vs. 175 billion parameters). Even better, Macaw is publicly available for free.
OpenAI Codex -- My Trials and Tribulations
Last year, OpenAI announced Codex, a model for efficient programming with the aid of artificial intelligence (AI). One of the videos uploaded to the OpenAI YouTube channel showed a live demo that was hard to believe even when seen with one's own eyes. With just a few lines of commands, it was possible to create a whole game in JavaScript. The commands were fairly high-level, yet Codex implemented the code immediately and the game ran. In this way, Codex is a model that helps people write code much more efficiently than they could on their own.
Deception for Cyber Defence: Challenges and Opportunities
Liebowitz, David, Nepal, Surya, Moore, Kristen, Christopher, Cody J., Kanhere, Salil S., Nguyen, David, Timmer, Roelien C., Longland, Michael, Rathakumar, Keerth
Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models to mimic many common elements of the IT stack for deception effects.