AITopics

While behavior learning has made impressive progress in recent times, it lags behind computer vision and natural language processing due to its inability to leverage large, human-generated datasets. Human behaviors have wide variance and multiple modes, and human demonstrations typically do not come with reward labels. These properties limit the applicability of current methods in Offline RL and Behavioral Cloning to learn from large, pre-collected datasets. In this work, we present Behavior Transformer (BeT), a new technique to model unlabeled demonstration data with multiple modes. BeT retrofits standard transformer architectures with action discretization coupled with a multi-task action correction inspired by offset prediction in object detection. This allows us to leverage the multi-modal modeling ability of modern transformers to predict multi-modal continuous actions. We experimentally evaluate BeT on a variety of robotic manipulation and self-driving behavior datasets. We show that BeT significantly improves over prior state-of-the-art work on solving demonstrated tasks while capturing the major modes present in the pre-collected datasets. Finally, through an extensive ablation study, we analyze the importance of every crucial component in BeT. Videos of behavior generated by BeT are available at https://notmahi.github.io/bet
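The pairing of action discretization with an offset correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin centers are hypothetical values assumed to come from some clustering step (for example k-means over the demonstrated actions), and the names `encode_action` and `decode_action` are made up for illustration. The sketch only shows how a continuous action can be split into a discrete bin index plus a continuous residual, and how the two recombine.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_action(
    action: Sequence[float], centers: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Split a continuous action into (nearest bin index, residual offset)."""
    bin_index = min(
        range(len(centers)), key=lambda k: squared_distance(action, centers[k])
    )
    offset = [a - c for a, c in zip(action, centers[bin_index])]
    return bin_index, offset


def decode_action(
    bin_index: int, offset: Sequence[float], centers: Sequence[Sequence[float]]
) -> List[float]:
    """Recombine a bin center and an offset into a continuous action."""
    return [c + o for c, o in zip(centers[bin_index], offset)]


# Hypothetical bin centers; in practice these would be fitted to the dataset.
centers = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
action = [0.75, 1.25]

bin_index, offset = encode_action(action, centers)
reconstructed = decode_action(bin_index, offset, centers)
```

Under these assumptions, a model would treat the bin index as a classification target and the offset as a regression target, which is the analogy to offset prediction in object detection that the abstract draws.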

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2206.11251

Country:

North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Meng, Yu, Huang, Jiaxin, Zhang, Yu, Han, Jiawei

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and achieving even comparable results to strong few-shot approaches using 32 training samples per class.
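Label smoothing, one of the regularization techniques named in the abstract, can be sketched in a few lines. This is a generic illustration of the standard formulation (mix the one-hot target with a uniform distribution), not the paper's training code; the smoothing strength `epsilon = 0.1` and the function names are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def smooth_labels(label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Return (1 - epsilon) * one_hot(label) + epsilon / num_classes."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[label] += 1.0 - epsilon
    return target


def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


hard = smooth_labels(1, 3, epsilon=0.0)   # plain one-hot target
soft = smooth_labels(1, 3, epsilon=0.1)   # smoothed target

# A very confident prediction is penalized more under the smoothed target.
confident_prediction = [0.01, 0.98, 0.01]
hard_loss = cross_entropy(hard, confident_prediction)
soft_loss = cross_entropy(soft, confident_prediction)
```

The smoothed target discourages overconfident predictions, which is plausibly useful when the training labels come from generated text and may be noisy.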

large language model, machine learning, natural language, (19 more...)

2202.04538

Country:

North America > United States > Illinois (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

Khalifa, Muhammad, Vyas, Yogarshi, Wang, Shuai, Horwood, Graham, Mallya, Sunil, Ballesteros, Miguel

We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information play a vital role in interpreting such documents. The standard classification setting where categories are fixed during both training and testing falls short in dynamic environments where new document categories could potentially emerge. We focus exclusively on the zero-shot setting where inference is done on new unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F$_1$ from the proposed pretraining step in both supervised and unsupervised zero-shot settings.

computational linguistic, large language model, natural language, (17 more...)

2210.05613

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Nevada (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Continual Training of Language Models for Few-Shot Learning

Ke, Zixuan, Lin, Haowei, Shao, Yijia, Xu, Hu, Shu, Lei, Liu, Bing

Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-training the LM with a sequence of unlabeled domain corpora to expand its knowledge without forgetting its previous skills. The goal is to improve the few-shot end-task learning in these domains. The resulting system is called CPT (Continual PostTraining), which, to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness.

large language model, machine learning, natural language, (17 more...)

2210.05549

Country:

South America > Brazil (0.04)
North America > United States > Texas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Decoupled Context Processing for Context Augmented Language Modeling

Li, Zonglin, Guo, Ruiqi, Kumar, Sanjiv

Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity. In this paper, we examined a simple yet effective architecture for incorporating external context into language models based on a decoupled Encoder-Decoder architecture. We showed that such a simple architecture achieves competitive results on auto-regressive language modeling and open domain question answering tasks. We also analyzed the behavior of the proposed model, which performs grounded context transfer. Finally, we discussed the computational implications of such retrieval augmented models.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

2210.05758

Country:

North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Patching open-vocabulary models by interpolating weights

Ilharco, Gabriel, Wortsman, Mitchell, Gadre, Samir Yitzhak, Song, Shuran, Hajishirzi, Hannaneh, Kornblith, Simon, Farhadi, Ali, Schmidt, Ludwig

Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
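The weight interpolation that PAINT relies on can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the interpolation is linear with a single mixing coefficient `alpha`, and it represents a checkpoint as a plain dictionary of flat parameter lists. The name `interpolate_weights` and the toy parameter values are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def interpolate_weights(zero_shot: Weights, fine_tuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * zero_shot + alpha * fine_tuned, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    patched: Weights = {}
    for name, zs_values in zero_shot.items():
        ft_values = fine_tuned[name]
        if len(zs_values) != len(ft_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        patched[name] = [
            (1.0 - alpha) * z + alpha * f for z, f in zip(zs_values, ft_values)
        ]
    return patched


# Toy checkpoints standing in for the weights before and after fine-tuning.
zero_shot = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}

patched = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
```

With `alpha = 0` the result is the original model and with `alpha = 1` it is the fine-tuned model; intermediate values trade accuracy on the patched task against accuracy on tasks where the original model already performs well.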

accuracy, large language model, machine learning, (18 more...)

2208.05592

Country:

North America > United States (0.27)
South America > Brazil (0.14)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.46)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.65)

Automating Code Review Activities by Large-Scale Pre-training

Li, Zhiyu, Lu, Shuai, Guo, Daya, Duan, Nan, Jannu, Shailesh, Jenks, Grant, Majumder, Deep, Green, Jared, Svyatkovskiy, Alexey, Fu, Shengyu, Sundaresan, Neel

Code review is an essential part of the software development lifecycle since it aims at guaranteeing the quality of code. Modern code review activities require developers to view, understand, and even run the programs to assess logic, functionality, latency, style and other factors. It turns out that developers have to spend far too much time reviewing the code of their peers. Accordingly, there is significant demand for automating the code review process. In this research, we focus on utilizing pre-training techniques for the tasks in the code review scenario. We collect a large-scale dataset of real-world code changes and code reviews from open-source projects in nine of the most popular programming languages. To better understand code diffs and reviews, we propose CodeReviewer, a pre-trained model that utilizes four pre-training tasks tailored specifically for the code review scenario. To evaluate our model, we focus on three key tasks related to code review activities, including code change quality estimation, review comment generation and code refinement. Furthermore, we establish a high-quality benchmark dataset based on our collected data for these three tasks and conduct comprehensive experiments on it. The experimental results demonstrate that our model outperforms the previous state-of-the-art pre-training approaches in all tasks. Further analysis shows that our proposed pre-training tasks and the multilingual pre-training dataset benefit the model on the understanding of code changes and reviews.

large language model, machine learning, natural language, (18 more...)

2203.09095

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore > Central Region > Singapore (0.05)
(11 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

#artificialintelligence Oct-10-2022, 19:05:39 GMT

Tune your private AI art generator

Since network finetuning in NLP (GPT-3) has been successful, it's time to finetune AI art generators. I use this method in myFatherintheCloud.ai. The technique allows me to generate sculptures in the style of my late father, Siegfried Gross. Below are the free-to-use code and an easy three-step, no-code alternative.

ai art generator, private ai art generator

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

#artificialintelligence Oct-10-2022, 19:05:14 GMT

All my articles on GPT-3 as of October 2022

All my articles on GPT-3 as of October 2022. My favorite language model and how to use it for multiple purposes in online applications with pure JavaScript and minimal PHP.

gpt-3

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

#artificialintelligence Oct-10-2022, 11:55:27 GMT

DeepMind's AlphaTensor: The AI That Is Reinventing Math

Whether or not we realize it, our activities involve matrix multiplications in one way or another. The whole of computing relies on them, so being able to improve their efficiency is fundamental. DeepMind (a year after revolutionizing biology with AlphaFold2) presented an article in which it uses reinforcement learning to increase the efficiency of matrix multiplication. In this article, we discuss how it does so and why it is important. Algorithms have been fundamental since the beginning of history.

algorithm, matrix, multiplication, (14 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)