Large Language Model


[Figure: (1) Game Dataset; (2) Language Dataset. Sources shown: Online Game, Pro Game, General Text, Wiki, Puzzle, Book.]

Neural Information Processing Systems

When solving decision-making tasks, humans typically draw on information from two key sources: (1) historical policy data, which provides interaction replays from the environment, and (2) analytical insights in natural language, which expose the underlying thought process or strategic considerations. Despite this, most prior research focuses on only one source: it either uses historical replays exclusively to learn policy or value functions directly, or trains language models on language corpora alone. In this paper, we argue that a powerful autonomous agent should cover both sources. We therefore propose ChessGPT, a GPT model that bridges policy learning and language modeling by integrating data from both sources in chess games. Specifically, we build a large-scale game and language dataset related to chess.
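The abstract does not say how the two sources are combined into training data. As a minimal sketch, assuming game replays are serialized as PGN-style movetext and mixed with language documents at a fixed, hypothetical `game_fraction`, a single GPT-style corpus could be assembled as follows (`pgn_movetext` and `mixed_corpus` are illustrative names, not ChessGPT's code):

```python
import random


def pgn_movetext(moves: list[str]) -> str:
    """Serialize a list of SAN moves as numbered PGN-style movetext."""
    parts = []
    for i in range(0, len(moves), 2):
        # Each move number covers one White move and, if present, one Black move.
        parts.append(f"{i // 2 + 1}. " + " ".join(moves[i:i + 2]))
    return " ".join(parts)


def mixed_corpus(games: list[list[str]], texts: list[str],
                 game_fraction: float = 0.5, n_samples: int = 4,
                 seed: int = 0) -> list[str]:
    """Draw training documents from both sources (hypothetical mixing scheme)."""
    rng = random.Random(seed)
    game_docs = [pgn_movetext(g) for g in games]
    corpus = []
    for _ in range(n_samples):
        # With probability game_fraction take a game replay, else a language document.
        source = game_docs if rng.random() < game_fraction else texts
        corpus.append(rng.choice(source))
    return corpus


games = [["e4", "e5", "Nf3", "Nc6", "Bb5"]]
texts = ["Hypothetical commentary on the opening."]
print(pgn_movetext(games[0]))  # 1. e4 e5 2. Nf3 Nc6 3. Bb5
print(mixed_corpus(games, texts, game_fraction=0.5))
```

Serializing moves as text lets one next-token objective cover both policy data and commentary; ChessGPT's actual data pipeline may differ.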


Results

Neural Information Processing Systems

In addition to CYCLIP as described in Section 2, we train two more instantiations of it by keeping only one of the two consistency regularizers active in the loss objective (Eq. …). The instantiation trained with λ1 = 0 and λ2 = 0.5 is termed C-CYCLIP, since only the cross-modal consistency regularizer is added to the loss objective. Similarly, I-CYCLIP adds only the in-modal consistency regularizer to the loss, with λ1 = 0.5 and λ2 = 0. We evaluate C-CYCLIP and I-CYCLIP on most of the experiments discussed in the main text to understand their zero-shot transfer ability on standard datasets and their robustness to natural distribution shifts.

A.1 Zero-shot Transfer

Table 7 presents our results for the zero-shot transfer experiment described in Section 3.1. We find that CYCLIP outperforms its sub-variants and the CLIP model on the ImageNet1K dataset.
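The variants differ only in which regularizer weights are zero. A minimal sketch of the combined objective, assuming the usual CyCLIP forms (in-modal: squared difference between ⟨I_j, I_k⟩ and ⟨T_j, T_k⟩; cross-modal: squared difference between ⟨I_j, T_k⟩ and ⟨I_k, T_j⟩), averaged over all N² pairs and with an illustrative temperature of 0.07. The exact normalization in the paper's loss equation is not reproduced here:

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def clip_loss(imgs: list[Vector], txts: list[Vector], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; (image j, text j) are the positive pairs."""
    n = len(imgs)
    logits = [[dot(imgs[j], txts[k]) / temperature for k in range(n)] for j in range(n)]

    def cross_entropy(rows: list[Vector]) -> float:
        total = 0.0
        for j, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[j]
        return total / n

    columns = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def in_modal_consistency(imgs, txts) -> float:
    """Mean of (<I_j, I_k> - <T_j, T_k>)^2 over all pairs (j, k)."""
    n = len(imgs)
    return sum((dot(imgs[j], imgs[k]) - dot(txts[j], txts[k])) ** 2
               for j in range(n) for k in range(n)) / (n * n)


def cross_modal_consistency(imgs, txts) -> float:
    """Mean of (<I_j, T_k> - <I_k, T_j>)^2 over all pairs (j, k)."""
    n = len(imgs)
    return sum((dot(imgs[j], txts[k]) - dot(imgs[k], txts[j])) ** 2
               for j in range(n) for k in range(n)) / (n * n)


def cyclip_loss(imgs, txts, lam1: float, lam2: float) -> float:
    """CLIP loss + lam1 * in-modal term + lam2 * cross-modal term."""
    imgs = [normalize(v) for v in imgs]
    txts = [normalize(v) for v in txts]
    return (clip_loss(imgs, txts)
            + lam1 * in_modal_consistency(imgs, txts)
            + lam2 * cross_modal_consistency(imgs, txts))


# (lam1, lam2) settings stated in the text for the two sub-variants.
VARIANTS = {"C-CYCLIP": (0.0, 0.5), "I-CYCLIP": (0.5, 0.0)}

imgs = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0], [0.0, 2.0, 1.0]]
txts = [[0.9, 0.1, 1.5], [0.2, 1.2, -0.8], [-0.3, 1.0, 1.2]]
for name, (lam1, lam2) in VARIANTS.items():
    print(name, round(cyclip_loss(imgs, txts, lam1, lam2), 4))
```

With λ1 = λ2 = 0 the objective reduces to the plain CLIP contrastive loss; because both regularizers are non-negative, each variant's loss is at least the CLIP loss on the same batch.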


AmadeusGPT: a natural language interface for interactive animal behavioral analysis

Neural Information Processing Systems

The process of quantifying and analyzing animal behavior involves translating the naturally occurring descriptive language of their actions into machine-readable code. Yet codifying behavior analysis is often challenging without both a deep understanding of animal behavior and technical machine learning expertise. To bridge this gap, we introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code. Large language models (LLMs) such as GPT3.5 and GPT4 allow for interactive language-based queries, which makes them potentially well suited for interactive behavior analysis. However, the comprehension capability of these LLMs is limited by the context window size, which prevents them from remembering distant parts of a conversation.


Visual Programming for Text to Image Generation and Evaluation

Neural Information Processing Systems

As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs with visual understanding, we propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation. First, we introduce VPGEN, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation. We employ an LM to handle the first two steps (object/count generation and layout generation) by finetuning it on text-layout pairs. Our step-by-step T2I generation framework provides stronger spatial control than end-to-end models, the dominant approach for this task.
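To illustrate how the first two steps hand structured output to the third, here is a sketch that parses a hypothetical text serialization of object counts and bounding-box layouts and checks that they agree. The `label (count)` and `label [x0, y0, x1, y1]` formats, with coordinates normalized to [0, 1], are assumptions rather than VPGEN's actual output format, and the image-generation step is not shown:

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical item formats: "dog (2)" for step 1, "dog [0.1, 0.5, 0.4, 0.9]" for step 2.
COUNT_ITEM = re.compile(r"^\s*(.+?)\s*\((\d+)\)\s*$")
NUM = r"\s*([\d.]+)\s*"
BOX_ITEM = re.compile(rf"^\s*(.+?)\s*\[{NUM},{NUM},{NUM},{NUM}\]\s*$")


def parse_counts(text: str) -> dict[str, int]:
    """Step 1 output: object names with instance counts, separated by '|'."""
    counts = {}
    for item in text.split("|"):
        m = COUNT_ITEM.match(item)
        if m is None:
            raise ValueError(f"bad count item: {item!r}")
        counts[m.group(1)] = int(m.group(2))
    return counts


def parse_layout(text: str) -> list[Box]:
    """Step 2 output: one labeled bounding box per object instance."""
    boxes = []
    for item in text.split("|"):
        m = BOX_ITEM.match(item)
        if m is None:
            raise ValueError(f"bad box item: {item!r}")
        label, *coords = m.groups()
        boxes.append(Box(label, *map(float, coords)))
    return boxes


def layout_matches_counts(counts: dict[str, int], boxes: list[Box]) -> bool:
    """Boxes must lie in the unit square and match the requested counts."""
    valid = all(0.0 <= b.x0 < b.x1 <= 1.0 and 0.0 <= b.y0 < b.y1 <= 1.0 for b in boxes)
    return valid and Counter(b.label for b in boxes) == Counter(counts)


counts = parse_counts("dog (2) | ball (1)")
boxes = parse_layout("dog [0.1, 0.5, 0.4, 0.9] | dog [0.5, 0.5, 0.8, 0.9] | ball [0.4, 0.7, 0.5, 0.8]")
print(layout_matches_counts(counts, boxes))  # True
```

An explicit, checkable intermediate layout is what makes the pipeline inspectable: a count mismatch can be caught before any image is generated.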


GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

Neural Information Processing Systems

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the scarcity of labels, which are costly to obtain through lab experiments. To alleviate this issue and to better leverage textual knowledge for these tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We find that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules.
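The abstract does not define the generalized position embedding. One plausible construction, offered purely as an assumption, assigns each pair of positions a relative distance: shortest-path hops between graph nodes, signed token offsets between instruction tokens, and a shared `CROSS` marker for graph-text pairs. A transformer could then look up a learned attention bias per distance value, so no separate graph encoder is needed. `unified_positions` and `CROSS` are hypothetical names:

```python
from collections import deque

CROSS = "cross"  # hypothetical marker for any graph-node/text-token pair


def shortest_paths(num_nodes: int, edges: list[tuple[int, int]]) -> list[list]:
    """All-pairs hop distances via BFS from every node (None if unreachable)."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for src in range(num_nodes):
        d = [None] * num_nodes
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def unified_positions(num_nodes: int, edges: list[tuple[int, int]],
                      num_text_tokens: int) -> list[list]:
    """Relative-position matrix for a sequence of graph nodes followed by text tokens."""
    n = num_nodes + num_text_tokens
    spd = shortest_paths(num_nodes, edges)
    pos = [[CROSS] * n for _ in range(n)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            pos[i][j] = spd[i][j]  # graph-graph: shortest-path distance
    for i in range(num_text_tokens):
        for j in range(num_text_tokens):
            pos[num_nodes + i][num_nodes + j] = j - i  # text-text: signed offset
    return pos


# A 3-atom chain (0-1-2) followed by a 2-token instruction.
for row in unified_positions(3, [(0, 1), (1, 2)], 2):
    print(row)
```

Disconnected node pairs get `None` here; a real model would map that, like `CROSS`, to its own learned bias.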


OpenAI's Sam Altman apologises over failure to report Canadian mass shooter

Al Jazeera

OpenAI CEO Sam Altman has apologised over his company's failure to warn authorities about the concerning online activities of a teen who went on to commit one of Canada's worst mass shootings. Jesse Van Rootselaar, 18, went on a shooting spree in Tumbler Ridge, British Columbia, on February 10, killing eight people. Rootselaar, who was born male but identified as female, died of a self-inflicted gunshot wound. OpenAI said after the attacks that Rootselaar's ChatGPT account had been flagged internally the previous June for misuse "in furtherance of violent activities", resulting in its suspension. The San Francisco-based AI company said at the time that it had not informed authorities, as Rootselaar's usage of the chatbot had not met the threshold of posing a credible or imminent threat of harm to others.


1289f9195d2ef8cfdfe5f50930c4a7c4-Supplemental-Conference.pdf

Neural Information Processing Systems

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we revisit the widely accepted notion in NLP that continued pre-training of LMs on task-related texts improves the performance of fine-tuning (FT) in downstream tasks. Through experiments on eight single-sentence tasks and eight sentence-pair tasks in both semi-supervised and fully-supervised settings, we find that conventional continued pre-training does not consistently provide benefits and can even be detrimental for sentence-pair tasks or when prompt-based FT is used. To tackle these issues, we propose Prompt-based Continued Pre-training (PCP), which combines the idea of instruction tuning with conventional continued pre-training. Our approach aims to improve the performance of prompt-based FT by presenting both task-related texts and prompt templates to LMs through unsupervised pre-training objectives before fine-tuning for the target task. Our empirical evaluations on 21 benchmarks demonstrate that PCP consistently improves the performance of state-of-the-art prompt-based FT approaches (by up to 20.1% absolute) in both semi-supervised and fully-supervised settings, even with only hundreds of unlabelled examples. Additionally, prompt-based FT with PCP outperforms state-of-the-art semi-supervised approaches with greater simplicity, eliminating the need for an iterative process and extra data augmentation. Our further analysis explores the performance lower bound of PCP and reveals that its advantages persist across different sizes of models and datasets.
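A minimal sketch of what presenting both task-related texts and prompt templates through an unsupervised objective could look like, assuming a masked-language-modeling setup: an unlabelled sentence is wrapped in a hypothetical sentiment-style template whose label slot holds the mask token, and ordinary tokens are then masked at random. The template, whitespace tokenization, and 15% default masking rate are illustrative choices, not the paper's configuration:

```python
import random

MASK = "<mask>"
# Hypothetical template: the label slot is filled with the mask token because
# the continued pre-training text is unlabelled.
TEMPLATE = "{text} It was {label}"


def build_pcp_example(text: str, template: str = TEMPLATE,
                      mask_rate: float = 0.15, seed: int = 0):
    """Wrap unlabelled text in a prompt template, then apply random MLM masking."""
    tokens = template.format(text=text, label=MASK).split()
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if tok != MASK and rng.random() < mask_rate:
            inputs.append(MASK)   # the model must recover this token
            targets.append(tok)
        else:
            inputs.append(tok)    # unchanged; no loss at this position
            targets.append(None)
    return inputs, targets


inputs, targets = build_pcp_example("a gripping and well acted film", mask_rate=0.3)
print(" ".join(inputs))
print(targets)
```

The point of the design is that the LM sees the template's wording during pre-training, so the later prompt-based FT inputs are no longer out of distribution.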




LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

Neural Information Processing Systems

Recently, large-scale pre-trained Vision and Language (VL) models have set a new state of the art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined by simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper we show, for the first time, how to reduce this gap without any labels and without any paired VL data. Instead, we use an unlabeled image collection together with a set of texts, auto-generated by a Large Language Model (LLM), that describe the categories of interest and effectively substitute for labeled visual instances of those categories. Using our label-free approach, we attain significant performance improvements over the zero-shot performance of the base VL model and over other contemporary methods and baselines on a wide variety of datasets, with absolute improvements of up to 11.7% (3.8% on average) in the label-free setting. Moreover, although our approach is label-free, we observe average gains of 1.3% over leading few-shot prompting baselines that use 5-shot supervision.
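Because VL models embed text and images in a shared space, a classifier fitted only on text embeddings can be applied to image embeddings. The sketch below uses a nearest-centroid classifier over embeddings of LLM-generated class descriptions as a simple stand-in (LaFTer's actual training procedure is not described here), plus a hypothetical margin-based filter that pseudo-labels unlabeled images. The toy 3-dimensional vectors stand in for real encoder outputs:

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_text_centroids(descriptions: dict[str, list[Vector]]) -> dict[str, Vector]:
    """One unit-norm centroid per class from embeddings of its LLM-written descriptions."""
    centroids = {}
    for label, embs in descriptions.items():
        embs = [normalize(e) for e in embs]
        mean = [sum(xs) / len(embs) for xs in zip(*embs)]
        centroids[label] = normalize(mean)
    return centroids


def predict(centroids: dict[str, Vector], image_emb: Vector) -> str:
    """Classify an image embedding by cosine similarity to the text centroids."""
    img = normalize(image_emb)
    return max(centroids, key=lambda label: dot(centroids[label], img))


def pseudo_label(centroids: dict[str, Vector], images: list[Vector],
                 margin: float = 0.1) -> list:
    """Label an unlabeled image only if the best class beats the runner-up by `margin`."""
    labels = []
    for emb in images:
        img = normalize(emb)
        scores = sorted(((dot(c, img), label) for label, c in centroids.items()), reverse=True)
        (best, label), (second, _) = scores[0], scores[1]
        labels.append(label if best - second >= margin else None)
    return labels


# Toy 3-d vectors standing in for text/image encoder outputs.
descriptions = {"cat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
                "dog": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]}
centroids = fit_text_centroids(descriptions)
print(pseudo_label(centroids, [[0.8, 0.2, 0.0], [0.5, 0.5, 0.0]]))  # ['cat', None]
```

The margin filter keeps ambiguous images (here the one equidistant from both classes) out of any subsequent training on unlabeled data.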