 Large Language Model


Visual Programming for Text to Image Generation and Evaluation

Neural Information Processing Systems

As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs with visual understanding, we propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation. First, we introduce VPGEN, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation. We employ an LM to handle the first two steps (object/count generation and layout generation) by finetuning it on text-layout pairs. Our step-by-step T2I generation framework provides stronger spatial control than end-to-end models, the dominant approach for this task.


GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

Neural Information Processing Systems

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules.


OpenAI's Sam Altman apologises over failure to report Canadian mass shooter

Al Jazeera

OpenAI CEO Sam Altman has apologised over his company's failure to warn authorities about the concerning online activities of a teen who went on to commit one of Canada's worst mass shootings. Jesse Van Rootselaar, 18, went on a shooting spree in Tumbler Ridge, British Columbia, on February 10, killing eight people. Rootselaar, who was born male but identified as female, died of a self-inflicted gunshot wound. OpenAI said after the attacks that Rootselaar's ChatGPT account had been flagged internally the previous June for misuse "in furtherance of violent activities", resulting in its suspension. The San Francisco-based AI company said at the time that it had not informed authorities, as Rootselaar's usage of the chatbot had not met the threshold of posing a credible or imminent threat of harm to others.


1289f9195d2ef8cfdfe5f50930c4a7c4-Supplemental-Conference.pdf

Neural Information Processing Systems

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on task-related texts improves the performance of fine-tuning (FT) in downstream tasks. Through experiments on eight single-sentence tasks and eight sentence-pair tasks in both semi-supervised and fully-supervised settings, we find that conventional continued pre-training does not consistently provide benefits and can even be detrimental for sentence-pair tasks or when prompt-based FT is used. To tackle these issues, we propose Prompt-based Continued Pre-training (PCP), which combines the idea of instruction tuning with conventional continued pre-training. Our approach aims to improve the performance of prompt-based FT by presenting both task-related texts and prompt templates to LMs through unsupervised pre-training objectives before fine-tuning for the target task. Our empirical evaluations on 21 benchmarks demonstrate that the PCP consistently improves the performance of state-of-the-art prompt-based FT approaches (up to 20.1% absolute) in both semi-supervised and fully-supervised settings, even with only hundreds of unlabelled examples. Additionally, prompt-based FT with the PCP outperforms state-of-the-art semi-supervised approaches with greater simplicity, eliminating the need for an iterative process and extra data augmentation. Our further analysis explores the performance lower bound of the PCP and reveals that the advantages of PCP persist across different sizes of models and datasets.
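As a rough illustration of the idea of presenting task-related texts together with a prompt template under an unsupervised objective, the sketch below wraps unlabelled text in a template and then applies random token masking. The template string, the `<mask>` symbol, the whitespace tokenizer, and the function names `apply_template` and `mask_tokens` are all assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def apply_template(text, template="{text} It was {mask}."):
    """Wrap an unlabelled text in a prompt template (template is an assumption)."""
    return template.format(text=text, mask=MASK)


def mask_tokens(sentence, rate=0.15, seed=0):
    """Randomly mask whitespace tokens, leaving the template's mask slot alone.

    Returns (inputs, targets); targets is None wherever a token was not masked.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in sentence.split():
        if MASK not in tok and rng.random() < rate:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


prompted = apply_template("great movie")
inputs, targets = mask_tokens(prompted, rate=0.15, seed=0)
```

The resulting `(inputs, targets)` pairs would then feed a masked-language-modelling loss before prompt-based fine-tuning; which tokens are masked depends on the seed.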




LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

Neural Information Processing Systems

Recently, large-scale pre-trained Vision and Language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated using a Large Language Model (LLM) describing the categories of interest and effectively substituting labeled visual instances of those categories. Using our label-free approach, we are able to attain significant performance improvements over the zero-shot performance of the base VL model and other contemporary methods and baselines on a wide variety of datasets, demonstrating absolute improvement of up to 11.7% (3.8% on average) in the label-free setting. Moreover, despite our approach being label-free, we observe 1.3% average gains over leading few-shot prompting baselines that do use 5-shot supervision.


Appendices

Neural Information Processing Systems

A Additional Experiments. Task 1 - Grouping: In addition to grouping clue words using token embeddings (discussed in Section 4 of the main paper), we also grouped the words by clustering on 'contextual' embeddings. We experimentally induce 'context' by joining the sixteen (16) word tokens (in a random order) into a single pseudo-sentence. The embeddings for each token were different based on the ordering of the tokens. We repeat the random ordering sixteen times and report the mean and variance of the results obtained in Table 6 (mean ± standard deviation over 16 random seeds is shown). Task 2 - Connections: In addition to prompting-based results on GPT-4 (discussed in Section 4), we ran experiments on additional LLMs like LLaMa [67] (7B, 13B) using pre-trained configuration weights obtained by permission from Meta AI. However, without additional fine-tuning on the specific task, these LLMs were unable to solve the task in a meaningful manner.
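The random-ordering procedure described for Task 1 can be sketched as follows. This is a minimal illustration using only the Python standard library: the placeholder words, the seed handling, and the function names `pseudo_sentences` and `mean_and_std` are assumptions for the example, and the embedding and clustering steps are left out because the excerpt does not specify the model used.

```python
import random
import statistics


def pseudo_sentences(words, n_orderings=16, seed=0):
    """Join the words into one pseudo-sentence per random ordering."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_orderings):
        shuffled = list(words)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        sentences.append(" ".join(shuffled))
    return sentences


def mean_and_std(scores):
    """Mean and sample standard deviation of per-ordering scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical placeholder clue words; a real puzzle would supply its own 16.
words = [f"word{i}" for i in range(16)]
sentences = pseudo_sentences(words)
```

Each pseudo-sentence would be passed to a contextual encoder, the sixteen token embeddings clustered, and the per-ordering scores summarized with `mean_and_std`.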



NATURALPROVER: Grounded Mathematical Proof Generation with Language Models

Neural Information Processing Systems

Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence. Yet it has remained underexplored with modern generative models. We study large-scale language models on two new generation tasks: suggesting the next step in a mathematical proof, and full proof generation. We develop NATURALPROVER, a language model that generates proofs by conditioning on background references (e.g.