Large Language Model


Jump to Conclusions: Short-Cutting Transformers With Linear Transformations

arXiv.org Artificial Intelligence

Transformer-based language models (LMs) create hidden representations of their inputs at every layer, but only use final-layer representations for prediction. This obscures the internal decision-making process of the model and the utility of its intermediate representations. One way to elucidate this is to cast the hidden representations as final representations, bypassing the transformer computation in between. In this work, we suggest a simple method for such casting, using linear transformations. We show that our approach produces more accurate approximations than the prevailing practice of inspecting hidden representations from all layers in the space of the final layer. Moreover, in the context of language modeling, our method allows "peeking" into early-layer representations of GPT-2 and BERT, showing that LMs often already predict the final output in early layers. We then demonstrate the practicality of our method for recent early-exit strategies, showing that when aiming, for example, at retention of 95% accuracy, our approach saves an additional 7.9% of layers for GPT-2 and 5.4% of layers for BERT, on top of the savings of the original approach. Last, we extend our method to linearly approximate sub-modules, finding that attention is the most tolerant to this change.
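The core idea of fitting a linear map from a layer-l hidden state to the final-layer hidden state can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the authors' code: the hidden states are synthetic random vectors and the "remaining layers" are linear by construction, so the linear fit is guaranteed to beat the baseline here; with a real transformer you would collect paired layer-l and final-layer vectors from forward passes. The function names `fit_linear_shortcut` and `mean_sq_error` are hypothetical. The identity baseline corresponds to reading layer-l vectors directly in the final layer's space.

```python
import numpy as np


def fit_linear_shortcut(h_l, h_final):
    """Least-squares fit of a matrix A such that h_l @ A approximates h_final.

    h_l:     (n, d) hidden states taken from an intermediate layer l
    h_final: (n, d) hidden states of the same inputs at the final layer
    """
    A, *_ = np.linalg.lstsq(h_l, h_final, rcond=None)
    return A


def mean_sq_error(pred, target):
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
n, d = 2000, 16
h_l = rng.normal(size=(n, d))
# Synthetic stand-in for "the remaining layers": a fixed linear map plus small noise.
true_map = rng.normal(size=(d, d)) / np.sqrt(d)
h_final = h_l @ true_map + 0.01 * rng.normal(size=(n, d))

train, test = slice(0, 1500), slice(1500, n)
A = fit_linear_shortcut(h_l[train], h_final[train])
err_linear = mean_sq_error(h_l[test] @ A, h_final[test])
# Baseline: treat the layer-l vector as if it were already a final-layer vector.
err_identity = mean_sq_error(h_l[test], h_final[test])
print(f"linear shortcut MSE: {err_linear:.5f}  identity baseline MSE: {err_identity:.5f}")
```

Fitting on one split and measuring error on a held-out split mirrors the usual way such an approximation would be judged; the synthetic setup only demonstrates the mechanics.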


Leveraging Large Language Models for Multiple Choice Question Answering

arXiv.org Artificial Intelligence

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero-, one-, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs as cloze tasks: an LLM is conditioned on a question (without the associated answer options), and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability, and this ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.

Current SOTA methods on many MCQA tasks involve specialized models, extensive per-task engineering, and individualized tuning in general. What if one model could do just as well as each of these models does individually? This is part of a general vision for so-called foundation models (Bommasani et al., 2021).
Foundation models include large pre-trained language models (LLMs) that have derived enough broad knowledge (spanning, for example, linguistic, factual, and commonsense (Liu et al., 2019; Amrami & Goldberg, 2018; Petroni et al., 2020; Bosselut et al.; Bouraoui et al.; Zuo et al., 2018; Bhagavatula et al., 2019)) to transfer from a simple language modelling objective to a huge array of natural language tasks. Interestingly, while LLMs have achieved SOTA results on many tasks, they generally fall short on MCQA. Why is this the case, given their general language modelling prowess as suggested by the low cross-entropy loss they attain with all their parameters, data, and compute (Kaplan et al., 2020; Henighan et al., 2020; Hernandez et al., 2021)?
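The difference between the two prompting styles can be sketched as follows, assuming you already have log-probabilities from some LLM. The prompt template and the function names (`format_mcp_prompt`, `choose_by_symbol`, `choose_by_cloze`) are hypothetical illustrations, the per-token average is just one possible length normalization, and the log-probability values are made up.

```python
def format_mcp_prompt(question, options, symbols="ABCDEFGH"):
    """Natural style: question and all options in one prompt; the model answers with a symbol."""
    lines = [f"Question: {question}"]
    for sym, opt in zip(symbols, options):
        lines.append(f"{sym}. {opt}")
    lines.append("Answer:")
    return "\n".join(lines)


def choose_by_symbol(symbol_logprobs):
    """Natural style: pick the answer symbol the model assigns the highest log-probability."""
    return max(symbol_logprobs, key=symbol_logprobs.get)


def choose_by_cloze(option_token_logprobs):
    """Cloze style: each option is scored separately as a continuation of the question.

    option_token_logprobs[i] holds the per-token log-probabilities of option i;
    the score is their mean (a simple length normalization). Returns the best index.
    """
    scores = [sum(lps) / len(lps) for lps in option_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)


prompt = format_mcp_prompt("Which planet is the largest?", ["Mars", "Jupiter", "Venus"])
print(prompt)
print(choose_by_symbol({"A": -2.3, "B": -0.1, "C": -3.0}))
print(choose_by_cloze([[-1.0], [-0.5, -0.7], [-2.0, -0.1, -0.3]]))
```

The cloze scorer needs a separate score for every answer option, whereas the symbol-based approach reads the answer from a single prompt that contains all options, which is where the computational saving comes from.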


A Prompt Log Analysis of Text-to-Image Generation Systems

arXiv.org Artificial Intelligence

Recent developments in large language models (LLMs) and generative AI have unleashed the astonishing capabilities of text-to-image generation systems to synthesize high-quality images that are faithful to a given reference text, known as a "prompt". These systems immediately received considerable attention from researchers, creators, and common users. Despite extensive efforts to improve the generative models, there is limited work on understanding the information needs of the users of these systems at scale. We conduct the first comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has made critical contributions to the success of the Web search industry and research. Compared with Web search queries, text-to-image prompts are significantly longer, are often organized into special structures that consist of the subject, form, and intent of the generation task, and present unique categories of information needs. Users make more edits within creation sessions, which exhibit remarkable exploratory patterns. There is also a considerable gap between the user-input prompts and the captions of the images included in the open training data of the generative models. Our findings provide concrete implications for how to improve text-to-image generation systems for creation purposes.
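Prompt-log analyses, like query-log studies, typically begin by grouping each user's prompts into sessions before measuring prompt length or edit behavior. A minimal sketch, assuming events are `(user, timestamp, prompt)` tuples and that a session ends after 30 minutes of inactivity (a common convention in Web search log studies, not necessarily the definition used in this paper). The function names and sample prompts are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(events, gap=timedelta(minutes=30)):
    """Group (user, timestamp, prompt) events into per-user sessions split by inactivity gaps."""
    sessions = []
    last_seen = {}     # user -> timestamp of that user's previous prompt
    open_session = {}  # user -> list of prompts in that user's current session
    for user, ts, prompt in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen or ts - last_seen[user] > gap:
            open_session[user] = []
            sessions.append((user, open_session[user]))
        open_session[user].append(prompt)
        last_seen[user] = ts
    return sessions


def mean_length_in_words(prompts):
    """Average prompt length, counting whitespace-separated words."""
    return sum(len(p.split()) for p in prompts) / len(prompts)


events = [
    ("u1", datetime(2023, 3, 1, 10, 0), "a castle"),
    ("u1", datetime(2023, 3, 1, 10, 5), "a castle on a hill, oil painting"),
    ("u1", datetime(2023, 3, 1, 12, 0), "portrait of a cat"),
    ("u2", datetime(2023, 3, 1, 10, 1), "cyberpunk city at night, highly detailed"),
]
sessions = split_sessions(events)
all_prompts = [p for _, prompts in sessions for p in prompts]
print(len(sessions), mean_length_in_words(all_prompts))
```

Within a session, consecutive prompts such as "a castle" followed by "a castle on a hill, oil painting" are the kind of edits that reveal exploratory behavior.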


Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models

arXiv.org Artificial Intelligence

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or treat prompt tuning as a point estimation problem, may fail to describe the diverse characteristics of categories, which limits their applications. We introduce a Bayesian probabilistic approach to prompt learning, where label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize prompt learning with visual knowledge, viewing images and the corresponding prompts as patch and token sets under optimal transport, which pushes the prompt tokens to faithfully capture label-specific visual concepts instead of overfitting the training categories. Moreover, the proposed model can be straightforwardly extended to the conditional case, where instance-conditional prompts are generated to improve generalizability. Extensive experiments on 15 datasets show promising transferability and generalization performance of our proposed model.
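Optimal transport between a set of image-patch embeddings and a set of prompt-token embeddings is commonly computed with the entropy-regularized Sinkhorn algorithm. The sketch below assumes uniform weights over patches and tokens, a cosine cost, and synthetic random embeddings; the regularization strength `eps`, the 7x7 patch grid, and all function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cost 1 - cos(x_i, y_j) between rows of x (n, k) and y (m, k)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized OT plan between weight vectors a (n,) and b (m,)."""
    K = np.exp(-cost / eps)       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):      # alternately rescale rows and columns
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 32))  # e.g. a 7x7 grid of patch embeddings (synthetic)
tokens = rng.normal(size=(4, 32))    # embeddings of 4 prompt tokens (synthetic)
C = cosine_cost(patches, tokens)
a = np.full(49, 1 / 49)              # uniform weight on each patch
b = np.full(4, 1 / 4)                # uniform weight on each token
P = sinkhorn(C, a, b)
ot_cost = float(np.sum(P * C))       # candidate regularization loss
print(f"OT cost between patch and token sets: {ot_cost:.4f}")
```

Minimizing a transport cost like `ot_cost` with respect to the token embeddings would pull each prompt token toward the image patches it is matched with, which is one way to read "pushes the prompt tokens to faithfully capture the label-specific visual concepts".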


How well do Large Language Models perform in Arithmetic tasks?

arXiv.org Artificial Intelligence

Large language models have exhibited emergent abilities, including chain-of-thought reasoning, for answering math word problems step by step. Solving math word problems requires not only the ability to decompose problems via chain-of-thought but also the ability to calculate the arithmetic expression at each step correctly. To the best of our knowledge, no prior work has focused on evaluating the arithmetic ability of large language models. In this work, we propose an arithmetic dataset, MATH 401, to test the latest large language models, including GPT-4, ChatGPT, InstructGPT, Galactica, and LLaMA, on various arithmetic expressions, and we provide a detailed analysis of the arithmetic ability of large language models. MATH 401 and the evaluation code are released at https://github.com/GanjinZero/math401-llm.
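Scoring a model on arithmetic expressions needs a trusted ground truth and a way to read a number out of free-form output. A minimal sketch, assuming the expressions use only `+ - * / **` and that the last number in the reply is the model's answer; the relative tolerance of 1e-3 and all function names are hypothetical choices, not the MATH 401 evaluation protocol. Ground truth comes from walking Python's `ast` tree rather than calling `eval()`, so anything other than plain arithmetic is rejected.

```python
import ast
import operator
import re

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}


def safe_eval(expr):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))


def extract_number(text):
    """Return the last number in a model reply (commas removed), or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(expr, reply, rel_tol=1e-3):
    """Compare the model's number with the exact value within a relative tolerance."""
    target = safe_eval(expr)
    answer = extract_number(reply)
    if answer is None:
        return False
    return abs(answer - target) <= rel_tol * max(1.0, abs(target))


print(is_correct("123*45", "The answer is 5535."))   # True
print(is_correct("2**10", "I think it's 1000"))      # False
```

The tolerance matters for division and other non-integer results, where a model may reasonably round (for example, answering 2.333 for 7/3).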


Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?

arXiv.org Artificial Intelligence

We evaluated the capability of generative pre-trained transformers (GPT) to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. Discussions of potential uses (e.g., exercise generation, code explanation) and misuses (e.g., cheating) of this emerging technology in programming education have intensified, but to date there has been no rigorous analysis of the models' capabilities in the realistic context of a full-fledged programming course with a diverse set of assessment instruments. We evaluated GPT on three Python courses that employ assessments ranging from simple multiple-choice questions (no code involved) to complex programming projects with code bases distributed across multiple files (599 exercises overall). Further, we studied whether and how successfully GPT models leverage feedback provided by an auto-grader. We found that the current models are not capable of passing the full spectrum of assessments typically involved in a Python programming course (<70% on even entry-level modules). Yet, it is clear that a straightforward application of these easily accessible models could enable a learner to obtain a non-trivial portion of the overall available score (>55%) in introductory and intermediate courses alike. While the models exhibit remarkable capabilities, including correcting solutions based on an auto-grader's feedback, some limitations exist (e.g., poor handling of exercises requiring complex chains of reasoning steps). These findings can be leveraged by instructors wishing to adapt their assessments so that GPT becomes a valuable assistant for a learner as opposed to an end-to-end solution.
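The auto-grader feedback experiment can be pictured as a resubmission loop. This sketch is hypothetical throughout: `generate` stands in for a call to a GPT model (here a scripted stub that ignores the feedback), `grade` stands in for a course auto-grader that runs test cases, and the three-attempt limit is an arbitrary assumption.

```python
def solve_with_feedback(generate, grade, max_attempts=3):
    """Resubmit model solutions, feeding auto-grader feedback into each new attempt.

    generate(feedback) -> candidate solution (feedback is None on the first try)
    grade(solution)    -> (score in [0, 1], feedback string)
    Returns (best score seen, number of attempts used).
    """
    feedback = None
    best = 0.0
    for attempt in range(1, max_attempts + 1):
        solution = generate(feedback)
        score, feedback = grade(solution)
        best = max(best, score)
        if score >= 1.0:  # full marks, so stop resubmitting
            return best, attempt
    return best, max_attempts


# Hypothetical auto-grader for an exercise asking for add(a, b).
TESTS = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]


def grade(fn):
    failed = [args for args, expected in TESTS if fn(*args) != expected]
    score = 1 - len(failed) / len(TESTS)
    return score, (f"failed on inputs {failed}" if failed else "all tests passed")


# Scripted stand-in for a model: a buggy first attempt, then a correct one.
# A real setup would put the feedback string into the next prompt.
_candidates = iter([lambda a, b: a - b, lambda a, b: a + b])


def fake_model(feedback):
    return next(_candidates)


result = solve_with_feedback(fake_model, grade)
print(result)  # prints (1.0, 2)
```

Recording the attempt count alongside the best score makes it possible to separate what a model solves on the first try from what it reaches only after feedback.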


Quora's Poe is launching subscriptions to let you chat with GPT-4-powered bot

#artificialintelligence

Yesterday, OpenAI unveiled its new GPT-4 model, and Anthropic unveiled Claude, its own ChatGPT competitor. In parallel, Quora announced that its chatbot app Poe will now have a paid tier that lets you ask questions to bots powered by these models. Poe subscriptions will set you back $19.99 per month or $199.99 per year, and at the moment you can buy one only from an iOS device or an Apple Silicon-powered Mac. The company is working on making the paid plan available for purchase on the web. Quora first launched Poe last December as a closed beta and opened it up to all iOS users last month.


OpenAI Debuts GPT-4 After Year of Training on Azure Supercomputer

#artificialintelligence

For search engines and enterprise writing assistance, the top contender is OpenAI, which yesterday announced the latest version of its language model, GPT-4. GPT-4 is now available on ChatGPT Plus and as an API, for which developers can join a waitlist. It's throwing a new weapon into the AI war, in which organizations jostle to provide the best, most flexible writing AI. OpenAI demonstrated the new natural language model with a challenge: "Explain the plot of Cinderella in a sentence where each word has to begin with the next letter in the alphabet from A to Z, without repeating any letters." It's a neat riddle to show the AI can perform some reasoning along with producing straightforward text, but what does it do in the office?


ChatGPT vs. Bing Chat: which is the best AI chatbot? - AIVAnet

#artificialintelligence

Bing Chat and ChatGPT are two of the latest natural language chatbots to become widely available, and both are competing for your attention and text prompts. Both AIs are based on similar language models, but there are some distinct differences between them, making the ChatGPT versus Bing Chat debate one well worth having. If you want to play around with these two exciting tools, here's everything you need to know to pick the right one for you. Both Bing Chat and ChatGPT are available for general use, but the way you access them is a little different. ChatGPT is widely available and accessible through the main OpenAI website.


What Are The Downsides of AI Advancement? - KDnuggets

#artificialintelligence

The technologies behind artificial intelligence and machine learning keep getting better. From the first AI checkers and chess programs written in 1951 at the University of Manchester to OpenAI's ChatGPT and Google's Bard AI, the history and evolution of artificial intelligence is long and full of breakthroughs. Currently, AI is being used for many purposes across various industries. In the transportation industry, AI is now used in self-driving cars, autopilot software for autonomous flight, and software that helps drivers find the most efficient routes to avoid traffic and save time and fuel. In the healthcare industry, AI is now used by doctors to help track symptoms and identify potential diagnoses, and by pharmaceutical scientists to design new drug therapies.