Goto

Collaborating Authors

 Large Language Model


OpenAI invites everyone to test new AI-powered chatbot

#artificialintelligence

On Wednesday, OpenAI announced ChatGPT, a dialogue-based AI chat interface for its GPT-3 family of large language models. It's currently free to use with an OpenAI account during a testing phase. Unlike the GPT-3 model found in OpenAI's Playground and API, ChatGPT provides a user-friendly conversational interface and is designed to strongly limit potentially harmful output. "The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests," writes OpenAI on its announcement blog page. So far, people have been putting ChatGPT through its paces, finding a wide variety of potential uses while also exploring its vulnerabilities. It can write poetry, correct coding mistakes with detailed examples, generate AI art prompts, write new code, expound on the philosophical classification of a hot dog as a sandwich, and explain the worst-case time complexity of the bubble sort algorithm… in the style of a "fast-talkin' wise guy from a 1940's gangster movie."


Creatives, AI is coming for your job

#artificialintelligence

Many people see artificial intelligence as an efficient but unimaginative tool that performs routine tasks. Analysing datasets or checking financial reports is one thing. But AI is fast becoming skilled at bringing together ideas, information, and artistic influences to generate creative work. Language-learning models can mimic human imagination and write articles like an experienced journalist or lawyer. AI might also replace conventional search engines by providing one-line responses summarising years' worth of scientific research, The Atlantic reports. Meanwhile, non-profit OpenAI unveiled a new chatbot, called ChatGPT.


Creepy AI reveals plan for world domination and gives users sick tips on committing crimes including how to make BOMBS

#artificialintelligence

A NEW creepy AI technology has revealed a plan for world domination to its users along with instructions on how to shoplift and make bombs. The tech company OpenAI created a new bot called ChatGPT, which generates convincing dialogue from a short writing prompt. While this technology is meant to formulate helpful solutions, with the right prompt, it can also give you criminal responses. ChatGPT's safeguards, which are meant to prevent the AI from using offensive content, can be removed, depending on what the user says. Vice gave a few examples of safeguard overrides.


INCLUSIFY: A benchmark and a model for gender-inclusive German

arXiv.org Artificial Intelligence

Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as German. While stirring some controversy, it is increasingly adopted by companies and political institutions. A handful of tools have been developed to help people use gender-inclusive language by identifying instances of the generic masculine and providing suggestions for more inclusive reformulations. In this report, we define the underlying tasks in terms of natural language processing, and present a dataset and measures for benchmarking them. We also present a model that implements these tasks, by combining an inclusive language database with an elaborate sequence of processing steps via standard pre-trained models. Our model achieves a recall of 0.89 and a precision of 0.82 in our benchmark for identifying exclusive language; and one of its top five suggestions is chosen in real-world texts in 44% of cases. We sketch how the area could be further advanced by training end-to-end models and using large language models; and we urge the community to include more gender-inclusive texts in their training data in order to not present an obstacle to the adoption of gender-inclusive language. Through these efforts, we hope to contribute to restoring justice in language and, to a small extent, in reality.


Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration

arXiv.org Artificial Intelligence

Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a $k$-nearest-neighbor ($k$NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the $k$NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.


Repair Is Nearly Generation: Multilingual Program Repair with LLMs

arXiv.org Artificial Intelligence

Most programmers make mistakes when writing code. Some of these mistakes are small and require few edits to the original program -- a class of errors recently termed last mile mistakes. These errors break the flow for experienced developers and can stump novice programmers. Existing automated repair techniques targeting this class of errors are language-specific and do not easily carry over to new languages. Transferring symbolic approaches requires substantial engineering and neural approaches require data and retraining. We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. Such a multilingual engine enables a flipped model for programming assistance, one where the programmer writes code and the AI assistance suggests fixes, compared to traditional code suggestion technology. Taking inspiration from the way programmers manually fix bugs, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking, can successfully repair programs in multiple languages with minimal effort. We present the first results for such a multilingual repair engine by evaluating on 6 different languages and comparing performance to language-specific repair engines. We show that RING can outperform language-specific repair engines for three of these languages.


Learning to Reason With Relational Abstractions

arXiv.org Artificial Intelligence

Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.


Legal Prompt Engineering for Multilingual Legal Judgement Prediction

arXiv.org Artificial Intelligence

Legal Prompt Engineering (LPE) or Legal Prompting is a process to guide and assist a large language model (LLM) with performing a natural legal language processing (NLLP) skill. Our goal is to use LPE with LLMs over long legal documents for the Legal Judgement Prediction (LJP) task. We investigate the performance of zero-shot LPE for given facts in case-texts from the European Court of Human Rights (in English) and the Federal Supreme Court of Switzerland (in German, French and Italian). Our results show that zero-shot LPE is better compared to the baselines, but it still falls short compared to current state of the art supervised approaches. Nevertheless, the results are important, since there was 1) no explicit domain-specific data used - so we show that the transfer to the legal domain is possible for general-purpose LLMs, and 2) the LLMs where directly applied without any further training or fine-tuning - which in turn saves immensely in terms of additional computational costs.


Codex Hacks HackerRank: Memorization Issues and a Framework for Code Synthesis Evaluation

arXiv.org Artificial Intelligence

The Codex model has demonstrated extraordinary competence in synthesizing code from natural language problem descriptions. However, in order to reveal unknown failure modes and hidden biases, such large-scale models must be systematically subjected to multiple and diverse evaluation studies. In this work, we evaluate the code synthesis capabilities of the Codex model based on a set of 115 Python problem statements from a popular competitive programming portal: HackerRank. Our evaluation shows that Codex is indeed proficient in Python, solving 96% of the problems in a zero-shot setting, and 100% of the problems in a few-shot setting. However, Codex exhibits clear signs of generating memorized code based on our evaluation. This is alarming, especially since the adoption and use of such models could directly impact how code is written and produced in the foreseeable future. With this in mind, we further discuss and highlight some of the prominent risks associated with large-scale models of source code. Finally, we propose a framework for code-synthesis evaluation using variations of problem statements based on mutations.


Transformer Language Models without Positional Encodings Still Learn Positional Information

arXiv.org Artificial Intelligence

Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position. Our findings indicate that causal LMs might derive positional awareness not only from the explicit positioning mechanism, but also from the effects of the causal mask.