Large Language Model
Automatic Readability Assessment of German Sentences with Transformer Ensembles
Blaneck, Patrick Gustav, Bornheim, Tobias, Grieger, Niklas, Bialonski, Stephan
Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, making it possible to develop deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated the dependence of prediction performance on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on data of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
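As a rough illustration of the ensemble evaluation described above, the following sketch averages per-sentence scores from several models and computes the root mean squared error against reference scores. The abstract does not say how the ensemble members are combined, so the plain average is an assumption, and the function names and all scores are hypothetical.

```python
import math
from statistics import mean


def ensemble_predict(per_model_scores):
    """Average the scores of several models for each sentence.

    per_model_scores[m][i] is model m's score for sentence i.
    Plain averaging is an assumption made for this sketch.
    """
    return [mean(scores) for scores in zip(*per_model_scores)]


def rmse(predicted, target):
    """Root mean squared error between two equally long score lists."""
    return math.sqrt(mean((p - t) ** 2 for p, t in zip(predicted, target)))


# Hypothetical complexity scores for three sentences (not real data).
gbert_scores = [2.0, 4.5, 3.5]
gpt2_scores = [3.0, 3.5, 2.5]
targets = [2.5, 4.0, 3.0]

ensemble_scores = ensemble_predict([gbert_scores, gpt2_scores])
print("GBERT only:", rmse(gbert_scores, targets))
print("GPT-2 only:", rmse(gpt2_scores, targets))
print("Mixed ensemble:", rmse(ensemble_scores, targets))
```

In this toy data the two models err in opposite directions, so their errors cancel in the average. That is one intuition for why a mixed ensemble can beat a same-sized ensemble of a single model type, though it does not by itself explain the reported result.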
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Ushio, Asahi, Espinosa-Anke, Luis, Schockaert, Steven, Camacho-Collados, Jose
Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.
The Head of Google Says Future AI Must Align with Human Values
AI is the foundational tech at Google and its parent company Alphabet, CEO Sundar Pichai told the audience at this year's Code conference in Los Angeles. He pointed out the "extraordinary" successes of the Google AI and DeepMind teams in areas such as large language models and the AlphaFold project, which showed the underlying structure of 200 million proteins. He said Google was now applying deep computer science and AI to all its products, from search to its work with pharma companies with AlphaFold to self-driving cars. But, he added, it is "important that we develop AI aligned with human values." Conference host Kara Swisher showed a 2016 interview in which Pichai (then interviewed by the now-retired Walt Mossberg) said he expected we would have true "conversational AI" to help get things done in the next 5 to 10 years.
Differentially Private Decoding in Large Language Models
Majmudar, Jimit, Dupuy, Christophe, Peris, Charith, Smaili, Sami, Gupta, Rahul, Zemel, Richard
Recent large-scale natural language processing (NLP) systems use a pre-trained Large Language Model (LLM) on massive and diverse corpora as a head start. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.
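To make the idea of a decoding-stage perturbation concrete, here is a minimal sketch that mixes a model's next-token distribution with the uniform distribution before sampling. The abstract does not specify the form of the perturbation, so this interpolation, the parameter name `lam`, and the probabilities are assumptions for illustration. The sketch makes no claim about a formal privacy guarantee.

```python
import random


def perturb_distribution(probs, lam):
    """Mix a next-token distribution with the uniform distribution.

    lam = 1.0 returns the model's distribution unchanged;
    lam = 0.0 returns the uniform distribution.
    The interpolation form is an assumption made for this sketch.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    uniform = 1.0 / len(probs)
    return [lam * p + (1.0 - lam) * uniform for p in probs]


def sample_token(probs, rng):
    """Sample a token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token probabilities over a three-token vocabulary.
model_probs = [0.7, 0.2, 0.1]
private_probs = perturb_distribution(model_probs, lam=0.5)

rng = random.Random(0)
token = sample_token(private_probs, rng)
print(private_probs, token)
```

The step only post-processes the output probabilities, so it needs no retraining and works with any model that exposes them. Lowering `lam` makes the sampled token depend less on what the model has memorized but also less on what it has learned, which is one way to picture the privacy-utility trade-off.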
Few-shot training LLMs for project-specific code-summarization
Ahmed, Toufique, Devanbu, Premkumar
Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks, and show great promise also for code. A particularly exciting aspect of LLMs is their knack for few-shot and zero-shot learning: they can learn to perform a task with very few examples. Few-shotting has particular synergies in software engineering, where there are a lot of phenomena (identifier names, APIs, terminology, coding patterns) that are known to be highly project-specific. However, project-specific data can be quite limited, especially early in the history of a project; thus the few-shot learning capacity of LLMs might be very relevant. In this paper, we investigate the use of few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code-summarization, leveraging project-specific training.
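A few-shot prompt for code summarization could be assembled roughly as follows: a handful of code and summary pairs taken from the same project, followed by the function to be summarized. The prompt layout, the function name `build_few_shot_prompt`, and the example snippets are hypothetical and are not taken from the paper.

```python
def build_few_shot_prompt(examples, query_code):
    """Build a prompt from (code, summary) pairs plus one unsummarized function.

    The "Code:" / "Summary:" layout is an assumption made for this sketch.
    """
    parts = []
    for code, summary in examples:
        parts.append(f"Code:\n{code}\nSummary: {summary}\n")
    # The final block is left open so the model completes the summary.
    parts.append(f"Code:\n{query_code}\nSummary:")
    return "\n".join(parts)


# Hypothetical examples drawn from the same project as the query.
project_examples = [
    ("def load_cfg(path):\n    return json.load(open(path))",
     "Load the project configuration from a JSON file."),
    ("def save_cfg(cfg, path):\n    json.dump(cfg, open(path, 'w'))",
     "Write the project configuration to a JSON file."),
]
query = "def merge_cfg(a, b):\n    return {**a, **b}"

prompt = build_few_shot_prompt(project_examples, query)
print(prompt)
```

The resulting string would be sent to the model as-is. Drawing the examples from the target project is what lets the model pick up project-specific names and terminology.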
GitHub - deepmind/mujoco_menagerie: A collection of high-quality models for the MuJoCo physics engine, curated by DeepMind.
Menagerie is a collection of high-quality models for the MuJoCo physics engine, curated by DeepMind. A physics simulator is only as good as the model it is simulating, and in a powerful simulator like MuJoCo with many modeling options, it is easy to create "bad" models which do not behave as expected. The goal of this collection is to provide the community with a curated library of well-designed models that work well right out of the gate. Menagerie's only requirement is MuJoCo version 2.2.2 or higher. You can download prebuilt binaries from the GitHub releases page, or if you are working with Python, you can install the native bindings from PyPI via pip install "mujoco>=2.2.2".
Improving Language Model Behavior by Training on a Curated Dataset
We've found we can improve language model behavior with respect to specific behavioral values by fine-tuning on a curated dataset of fewer than 100 examples of those values. We also found that this process becomes more effective as models get larger. While the technique is still nascent, we're looking for OpenAI API users who would like to try it out and are excited to find ways to use these and other techniques in production use cases. Our approach aims to give language model operators the tools to narrow this universal set of behaviors to a constrained set of values. While OpenAI provides guardrails and monitoring to ensure that model use-cases are compatible with our Charter, we view selecting the exact set of Charter-compatible values for the model as a choice that our users must face for their specific applications.
GPT-3 vs. Rasa chatbots
In 1829, an event took place that unleashed a technological revolution. At the Rainhill Trials a group of steam locomotives squared off to determine which one could win a series of tests of speed, strength and reliability. The winning machine, Rocket, not only blew away its competition at the trials, it also set the direction for steam locomotive development for the following century. What does all this have to do with GPT-3, the transformer language model that OpenAI made available in a limited beta starting in June? Some reviewers have heralded GPT-3 as the first glimpse of artificial general intelligence, while others are calling it a massive lookup table.
DeepMind's Selection-Inference Language Model System Generates Humanly Interpretable Reasoning Traces
Explainability is one of the most pressing concerns in machine learning research and development. Although contemporary large-scale language models (LMs) have demonstrated impressive question-answering capabilities, their inherent opacity can conceal just how these models reach their final answers, making it difficult for users to spot any possible mistakes or justify the outputs. A DeepMind research team addresses this issue in the new paper Faithful Reasoning Using Large Language Models, proposing a forward-chaining selection-inference model that can perform faithful reasoning and provide a valid reasoning trace to improve reasoning quality and help users check and validate the final answers. The proposed approach is based on the idea that LMs can perform faithful multi-step reasoning if the underlying logical structure of a given problem can be mirrored by a causal structure. To realize this, the team developed selection-inference (SI) as their system's backbone, a novel architecture comprising two fine-tuned language models: one for selection and one for inference.
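The alternation of selection and inference steps can be sketched as a small forward-chaining loop. In the system described above both steps are fine-tuned language models. Here they are replaced by hand-written stand-ins over explicit rules, so every function name, the rule format, and the facts are hypothetical and show only the control flow and the recorded reasoning trace.

```python
def select(known, rules):
    """Stand-in for the selection model: pick a rule whose premises all hold."""
    for premises, conclusion in rules:
        if conclusion not in known and all(p in known for p in premises):
            return premises, conclusion
    return None


def infer(selection):
    """Stand-in for the inference model: derive a new fact from the selection."""
    _premises, conclusion = selection
    return conclusion


def reason(facts, rules, goal, max_steps=10):
    """Alternate selection and inference, recording each step in a trace."""
    known = set(facts)
    trace = []
    for _ in range(max_steps):
        if goal in known:
            break
        selection = select(known, rules)
        if selection is None:
            break  # nothing further can be derived
        new_fact = infer(selection)
        known.add(new_fact)
        trace.append((selection[0], new_fact))
    return goal in known, trace


# Hypothetical toy problem.
facts = ["Rex is a dog"]
rules = [
    (("Rex is a dog",), "Rex is a mammal"),
    (("Rex is a mammal",), "Rex is an animal"),
]
proved, trace = reason(facts, rules, goal="Rex is an animal")
for premises, conclusion in trace:
    print(" and ".join(premises), "=>", conclusion)
```

Each trace entry names the premises that were selected and the fact derived from them, so a reader can check every step of the answer individually.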
No One Rung to Rule Them All: Addressing Scale and Expediency in Knowledge-Based AI
Can we drive the effectiveness and efficiency of AI at the same time? If we want our systems to be more intelligent, do they have to become more expensive? Our goal should be to significantly increase the capabilities and improve the results of AI technologies while minimizing power and system cost rather than increasing them. Achieving this could be possible if we follow the architectural design observed time and again in natural control systems, that is, a hierarchy of specialized levels. This article challenges the current large language model (LLM) approach of using a single neural network that attempts to encompass all world knowledge.