Large Language Model
Could Elon Musk's xAI be exactly what the world needs?
ELON MUSK, not content with helming recent purchase Twitter alongside SpaceX and his other long-standing firms, has announced an artificial intelligence start-up called xAI. People have speculated that it might be an attempt to challenge OpenAI's ChatGPT, an AI-powered chatbot that has grown to 100 million monthly users in the blink of an eye. But a veil of mystery hangs over the venture, whose goal is "to understand the true nature of the universe". Musk isn't averse to grandiose statements and marketing bluff – a SpaceX mission to Mars is seemingly always just on the horizon – but a …
AI Leaders Create Industry Watchdog as Government Scrutiny Grows
Facing calls to put guardrails on artificial intelligence development, a group of tech companies including Alphabet Inc.'s Google and OpenAI Inc. are creating an industry body to ensure that AI models are safe. The effort, also backed by AI startup Anthropic and Microsoft Corp., aims to consolidate the expertise of member companies and create benchmarks for the industry, according to a statement Wednesday. The group, known as the Frontier Model Forum, said it welcomed participation from other organizations working on large-scale machine-learning platforms. The fast proliferation of generative AI tools such as OpenAI's ChatGPT, which can create text, photos and even video based on simple prompts, has put pressure on tech giants to tread carefully. The companies involved in the Frontier Model Forum have already agreed to put safeguards in place -- at the urging of the White House -- before Congress potentially passes binding regulations.
Congratulations to the #ICML2023 outstanding paper award winners
This year's International Conference on Machine Learning (ICML) is taking place in Honolulu, Hawai'i from 23-29 July. The winners of the outstanding paper awards for 2023 have now been announced. This paper introduces an interesting approach that aims to address the challenge of obtaining a learning rate free optimal bound for non-smooth stochastic convex optimization. The authors propose a novel method that overcomes the limitations imposed by traditional learning rate selection in optimizing such problems. This research makes a valuable and practical contribution to the field of optimization.
How User Language Affects Conflict Fatality Estimates in ChatGPT
Kazenwadel, Daniel, Steinert, Christoph V.
OpenAI's ChatGPT language model has gained popularity as a powerful tool for complex problem-solving and information retrieval. However, concerns arise about the reproduction of biases present in the language-specific training data. In this study, we address this issue in the context of the Israeli-Palestinian and Turkish-Kurdish conflicts. Using GPT-3.5, we employed an automated query procedure to inquire about casualties in specific airstrikes, in both Hebrew and Arabic for the former conflict and Turkish and Kurdish for the latter. Our analysis reveals that GPT-3.5 provides 27$\pm$11 percent lower fatality estimates when queried in the language of the attacker than in the language of the targeted group. Evasive answers denying the existence of such attacks further increase the discrepancy, creating a novel bias mechanism not present in regular search engines. This language bias has the potential to amplify existing media biases and contribute to information bubbles, ultimately reinforcing conflicts.
PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning
Brahman, Faeze, Bhagavatula, Chandra, Pyatkin, Valentina, Hwang, Jena D., Li, Xiang Lorraine, Arai, Hirona J., Sanyal, Soumya, Sakaguchi, Keisuke, Ren, Xiang, Choi, Yejin
Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete with, and often surpass, the capabilities of their larger teacher models.
A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Human Language?
Marcus, Gary, Leivada, Evelina, Murphy, Elliot
Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models has been linked to claims about human-like linguistic performance, and their applications are hailed both as a key step towards Artificial General Intelligence and as a major advance in understanding the cognitive, and even neural, basis of human language. We analyze the contribution of large language models as theoretically informative representations of a target system vs. atheoretical powerful mechanistic tools, and we identify the key abilities that are still missing from the current state of development and exploitation of these models.
Building and Testing a General Intelligence Embodied in a Humanoid Robot
Gildert, Suzanne, Rose, Geordie
Machines with human-level intelligence should be able to do most economically valuable work. This aligns a major economic incentive with the scientific grand challenge of building a human-like mind. Here we describe our approach to building and testing such a system. Our approach comprises a physical humanoid robotic system; a software-based control system for robots of this type; a performance metric, which we call g+, designed to be a measure of human-like intelligence in humanoid robots; and an evolutionary algorithm for incrementally increasing scores on this performance metric. We introduce and describe the current status of each of these. We report on current and historical measurements of the g+ metric on the systems described here.
Utilizing Large Language Models for Natural Interface to Pharmacology Databases
Lu, Hong, Li, Chuan, Li, Yinheng, Zhao, Jie
The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast amounts of information. In this abstract, we introduce a Large Language Model (LLM)-based Natural Language Interface designed to interact with structured information stored in databases. Our experiments demonstrate the feasibility and effectiveness of the proposed framework. This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.
Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking
Ramirez, Angela, Agarwal, Karik, Juraska, Juraj, Garg, Utkarsh, Walker, Marilyn A.
Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy and near-perfect semantic accuracy (99.81%), and perform better than few-shot fine-tuning.
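The overgenerate-and-rank idea described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `Candidate` type, the `predicted_da` field (standing in for the output of some DA classifier), and the substring-based `attribute_accuracy` are hypothetical and are not the paper's six ranking functions. The sketch only shows the general shape of ranking already-generated candidates by DA correctness first and attribute coverage second.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str          # one generated response
    predicted_da: str  # DA label from a (hypothetical) DA classifier


def attribute_accuracy(text, attributes):
    """Fraction of required attribute values that appear verbatim in the text.

    A crude stand-in for a semantic accuracy metric (assumption).
    """
    if not attributes:
        return 1.0
    lowered = text.lower()
    hits = sum(1 for value in attributes.values() if value.lower() in lowered)
    return hits / len(attributes)


def rank_candidates(candidates, target_da, attributes):
    """Sort candidates: correct DA first, then higher attribute accuracy."""
    def score(candidate):
        return (
            candidate.predicted_da == target_da,
            attribute_accuracy(candidate.text, attributes),
        )
    return sorted(candidates, key=score, reverse=True)


# Toy pool of overgenerated candidates (made-up examples).
attributes = {"name": "Nirvana", "food": "Thai"}
pool = [
    Candidate("Nirvana is a restaurant.", "inform"),
    Candidate("Have you tried Nirvana?", "recommend"),
    Candidate("Have you tried Nirvana, a Thai restaurant?", "recommend"),
]
ranked = rank_candidates(pool, target_da="recommend", attributes=attributes)
best = ranked[0]
```

In this toy pool, the top-ranked candidate is the one that both carries the target DA and mentions every attribute value; a candidate with the wrong DA ranks last regardless of its attribute coverage.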
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Chen, Mayee F., Roberts, Nicholas, Bhatia, Kush, Wang, Jue, Zhang, Ce, Sala, Frederic, Ré, Christopher
The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic dataset in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.
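The abstract does not state the update rule behind the online sampler, so the following Python sketch is only one plausible shape for "online data sampling over mixtures of skills", not the authors' algorithm. It assumes a multiplicative-weights update in which a skill's sampling weight grows when the skills it influences still have high validation loss. The function name `update_mixture`, the `graph` matrix of skill-to-skill influence, and the step size `eta` are all hypothetical.

```python
import math


def update_mixture(weights, losses, graph, eta=0.5):
    """One multiplicative-weights step over skill sampling proportions.

    weights: current sampling proportions, one per skill (sums to 1)
    losses:  current validation loss per skill
    graph:   graph[i][j] = assumed influence of training on skill i
             on the loss of skill j (hypothetical skills graph)
    eta:     step size (assumption)
    """
    n = len(weights)
    scores = []
    for i in range(n):
        # Total remaining loss that training on skill i could help reduce.
        influence = sum(graph[i][j] * losses[j] for j in range(n))
        scores.append(weights[i] * math.exp(eta * influence))
    total = sum(scores)
    return [s / total for s in scores]


# Two independent skills; only the first still has loss to reduce.
weights = [0.5, 0.5]
losses = [1.0, 0.0]
identity_graph = [[1.0, 0.0], [0.0, 1.0]]
new_weights = update_mixture(weights, losses, identity_graph)
```

Under these assumptions the mixture shifts toward the skill with remaining loss while staying a valid probability distribution, which is the qualitative behaviour an online sampler of this kind would need.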