 Large Language Model


Code4Struct: Code Generation for Few-Shot Event Structure Prediction

arXiv.org Artificial Intelligence

Large Language Models (LLMs) trained on a mixture of text and code have demonstrated an impressive capability for translating natural language (NL) into structured code. We observe that semantic structures can be conveniently translated into code and propose Code4Struct to leverage such text-to-structure translation capability to tackle structured prediction tasks. As a case study, we formulate Event Argument Extraction (EAE) as converting text into event-argument structures that can be represented as a class object using code. This alignment between structures and code enables us to take advantage of Programming Language (PL) features such as inheritance and type annotation to introduce external knowledge or add constraints. We show that, with sufficient in-context examples, formulating EAE as a code generation problem is advantageous over using variants of text-based prompts. Despite using only 20 training event instances for each event type, Code4Struct is comparable to supervised models trained on 4,202 instances and outperforms the current state-of-the-art (SOTA) trained on 20-shot data by 29.5% absolute F1. Code4Struct can use 10-shot training data from a sibling event type to predict arguments for zero-resource event types and outperforms the zero-shot baseline by 12% absolute F1.
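To make the idea of representing an event-argument structure as a class object more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the prompt format used by Code4Struct: the class names (Entity, Person, Location, Event, Transport), the argument roles, and the example sentence are all hypothetical. It only shows how inheritance and type annotations can encode an event hierarchy and argument constraints.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical entity types; the names are illustrative only.
@dataclass
class Entity:
    name: str


class Person(Entity):
    pass


class Location(Entity):
    pass


@dataclass
class Event:
    """Base class shared by every event type."""
    trigger: str


@dataclass
class Transport(Event):
    """Hypothetical event type that inherits from Event.

    The type annotations state which entity type each role accepts.
    """
    agent: List[Person] = field(default_factory=list)
    destination: List[Location] = field(default_factory=list)


# Hypothetical input sentence: "Kelly flew to Seoul on Friday."
# In a code-generation setup, a model would be asked to complete an
# instantiation like the one below, given the class definitions above.
transport_event = Transport(
    trigger="flew",
    agent=[Person("Kelly")],
    destination=[Location("Seoul")],
)
```

In a sketch like this, the predicted arguments can be read back directly from the object's attributes, for example transport_event.agent.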


Code as Policies: Language Model Programs for Embodied Control

arXiv.org Artificial Intelligence

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When given several example language commands (formatted as comments), each followed by its corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) to ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers), as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is prompting hierarchical code-gen (recursively defining undefined functions), which can write more complex code and also improves on the state of the art, solving 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io
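As a rough illustration of a language command written as a comment and followed by policy code, consider the Python sketch below. Everything in it is an assumption made for the example: get_obj_pos and put_at are hypothetical stand-ins for perception and control primitives, the object names and coordinates are invented, and "left" is taken to mean the negative x direction. The abstract mentions libraries such as NumPy; plain tuples are used here only to keep the sketch dependency-free.

```python
# Hypothetical world state: object name -> (x, y) position in metres.
object_positions = {"red block": (0.20, 0.10), "blue bowl": (0.50, 0.40)}


def get_obj_pos(name):
    """Stand-in for a perception call that returns an object's position."""
    return object_positions[name]


def put_at(name, target_xy):
    """Stand-in for a control primitive that moves an object to a target."""
    object_positions[name] = (float(target_xy[0]), float(target_xy[1]))


# A command/policy pair as it might appear in a few-shot prompt.
# The command is the comment; the lines after it are the policy code.

# move the red block 10 cm to the left of the blue bowl
bowl_x, bowl_y = get_obj_pos("blue bowl")
put_at("red block", (bowl_x - 0.10, bowl_y))
```

The policy turns a spatial phrase ("10 cm to the left of") into arithmetic on perception outputs before it calls the control primitive.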


Undetectable Watermarks for Language Models

arXiv.org Artificial Intelligence

Recent advances in the capabilities of large language models such as GPT-4 have spurred increasing concern about our ability to detect AI-generated text. Prior works have suggested methods of embedding watermarks in model outputs, by noticeably altering the output distribution. We ask: Is it possible to introduce a watermark without incurring any detectable change to the output distribution? To this end we introduce a cryptographically-inspired notion of undetectable watermarks for language models. That is, watermarks can be detected only with the knowledge of a secret key; without the secret key, it is computationally intractable to distinguish watermarked outputs from those of the original model. In particular, it is impossible for a user to observe any degradation in the quality of the text. Crucially, watermarks should remain undetectable even when the user is allowed to adaptively query the model with arbitrarily chosen prompts. We construct undetectable watermarks based on the existence of one-way functions, a standard assumption in cryptography.


Unlocking Temporal Question Answering for Large Language Models Using Code Execution

arXiv.org Artificial Intelligence

Large language models (LLMs) have made significant progress in natural language processing (NLP) and are used extensively in various applications. Recent work, such as chain-of-thought (CoT) prompting, has shown that intermediate reasoning steps can improve the performance of LLMs on complex reasoning tasks, such as math problems and symbolic question-answering tasks. However, LLMs still face challenges with temporal reasoning. Our preliminary experiments show that generating intermediate reasoning steps does not always boost performance on complex temporal question-answering tasks. Therefore, we propose a novel framework that combines the extraction capability of LLMs and the logical reasoning capability of a Python solver to tackle this issue. Extensive experiments and analysis demonstrate the effectiveness of our framework in handling intricate time-bound reasoning tasks.
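The division of labour described above, where a language model extracts facts and Python code performs the temporal reasoning, can be sketched as follows. This is a hypothetical illustration rather than the framework from the abstract: the fact schema, the example facts, and the answer_at helper are assumptions, and the extraction step is replaced by a hard-coded list.

```python
from datetime import date

# Hypothetical output of the extraction step (an LLM would produce this
# from free text); every fact carries an explicit validity interval.
extracted_facts = [
    {"subject": "Alice", "relation": "worked_at", "object": "Acme",
     "start": date(2010, 1, 1), "end": date(2014, 12, 31)},
    {"subject": "Alice", "relation": "worked_at", "object": "Globex",
     "start": date(2015, 1, 1), "end": date(2019, 6, 30)},
]


def answer_at(facts, subject, relation, when):
    """Return every object whose validity interval contains the date."""
    return [
        fact["object"]
        for fact in facts
        if fact["subject"] == subject
        and fact["relation"] == relation
        and fact["start"] <= when <= fact["end"]
    ]


# Hypothetical question: "Where did Alice work in March 2016?"
answer = answer_at(extracted_facts, "Alice", "worked_at", date(2016, 3, 1))
```

Because the date comparison runs as ordinary code, the answer does not depend on the model carrying out the interval arithmetic itself.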


Sentiment Analysis in the Era of Large Language Models: A Reality Check

arXiv.org Artificial Intelligence

Sentiment analysis (SA) has been a long-standing research area in natural language processing. It can offer rich insights into human sentiments and opinions and has thus seen considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is a great potential for their employment on SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance in simpler tasks, they lag behind in more complex tasks requiring deeper understanding or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs' SA abilities and propose a novel benchmark, SentiEval, for a more comprehensive and realistic evaluation. Data and code during our investigations are available at https://github.com/DAMO-NLP-SG/LLM-Sentiment.


Drafting Event Schemas using Language Models

arXiv.org Artificial Intelligence

Past work has studied event prediction and event language modeling, sometimes mediated through structured representations of knowledge in the form of event schemas. Such schemas can lead to explainable predictions and forecasting of unseen events given incomplete information. In this work, we look at the process of creating such schemas to describe complex events. We use large language models (LLMs) to draft schemas directly in natural language, which can be further refined by human curators as necessary. Our focus is on whether we can achieve sufficient diversity and recall of key events and whether we can produce the schemas in a sufficiently descriptive style. We show that large language models are able to achieve moderate recall against schemas taken from two different datasets, with even better results when multiple prompts and multiple samples are combined. Moreover, we show that textual entailment methods can be used both for matching schemas to instances of events and for evaluating overlap between gold and predicted schemas. Our method paves the way for easier distillation of event knowledge from large language models into schemas.


ChatGPT: Can China overtake the US in the AI marathon?

BBC News

But China could catch up, according to analysts, as AI solutions take years to be perfected. Chinese internet companies "are arguably more advanced than US internet companies, depending on how you're measuring advancement," Kendra Schaefer, head of tech policy research at Trivium China, tells the BBC.


Biden Administration Developing National AI Strategy

WSJ.com: WSJD - Technology

WASHINGTON--The Biden administration took another step Tuesday toward regulating new artificial intelligence tools such as ChatGPT, asking for public input as it seeks to develop a national AI strategy to guard against misinformation and other potential downsides of the technology.


Microsoft puts AI in the heart of Windows 11 with Windows Copilot

Engadget

Unlike Meta, Microsoft doesn't need to change its name to prove it's committed to an entirely new tech platform: It's doing so through action. After debuting its AI-infused Bing search engine earlier this year, the company unveiled the Microsoft 365 Copilot for Office apps. And even before those consumer reveals, Microsoft delivered an AI tool for developers in 2021 with GitHub Copilot. Today at its Build developer conference, Microsoft is making the inevitable next step: It's making AI an integral part of Windows 11. The new Windows Copilot tool lives in the Windows sidebar and, just like Bing's AI chat, you can use it as a super-powered search engine by typing in general questions. But true to its name, it's also deeply integrated with Windows.


Microsoft is helping developers build their own ChatGPT-compatible AI copilots

Engadget

Microsoft has a lot of news at this year's Build conference around its AI "copilots" for Windows 11 and other products, but it wants third-party developers in on the action too. The company announced that it has expanded its AI plugin ecosystem and provided a framework for building AI apps and copilots. At the same time, it's adopting the same open plugin standard that OpenAI uses for ChatGPT to ensure it'll work alongside its Windows 11, 365 and other copilots. Microsoft introduced the idea of copilots nearly two years ago. Those are applications that use AI and LLMs (large language models) to help users with complex cognitive tasks like writing sales pitches, generating images and more.