Collaborating Authors

 Ding, Haibo


A Systematic Survey of Automatic Prompt Optimization Techniques

arXiv.org Artificial Intelligence

Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that use various automated techniques to help improve the performance of LLMs on various tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. We provide a formal definition of APO, a 5-part unifying framework, and then proceed to rigorously categorize all relevant works based on their salient features therein. We hope to spur further research guided by our framework.


Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

arXiv.org Artificial Intelligence

Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at https://github.com/jzbjyb/ReAtt.
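
The idea of scoring documents with attention can be illustrated with a small sketch. This is not the ReAtt implementation: the function names, the toy 2-dimensional token vectors, and the aggregation rule (summing, over question tokens, the strongest dot-product match among a document's tokens) are all assumptions chosen for illustration.

```python
# Hypothetical sketch: rank documents by token-level attention-style scores.
# The aggregation rule (sum over query tokens of the max dot product with any
# document token) is an assumption, not the scoring defined in the paper.

def dot(a, b):
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_retrieval_score(query_vecs, doc_vecs):
    """Aggregate token-to-token scores into one document-level score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def retrieve(query_vecs, corpus, top_k=1):
    """Return the ids of the top_k highest-scoring documents."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: attention_retrieval_score(query_vecs, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy token vectors (made up for the example).
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "doc_a": [[0.9, 0.1], [0.2, 0.8]],
    "doc_b": [[0.1, 0.1], [0.3, 0.2]],
}
top_docs = retrieve(query, corpus, top_k=1)
```

In an end-to-end setting, the token vectors would come from the same Transformer that reads the retrieved text, so the QA loss can shape the retrieval scores as well.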


Weakly Supervised Named Entity Tagging with Learnable Logical Rules

arXiv.org Artificial Intelligence

We study the problem of building entity tagging systems by using a few rules as weak supervision. Previous methods mostly focus on disambiguating entity types based on contexts and expert-provided rules, while assuming entity spans are given. In this work, we propose a novel method TALLOR that bootstraps high-quality logical rules to train a neural tagger in a fully automated manner. Specifically, we introduce compound rules that are composed from simple rules to increase the precision of boundary detection and generate more diverse pseudo labels. We further design a dynamic label selection strategy to ensure pseudo label quality and therefore avoid overfitting the neural tagger. Experiments on three datasets demonstrate that our method outperforms other weakly supervised methods and even rivals a state-of-the-art distantly supervised tagger with a lexicon of over 2,000 terms when starting from only 20 simple rules. Our method can serve as a tool for rapidly building taggers in emerging domains and tasks. Case studies show that learned rules can potentially explain the predicted entities.
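
A minimal sketch of how composing simple rules into a compound rule can tighten span boundaries. The rule types, helper names (`ends_with`, `preceded_by`, `all_of`, `tag_spans`), the example sentence, and the `Disease` label are hypothetical and are not taken from TALLOR itself.

```python
# Hypothetical sketch: simple rules are predicates over a candidate span
# (tokens, start, end-exclusive); a compound rule is their conjunction.

def ends_with(word):
    """Simple rule: the last token of the span equals `word`."""
    return lambda tokens, start, end: tokens[end - 1].lower() == word


def preceded_by(word):
    """Simple rule: the token just before the span equals `word`."""
    return lambda tokens, start, end: start > 0 and tokens[start - 1].lower() == word


def all_of(*rules):
    """Compound rule: every component rule must fire on the same span."""
    return lambda tokens, start, end: all(r(tokens, start, end) for r in rules)


def tag_spans(tokens, rule, label, max_len=3):
    """Apply one rule to every candidate span of up to max_len tokens."""
    matches = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            if rule(tokens, start, end):
                matches.append((start, end, label))
    return matches


tokens = "The patient was diagnosed with lung cancer and skin cancer yesterday".split()

# The simple rule fires on many overlapping spans with unclear boundaries.
simple_matches = tag_spans(tokens, ends_with("cancer"), "Disease")

# The compound rule pins down both the left and the right boundary.
compound_rule = all_of(ends_with("cancer"), preceded_by("with"))
compound_matches = tag_spans(tokens, compound_rule, "Disease")
```

Here the simple rule proposes six overlapping candidates, while the compound rule keeps only the span covering "lung cancer". The pseudo labels produced this way would then be filtered before training a tagger.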


Weakly Supervised Induction of Affective Events by Optimizing Semantic Consistency

AAAI Conferences

To understand narrative text, we must comprehend how people are affected by the events that they experience. For example, readers understand that graduating from college is a positive event (achievement) but being fired from one's job is a negative event (problem). NLP researchers have developed effective tools for recognizing explicit sentiments, but affective events are more difficult to recognize because the polarity is often implicit and can depend on both a predicate and its arguments. Our research investigates the prevalence of affective events in a personal story corpus, and introduces a weakly supervised method for large-scale induction of affective events. We present an iterative learning framework that constructs a graph with nodes representing events and initializes their affective polarities with sentiment analysis tools as weak supervision. The events are then linked based on three types of semantic relations: (1) semantic similarity, (2) semantic opposition, and (3) shared components. The learning algorithm iteratively refines the polarity values by optimizing semantic consistency across all events in the graph. Our model learns over 100,000 affective events and identifies their polarities more accurately than other methods.
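
The iterative refinement described above can be sketched roughly as follows. This is an illustrative simplification, not the paper's objective: the update rule, the `alpha` mixing weight, the edge weights, and the initial scores are assumptions, and only similarity (positive weight) and opposition (negative weight) edges are modeled.

```python
# Hypothetical sketch: refine event polarities on a signed graph.
# Positive edge weight = semantic similarity, negative = semantic opposition.

def refine_polarities(initial, edges, alpha=0.5, iterations=20):
    """Blend each event's initial polarity with the signed vote of its neighbors."""
    neighbors = {event: [] for event in initial}
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for event, prior in initial.items():
            linked = neighbors[event]
            if linked:
                total = sum(abs(w) for _, w in linked)
                # Opposition edges flip the sign of the neighbor's polarity.
                vote = sum(w * scores[n] for n, w in linked) / total
            else:
                vote = prior
            updated[event] = alpha * prior + (1 - alpha) * vote
        scores = updated
    return scores


# Initial polarities stand in for sentiment-tool output (made-up values);
# 0.0 means the tool found no explicit sentiment.
initial = {
    "graduate from college": 0.8,
    "finish my degree": 0.0,
    "get fired": -0.7,
    "get hired": 0.0,
}
edges = [
    ("graduate from college", "finish my degree", 1.0),  # similarity
    ("get fired", "get hired", -1.0),                    # opposition
]
scores = refine_polarities(initial, edges)
```

With these toy inputs, the two events that start at 0.0 both end up with positive scores: one inherits polarity from a similar event, the other from an opposite one.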


Acquiring Knowledge of Affective Events from Blogs Using Label Propagation

AAAI Conferences

Many common events in our daily life affect us in positive and negative ways. For example, going on vacation is typically an enjoyable event, while being rushed to the hospital is an undesirable event. In narrative stories and personal conversations, recognizing that some events have a strong affective polarity is essential to understand the discourse and the emotional states of the affected people. However, current NLP systems mainly depend on sentiment analysis tools, which fail to recognize many events that are implicitly affective based on human knowledge about the event itself and cultural norms. Our goal is to automatically acquire knowledge of stereotypically positive and negative events from personal blogs. Our research creates an event context graph from a large collection of blog posts and uses a sentiment classifier and semi-supervised label propagation algorithm to discover affective events. We explore several graph configurations that propagate affective polarity across edges using local context, discourse proximity, and event-event co-occurrence. We then harvest highly affective events from the graph and evaluate the agreement of the polarities with human judgements.
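
A small sketch of semi-supervised label propagation followed by harvesting, under stated assumptions: the graph, edge weights, seed polarities, and the 0.5 threshold are invented for illustration, and seed nodes stand in for events labeled by a sentiment classifier. The paper's actual graph configurations are not reproduced here.

```python
# Hypothetical sketch: propagate polarity from clamped seed events across a
# weighted event graph, then keep only strongly polarized events.

def propagate_labels(graph, seeds, iterations=50):
    """graph: node -> {neighbor: weight}; seeds: node -> fixed polarity."""
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, linked in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # seed labels stay clamped
                continue
            total = sum(linked.values())
            if total:
                updated[node] = sum(w * scores[n] for n, w in linked.items()) / total
            else:
                updated[node] = 0.0
        scores = updated
    return scores


def harvest(scores, threshold=0.5):
    """Keep events whose absolute polarity reaches the threshold."""
    return {
        node: ("positive" if score > 0 else "negative")
        for node, score in scores.items()
        if abs(score) >= threshold
    }


graph = {
    "go on vacation": {"relax on the beach": 3.0},
    "relax on the beach": {"go on vacation": 3.0, "wait in line": 1.0},
    "wait in line": {"relax on the beach": 1.0, "rushed to the hospital": 1.0},
    "rushed to the hospital": {"wait in line": 1.0},
}
seeds = {"go on vacation": 1.0, "rushed to the hospital": -1.0}

scores = propagate_labels(graph, seeds)
harvested = harvest(scores)
```

In this toy graph, "relax on the beach" is harvested as positive because it is strongly tied to a positive seed, while "wait in line" stays near neutral and is left out.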