AITopics | Min, Bonan

Collaborating Authors

Min, Bonan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Effectively Steer LLM To Follow Preference via Building Confident Directions

Song, Bingqing, Han, Boran, Zhang, Shuai, Wang, Hao, Fang, Haoyang, Min, Bonan, Wang, Yuyang, Hong, Mingyi

arXiv.org Artificial IntelligenceMar-4-2025

Having an LLM that aligns with human preferences is essential for accommodating individual needs, such as maintaining writing style or generating specific topics of interest. The majority of current alignment methods rely on fine-tuning or prompting, which can be either costly or difficult to control. Model steering algorithms, which modify the model output by constructing specific steering directions, are typically easy to implement and optimization-free. However, their capabilities are typically limited to steering the model into one of the two directions (i.e., bidirectional steering), and there has been no theoretical understanding to guarantee their performance. In this work, we propose a theoretical framework to understand and quantify the model steering methods. Inspired by the framework, we propose a confident direction steering method (CONFST) that steers LLMs via modifying their activations at inference time. More specifically, CONFST builds a confident direction that is closely aligned with users' preferences, and this direction is then added to the activations of the LLMs to effectively steer the model output. Our approach offers three key advantages over popular bidirectional model steering methods: 1) It is more powerful, since multiple (i.e. more than two) users' preferences can be aligned simultaneously; 2) It is simple to implement, since there is no need to determine which layer to add the steering vector to; 3) No explicit user instruction is required. We validate our method on GPT-2 XL (1.5B), Mistral (7B) and Gemma-it (9B) models for tasks that require shifting the output of LLMs across various topics and styles, achieving superior performance over competing methods.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.02989

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models

Nguyen, Pha, Sengupta, Sailik, Malik, Girik, Gupta, Arshit, Min, Bonan

arXiv.org Artificial IntelligenceJan-21-2025

The improved competence of generative models can help building multi-modal virtual assistants that leverage modalities beyond language. By observing humans performing multi-step tasks, one can build assistants that have situational awareness of actions and tasks being performed, enabling them to cater assistance based on this understanding. In this paper, we develop a Context-aware Instructional Task Assistant with Multi-modal Large Language Models (InsTALL) that leverages an online visual stream (e.g. a user's screen share or video recording) and responds in real-time to user queries related to the task at hand. To enable useful assistance, InsTALL 1) trains a multi-modal model on task videos and paired textual data, and 2) automatically extracts task graph from video data and leverages it at training and inference time. We show InsTALL achieves state-of-the-art performance across proposed sub-tasks considered for multimodal activity understanding -- task recognition (TR), action recognition (AR), next action prediction (AP), and plan prediction (PP) -- and outperforms existing baselines on two novel sub-tasks related to automatic error identification.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.12231

Country:

North America > United States > Colorado (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry: Education > Educational Technology (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Open Domain Question Answering with Conflicting Contexts

Liu, Siyi, Ning, Qiang, Halder, Kishaloy, Xiao, Wei, Qi, Zheng, Htut, Phu Mon, Zhang, Yi, John, Neha Anna, Min, Bonan, Benajiba, Yassine, Roth, Dan

arXiv.org Artificial IntelligenceNov-18-2024

Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depending on this information may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated dataset, Question Answering with Conflicting Contexts (QACC), and find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search. We evaluate and benchmark three powerful Large Language Models (LLMs) with our dataset QACC and demonstrate their limitations in effectively addressing questions with conflicting information. To explore how humans reason through conflicting contexts, we request our annotators to provide explanations for their selections of correct answers. We demonstrate that by finetuning LLMs to explain their answers, we can introduce richer information into their training that guide them through the process of reasoning with conflicting contexts.

large language model, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2410.12311

Country: Europe (0.46)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall

Yuan, Jiaqing, Pan, Lin, Hang, Chung-Wei, Guo, Jiang, Jiang, Jiarong, Min, Bonan, Ng, Patrick, Wang, Zhiguo

arXiv.org Artificial IntelligenceApr-24-2024

Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and positive effects of model scaling, as larger models outperform smaller ones for all model families. However, the best performance from GPT-4 still represents a large gap with the upper-bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling model known and unknown knowledge, we find the degradation is attributed to exemplars that contradict a model's known knowledge, as well as the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2404.16164

Country:

Asia > Middle East > UAE (0.14)
North America > United States > California (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

Wang, Fei, Shang, Chao, Jain, Sarthak, Wang, Shuai, Ning, Qiang, Min, Bonan, Castelli, Vittorio, Benajiba, Yassine, Roth, Dan

arXiv.org Artificial IntelligenceMar-10-2024

User alignment is crucial for adapting general-purpose language models (LMs) to downstream tasks, but human annotations are often not available for all types of instructions, especially those with customized constraints. We observe that user instructions typically contain constraints. While assessing response quality in terms of the whole instruction is often costly, efficiently evaluating the satisfaction rate of constraints is feasible. We investigate common constraints in NLP tasks, categorize them into three classes based on the types of their arguments, and propose a unified framework, ACT (Aligning to ConsTraints), to automatically produce supervision signals for user alignment with constraints. Specifically, ACT uses constraint verifiers, which are typically easy to implement in practice, to compute constraint satisfaction rate (CSR) of each response. It samples multiple responses for each prompt and collect preference labels based on their CSR automatically. Subsequently, ACT adapts the LM to the target task through a ranking-based learning process. Experiments on fine-grained entity typing, abstractive summarization, and temporal question answering show that ACT is able to enhance LMs' capability to adhere to different classes of constraints, thereby improving task performance. Further experiments show that the constraint-following capabilities are transferable.

artificial intelligence, constraint, natural language, (15 more...)

arXiv.org Artificial Intelligence

2403.06326

Country:

Europe (1.00)
North America > Canada (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Multi-Modal Multilingual Benchmark for Document Image Classification

Fujinuma, Yoshinari, Varia, Siddharth, Sankaran, Nishant, Appalaraju, Srikar, Min, Bonan, Vyas, Yogarshi

arXiv.org Artificial IntelligenceOct-25-2023

Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that the only existing dataset for this task (Lewis et al., 2006) has several limitations and we introduce two newly curated multilingual datasets WIKI-DOC and MULTIEURLEX-DOC that overcome these limitations. We further undertake a comprehensive study of popular visually-rich document understanding or Document AI models in previously untested setting in document image classification such as 1) multi-label classification, and 2) zero-shot cross-lingual transfer setup. Experimental results show limitations of multilingual Document AI models on cross-lingual transfer across typologically distant languages. Our datasets and findings open the door for future research into improving Document AI models.

artificial intelligence, multi-modal multilingual benchmark, natural language, (1 more...)

arXiv.org Artificial Intelligence

2310.16356

Genre: Research Report (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.80)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.80)
Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

Li, Alexander Hanbo, Shang, Mingyue, Spiliopoulou, Evangelia, Ma, Jie, Ng, Patrick, Wang, Zhiguo, Min, Bonan, Wang, William, McKeown, Kathleen, Castelli, Vittorio, Roth, Dan, Xiang, Bing

arXiv.org Artificial IntelligenceAug-9-2023

We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph triples, and meaning representations. We demonstrate that our proposed approach can effectively adapt to new structured forms, and can improve performance in comparison to current methods. For example, our method resulted in a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset. Our proposed method is an important step towards a more general data-to-text generation framework.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2308.05317

Country:

Europe (0.93)
Asia (0.93)
South America (0.69)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports > Football (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

Add feedback

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Min, Bonan, Ross, Hayley, Sulem, Elior, Veyseh, Amir Pouran Ben, Nguyen, Thien Huu, Sainz, Oscar, Agirre, Eneko, Heinz, Ilana, Roth, Dan

arXiv.org Artificial IntelligenceNov-1-2021

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. We also present approaches that use pre-trained language models to generate data for training augmentation or other purposes. We conclude with discussions on limitations and suggested directions for future research.

computational linguistic, machine learning, question answering, (28 more...)

arXiv.org Artificial Intelligence

2111.01243

Country:

Asia (1.00)
Europe > Italy (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.92)
Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(4 more...)

Add feedback