Tan, Liang
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
Kim, Joongwon; Goyal, Anirudh; Zhang, Aston; Xiong, Bo; Hou, Rui; Kambadur, Melanie; Mahajan, Dhruv; Hajishirzi, Hannaneh; Tan, Liang
Preference learning is a widely adopted post-training technique that aligns large language models (LLMs) with human preferences and improves specific downstream task capabilities. In this work, we systematically investigate how specific attributes of preference datasets affect the alignment and downstream performance of LLMs in instruction-following tasks. We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts with combinations of 23 verifiable constraints that enable fine-grained and automated quality assessments of model responses. With our synthetic prompts, we use two preference dataset curation methods, rejection sampling (RS) and Monte Carlo Tree Search (MCTS), to obtain pairs of (chosen, rejected) responses. Then, we perform experiments investigating the effects of (1) the presence of shared prefixes between the chosen and rejected responses, (2) the contrast and quality of the chosen and rejected responses, and (3) the complexity of the training prompts. Our experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements and greater stability across challenging training configurations. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance by balancing diversity and learning efficiency. Additionally, training on prompts of moderate difficulty leads to better generalization across tasks, even for more complex evaluation scenarios, compared to overly challenging prompts. Our findings provide actionable insights into optimizing preference data curation for instruction-following tasks, offering a scalable and effective framework for enhancing LLM training and alignment.
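The paper's pipeline is not reproduced here, but a minimal Python sketch of the general idea, scoring sampled responses against verifiable constraints and keeping the best- and worst-scoring ones as a (chosen, rejected) pair via rejection sampling, might look as follows; the constraint set and the `sample_response` helper are hypothetical stand-ins:

```python
from typing import Callable, Dict, List

# Hypothetical verifiable constraints: each maps a response string to pass/fail.
CONSTRAINTS: Dict[str, Callable[[str], bool]] = {
    "max_100_words": lambda r: len(r.split()) <= 100,
    "has_bullet_list": lambda r: any(line.lstrip().startswith("-") for line in r.splitlines()),
    "ends_with_question": lambda r: r.rstrip().endswith("?"),
}

def constraint_score(response: str, constraint_names: List[str]) -> float:
    """Fraction of the prompt's constraints that the response satisfies."""
    results = [CONSTRAINTS[name](response) for name in constraint_names]
    return sum(results) / len(results)

def rejection_sample_pair(prompt: str, constraint_names: List[str], sample_response, n_samples: int = 8):
    """Sample several responses; keep the best and worst scoring ones as (chosen, rejected)."""
    scored = sorted(
        ((constraint_score(r, constraint_names), r)
         for r in (sample_response(prompt) for _ in range(n_samples))),
        key=lambda pair: pair[0],
        reverse=True,
    )
    chosen, rejected = scored[0][1], scored[-1][1]
    return chosen, rejected
```

An MCTS-style variant would instead expand responses segment by segment from a shared search tree, so the chosen and rejected completions share a common prefix, which is the property studied in effect (1) above.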
Self-Generated Critiques Boost Reward Modeling for Language Models
Yu, Yue; Chen, Zhengxing; Zhang, Aston; Tan, Liang; Zhu, Chenguang; Pang, Richard Yuanzhe; Qian, Yundi; Wang, Xuewei; Gururangan, Suchin; Zhang, Chao; Kambadur, Melanie; Mahajan, Dhruv; Hou, Rui
Reinforcement Learning from Human Feedback (RLHF) has been widely adopted to align large language models (LLMs) with human preferences (Ouyang et al., 2022; Touvron et al., 2023; Dubey et al., 2024; Reid et al., 2024). Central to the RLHF process is the reward model (RM), which is trained to assign scores that quantify how well the model's outputs align with human judgments. The reward model defines the optimization direction during training (e.g., the reward signal in PPO), encouraging a policy LLM to generate more helpful, honest, and harmless responses and ultimately enhancing the model's generation quality in real-world applications. Standard reward models are typically trained on preference pairs and optimized with a pairwise logistic loss (Bradley and Terry, 1952), producing a single scalar score for each response. However, a single scalar output is not only hard to interpret but also fails to fully leverage the inherent language modeling capability that LLMs obtain from pretraining and post-training (Zhang et al., 2024). Consequently, these reward models tend to be less data-efficient and prone to robustness issues, such as reward hacking (Skalse et al., 2022; Singhal et al., 2023; Chen et al., 2024).
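For reference, the pairwise logistic (Bradley-Terry) objective mentioned above is standard; a minimal PyTorch sketch, where `reward_model` is a hypothetical scorer returning one scalar per response, is:

```python
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, chosen_inputs, rejected_inputs):
    """Bradley-Terry pairwise logistic loss: -log sigmoid(r_chosen - r_rejected)."""
    r_chosen = reward_model(**chosen_inputs)      # hypothetical scorer, shape (batch,)
    r_rejected = reward_model(**rejected_inputs)  # shape (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```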
Law of the Weakest Link: Cross Capabilities of Large Language Models
Zhong, Ming; Zhang, Aston; Wang, Xuewei; Hou, Rui; Xiong, Wenhan; Zhu, Chenguang; Chen, Zhengxing; Tan, Liang; Bi, Chloe; Lewis, Mike; Popuri, Sravya; Narang, Sharan; Kambadur, Melanie; Mahajan, Dhruv; Edunov, Sergey; Han, Jiawei; van der Maaten, Laurens
The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities. However, this overlooks the intersection of multiple abilities across different types of expertise that are often required for real-world tasks, which we term cross capabilities. To systematically explore this concept, we first define seven core individual capabilities and then pair them to form seven common cross capabilities, each supported by a manually constructed taxonomy. Building on these definitions, we introduce CrossEval, a benchmark comprising 1,400 human-annotated prompts, with 100 prompts for each individual and cross capability. To ensure reliable evaluation, we involve expert annotators to assess 4,200 model responses, gathering 8,400 human ratings with detailed explanations to serve as reference examples. Our findings reveal that, in both static evaluations and attempts to enhance specific abilities, current LLMs consistently exhibit the "Law of the Weakest Link," where cross-capability performance is significantly constrained by the weakest component. Specifically, across 58 cross-capability scores from 17 models, 38 scores are lower than both individual capabilities, while the remaining 20 fall between the stronger and weaker capabilities, though closer to the weaker one. These results highlight the under-performance of LLMs in cross-capability tasks, making the identification and improvement of the weakest capabilities a critical priority for future research to optimize performance in complex, multi-dimensional scenarios.
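As a small illustration of the comparison described above, the following sketch (with hypothetical score values) classifies a cross-capability score relative to its two component capabilities:

```python
def classify_cross_score(cross, cap_a, cap_b):
    """Where does a cross-capability score fall relative to its two components?"""
    weak, strong = sorted((cap_a, cap_b))
    if cross < weak:
        return "below both individual capabilities"
    if cross > strong:
        return "above both individual capabilities"
    midpoint = (weak + strong) / 2
    return "between, closer to the weaker" if cross <= midpoint else "between, closer to the stronger"

# Hypothetical example: capability A = 72.0, capability B = 85.0, cross score = 70.5
print(classify_cross_score(70.5, 72.0, 85.0))  # -> "below both individual capabilities"
```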
Audiobox: Unified Audio Generation with Natural Language Prompts
Vyas, Apoorv; Shi, Bowen; Le, Matthew; Tjandra, Andros; Wu, Yi-Chiao; Guo, Baishan; Zhang, Jiemin; Zhang, Xinyue; Adkins, Robert; Ngan, William; Wang, Jeff; Cruz, Ivan; Akula, Bapi; Akinyemi, Akinniyi; Ellis, Brian; Moritz, Rashel; Yungster, Yael; Rakotoarison, Alice; Tan, Liang; Summers, Chris; Wood, Carleigh; Lane, Joshua; Williamson, Mary; Hsu, Wei-Ning
Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data. However, these models lack controllability in several aspects: speech generation models cannot synthesize novel styles based on text descriptions and are limited in domain coverage (e.g., outdoor environments); sound generation models provide only coarse-grained control based on descriptions like "a person speaking" and would generate only mumbling human voices. This paper presents Audiobox, a unified model based on flow-matching that is capable of generating various audio modalities. We design description-based and example-based prompting to enhance controllability and unify speech and sound generation paradigms. We allow transcript, vocal, and other audio styles to be controlled independently when generating speech. To improve model generalization with limited labels, we adapt a self-supervised infilling objective to pre-train on large quantities of unlabeled audio. Audiobox sets new benchmarks on speech and sound generation (0.745 similarity on Librispeech for zero-shot TTS; 0.77 FAD on AudioCaps for text-to-sound) and unlocks new methods for generating audio with novel vocal and acoustic styles. We further integrate Bespoke Solvers, which speed up generation by over 25 times compared to the default ODE solver for flow-matching, without loss of performance on several tasks. Our demo is available at https://audiobox.metademolab.com/
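Audiobox is built on flow matching; the paper's model is not reproduced here, but a generic conditional flow-matching training step, with `velocity_model` as a hypothetical network, looks roughly like:

```python
import torch

def flow_matching_loss(velocity_model, x1, cond):
    """Generic conditional flow-matching objective: interpolate x_t between
    noise x0 and data x1 along a linear path, then regress the velocity x1 - x0."""
    x0 = torch.randn_like(x1)                                    # noise sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1), device=x1.device)
    x_t = (1.0 - t) * x0 + t * x1                                # point on the path at time t
    target_velocity = x1 - x0
    pred_velocity = velocity_model(x_t, t.flatten(), cond)       # hypothetical conditional network
    return torch.mean((pred_velocity - target_velocity) ** 2)
```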
Learning Easily Updated General Purpose Text Representations with Adaptable Task-Specific Prefixes
Huang, Kuan-Hao; Tan, Liang; Hou, Rui; Wang, Sinong; Almahairi, Amjad; Rinott, Ruty
Many real-world applications require making multiple predictions from the same text. Fine-tuning a large pre-trained language model for each downstream task incurs a computational burden at inference time, since each task requires its own forward pass. To amortize this cost, a common solution is to freeze the language model and build lightweight models for downstream tasks on top of fixed text representations. The challenge then becomes learning fixed but general text representations that generalize well to unseen downstream tasks. Previous works have shown that the generalizability of representations can be improved by fine-tuning the pre-trained language model on a set of source tasks in a multi-tasking way. In this work, we propose a prefix-based method to learn fixed text representations with source tasks. We learn a task-specific prefix for each source task independently and combine them to obtain the final representations. Our experimental results show that prefix-based training performs better than multi-tasking training and can update the text representations at a smaller computational cost.
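A hedged sketch of the general recipe, a frozen encoder plus one learned prefix per source task whose outputs are concatenated into a single fixed representation, is given below; the module names, dimensions, and the HuggingFace-style encoder interface are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class PrefixCombiner(nn.Module):
    """Frozen encoder plus one learned prefix per source task; outputs are concatenated."""

    def __init__(self, frozen_encoder, hidden_dim, num_tasks, prefix_len=10):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():          # freeze the language model
            p.requires_grad = False
        self.prefixes = nn.ParameterList(
            [nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02) for _ in range(num_tasks)]
        )

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, hidden_dim)
        reps = []
        for prefix in self.prefixes:
            batch_prefix = prefix.unsqueeze(0).expand(token_embeddings.size(0), -1, -1)
            # Assumes a HuggingFace-style encoder that accepts inputs_embeds.
            hidden = self.encoder(inputs_embeds=torch.cat([batch_prefix, token_embeddings], dim=1))
            reps.append(hidden.last_hidden_state[:, 0])   # e.g., first-position representation
        return torch.cat(reps, dim=-1)                    # fixed, general-purpose representation
```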
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Chan, Aaron; Sanjabi, Maziar; Mathias, Lambert; Tan, Liang; Nie, Shaoliang; Peng, Xiaochang; Ren, Xiang; Firooz, Hamed
An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale extraction should be faithful (reflective of the LM's actual behavior) and plausible (convincing to humans), without compromising the LM's (i.e., task model's) task performance. Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework that generalizes rationale extractor optimization as follows: (1) specify the architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using the selected objectives. UNIREX enables replacing prior works' heuristic design choices with a generic learned rationale extractor in (1) and optimizing it for all three desiderata in (2)-(3). To facilitate comparison between methods with respect to multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric. Across five text classification datasets, our best UNIREX configuration outperforms baselines by an average of 32.9% NRG. Plus, we find that UNIREX-trained rationale extractors can even generalize to unseen datasets and tasks.
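A minimal sketch of the joint objective in step (3), combining task, faithfulness, and plausibility terms with hypothetical weights (the concrete criteria here are placeholders, not UNIREX's specific objectives):

```python
import torch.nn.functional as F

def joint_rationale_loss(task_logits, labels, rationale_scores, human_rationale_mask,
                         faithfulness_loss, alpha=1.0, beta=1.0):
    """Generic composite objective: task loss + faithfulness term + plausibility term.
    Plausibility is sketched as token-level BCE against human-highlighted tokens;
    the faithfulness term is passed in as a placeholder scalar."""
    task_loss = F.cross_entropy(task_logits, labels)
    plausibility_loss = F.binary_cross_entropy_with_logits(
        rationale_scores, human_rationale_mask.float())
    return task_loss + alpha * faithfulness_loss + beta * plausibility_loss
```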
FRAME: Evaluating Rationale-Label Consistency Metrics for Free-Text Rationales
Chan, Aaron; Nie, Shaoliang; Tan, Liang; Peng, Xiaochang; Firooz, Hamed; Sanjabi, Maziar; Ren, Xiang
Following how humans communicate, free-text rationales aim to use natural language to explain neural language model (LM) behavior. However, free-text rationales' unconstrained nature makes them prone to hallucination, so it is important to have metrics for free-text rationale quality. Existing free-text rationale metrics measure how consistent the rationale is with the LM's predicted label, but there is no protocol for assessing such metrics' reliability. Thus, we propose FRAME, a framework for evaluating rationale-label consistency (RLC) metrics for free-text rationales. FRAME is based on three axioms: (1) good metrics should yield the highest scores for reference rationales, which maximize RLC by construction; (2) good metrics should be appropriately sensitive to semantic perturbation of rationales; and (3) good metrics should be robust to variation in the LM's task performance. Across three text classification datasets, we show that existing RLC metrics cannot satisfy all three FRAME axioms, since they are implemented via model pretraining, which muddles the metric's signal. Then, we introduce a non-pretraining RLC metric that greatly outperforms baselines on (1) and (3), while performing competitively on (2). Finally, we discuss the limitations of using RLC to evaluate free-text rationales.
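As a toy illustration of rationale-label consistency (not FRAME's proposed metric), one simple proxy checks whether a classifier reading only the rationale reproduces the task model's predicted label; the `classify` helper is hypothetical:

```python
def rationale_label_consistency(classify, rationales, predicted_labels):
    """Fraction of examples where a classifier reading only the rationale
    reproduces the task model's predicted label (a simple RLC proxy)."""
    agreements = sum(classify(r) == y for r, y in zip(rationales, predicted_labels))
    return agreements / len(predicted_labels)
```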