AITopics | Li, Huayang

Collaborating Authors

Li, Huayang

ALR²: A Retrieve-then-Reason Framework for Long-context Question Answering

Li, Huayang, Verga, Pat, Sen, Priyanka, Yang, Bowen, Viswanathan, Vijay, Lewis, Patrick, Watanabe, Taro, Su, Yixuan

arXiv.org Artificial Intelligence, Oct-4-2024

The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that the LLM can process has grown, the capability of the model to accurately reason over that context degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason framework, enabling LLMs to reason over relevant evidence collected during an intermediate retrieval step. We find that modern LLMs struggle to accurately retrieve relevant facts and instead often hallucinate "retrieved facts", resulting in flawed reasoning and incorrect answers. Through extensive experiments on long-context QA benchmarks, we find that our method outperforms competitive baselines by large margins, achieving gains of at least 8.4 and 7.9 EM on the long-context versions of the HotpotQA and SQuAD datasets, respectively. While longer context windows are promising, our preliminary study shows that the long-context performance of LLMs varies significantly across tasks. When tasked with generating answers by directly reasoning over the full context, performance degrades as the input context grows. In contrast, when tasked with retrieving the set of evidence relevant to the question, the performance of LLMs is only mildly affected by the growth of the input context.
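The retrieve-then-reason flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: `llm` is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented, and the verbatim-span filter is only one assumed way to guard against hallucinated "retrieved facts".

```python
from typing import Callable


def retrieve_then_reason(llm: Callable[[str], str], context: str, question: str) -> str:
    """Two-stage QA: first collect evidence, then answer from the evidence only."""
    # Stage 1: ask the model to list the context sentences that bear on the question.
    retrieval_prompt = (
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "List the sentences from the context that are relevant, one per line."
    )
    candidates = [line.strip() for line in llm(retrieval_prompt).splitlines() if line.strip()]

    # Assumed safeguard: keep only candidates that appear verbatim in the context,
    # which discards any "retrieved fact" the model made up.
    evidence = [c for c in candidates if c in context]

    # Stage 2: reason over the short evidence list instead of the full context.
    reasoning_prompt = (
        "Evidence:\n" + "\n".join(evidence) + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm(reasoning_prompt).strip()
```

Because the second prompt contains only the filtered evidence, its length depends on how many sentences were retrieved, not on the length of the original context.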

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.03227

Country:

Asia > Thailand (0.14)
Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Cai, Deng, Li, Huayang, Fu, Tingchen, Li, Siheng, Xu, Weiwen, Li, Shuaiyi, Cao, Bowen, Zhang, Zhisong, Huang, Xinting, Cui, Leyang, Wang, Yan, Liu, Lemao, Watanabe, Taro, Shi, Shuming

arXiv.org Artificial Intelligence, Jun-24-2024

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.16377

Country:

Europe (1.00)
North America > United States > Texas (0.14)
North America > United States > Pennsylvania (0.14)
(2 more...)

Genre:

Overview (0.68)
Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

Hsu, Benjamin, Liu, Xiaoyu, Li, Huayang, Fujinuma, Yoshinari, Nadejde, Maria, Niu, Xing, Kittenplon, Yair, Litman, Ron, Pappagari, Raghavendra

arXiv.org Artificial Intelligence, Jun-12-2024

Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. However, real-world documents often possess intricate text layouts that defy these assumptions. Extracting information from Optical Character Recognition (OCR) or heuristic rules can result in errors, and the layout (e.g., paragraphs, headers) may convey relationships between distant sections of text. This complexity is particularly evident in widely used PDF documents, which represent information visually. This paper addresses this gap by introducing M3T, a novel benchmark dataset tailored to evaluate NMT systems on the comprehensive task of translating semi-structured documents. This dataset aims to bridge the evaluation gap in document-level NMT systems, acknowledging the challenges posed by rich text layouts in real-world applications.

artificial intelligence, natural language, translation, (15 more...)

arXiv.org Artificial Intelligence

2406.08255

Country:

Europe (1.00)
North America > United States > Maryland (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Cross-lingual Contextualized Phrase Retrieval

Li, Huayang, Cai, Deng, Qu, Zhi, Cui, Qu, Kamigaito, Hidetaka, Liu, Lemao, Watanabe, Taro

arXiv.org Artificial Intelligence, Mar-25-2024

Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of task-specific training data and models is the primary obstacle to this goal. To address it, we extract pairs of cross-lingual phrases using word alignment information automatically induced from parallel sentences. Subsequently, we train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning, which encourages the hidden representations of phrases with similar contexts and semantics to align closely. Comprehensive experiments on both the cross-lingual phrase retrieval task and a downstream task, i.e., machine translation, demonstrate the effectiveness of CCPR. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher. When CCPR is used to augment a large-language-model-based translator, it achieves average gains of 0.7 and 1.5 in BERTScore for translations from X=>En and vice versa, respectively, on the WMT16 dataset. Our code and data are available at https://github.com/ghrua/ccpr_release.
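The contrastive training signal and the top-1 accuracy metric mentioned in this abstract can be illustrated with a small sketch. The abstract only says "contrastive learning", so the InfoNCE form with in-batch negatives, the temperature value, and all function names below are assumptions for illustration, not the released CCPR code.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(src: List[Sequence[float]], tgt: List[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; src[i] and tgt[i] are a positive pair, other tgt[j] are negatives."""
    losses = []
    for i, s in enumerate(src):
        logits = [dot(s, t) / temperature for t in tgt]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


def top1_accuracy(src: List[Sequence[float]], tgt: List[Sequence[float]]) -> float:
    """Fraction of source phrases whose highest-scoring target is the paired one."""
    hits = 0
    for i, s in enumerate(src):
        scores = [dot(s, t) for t in tgt]
        hits += scores.index(max(scores)) == i
    return hits / len(src)
```

Minimizing this loss pulls each source phrase vector toward its aligned target and away from the other targets in the batch, which is the behavior the retrieval metric then measures.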

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.16820

Country:

Europe (1.00)
North America > Canada (0.68)
Asia > Middle East > UAE (0.14)
(3 more...)

Genre: Research Report (0.40)

Industry: Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

Shi, Shuming, Zhao, Enbo, Cai, Deng, Cui, Leyang, Huang, Xinting, Li, Huayang

arXiv.org Artificial Intelligence, Jan-16-2024

With Inferflow, users can serve most of the common transformer models by simply modifying some lines in the corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has some key features. First, by implementing a modular framework of atomic building blocks and technologies, Inferflow is compositionally generalizable to new models. Second, 3.5-bit quantization is introduced in Inferflow as a tradeoff between 3-bit and 4-bit quantization. Third, hybrid model partitioning for multi-GPU inference is introduced in Inferflow to better balance inference speed and throughput than the commonly adopted partition-by-layer and partition-by-tensor strategies.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.08294

Country:

North America > United States (0.14)
Africa > Ethiopia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild

Li, Huayang, Li, Siheng, Cai, Deng, Wang, Longyue, Liu, Lemao, Watanabe, Taro, Yang, Yujiu, Shi, Shuming

arXiv.org Artificial Intelligence, Jan-8-2024

Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability to tackle various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it comes to multimodal instruction following. We introduce TextBind, an almost annotation-free framework for empowering larger language models with multi-turn interleaved multimodal instruction-following capabilities. Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model. To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models. We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2309.08637

Country:

North America > United States > Maryland (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (0.64)

Industry:

Media (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Li, Huayang, Lan, Tian, Fu, Zihao, Cai, Deng, Liu, Lemao, Collier, Nigel, Watanabe, Taro, Su, Yixuan

arXiv.org Artificial Intelligence, Oct-16-2023

There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that by selectively dropping out the attention to repetitive words in training data, degeneration can be significantly reduced. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as the high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be interpreted by one simple explanation: penalizing the repetitions in training data is a common and fundamental factor in their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
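One simple way to quantify "the presence of repetitions in training data" is sketched below. The definition used here (a token counts as repeated if it already occurred within the preceding `window` tokens) and the function names are assumptions for illustration; the paper's own repetition measure may differ. The per-position flags are the kind of signal that could be used to decide which tokens to hide from attention during training.

```python
from typing import List


def repeated_positions(tokens: List[str], window: int = 16) -> List[bool]:
    """Flag each token that already occurred within the preceding `window` tokens."""
    flags = []
    for i, tok in enumerate(tokens):
        start = max(0, i - window)
        flags.append(tok in tokens[start:i])
    return flags


def repetition_rate(tokens: List[str], window: int = 16) -> float:
    """Fraction of tokens flagged as repeats; 0.0 for an empty sequence."""
    if not tokens:
        return 0.0
    return sum(repeated_positions(tokens, window)) / len(tokens)
```

For example, `repetition_rate("the cat saw the cat".split())` flags the last two tokens out of five.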

large language model, machine learning, repetition, (18 more...)

arXiv.org Artificial Intelligence

2310.10226

Country:

Europe (0.67)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports > Football (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

PandaGPT: One Model To Instruction-Follow Them All

Su, Yixuan, Lan, Tian, Li, Huayang, Xu, Jialu, Wang, Yan, Cai, Deng

arXiv.org Artificial Intelligence, May-25-2023

We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audio clips. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image or video with how they sound in an audio clip. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e., zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at https://panda-gpt.github.io/.

artificial intelligence, natural language, pandagpt, (12 more...)

arXiv.org Artificial Intelligence

2305.16355

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

A Frustratingly Simple Decoding Method for Neural Text Generation

Yang, Haoran, Cai, Deng, Li, Huayang, Bi, Wei, Lam, Wai, Shi, Shuming

arXiv.org Artificial Intelligence, May-21-2023

We introduce a frustratingly simple, highly efficient, and surprisingly effective decoding method for neural text generation, which we call Frustratingly Simple Decoding (FSD). The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has already been generated. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly effective: experiments show that FSD can outperform the canonical methods to date (i.e., nucleus sampling) as well as several strong baselines that were proposed recently.
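The anti-LM idea in this abstract can be sketched with an n-gram model built from the already generated prefix. The scoring rule below, a weighted difference between the LM probability and the anti-LM probability with weight `alpha`, is an assumed form chosen for illustration, as are the function names and default values; it is not taken from the paper's code.

```python
from collections import Counter
from typing import Dict, List


def anti_lm_probs(prefix: List[str], n: int = 2) -> Dict[str, float]:
    """N-gram estimate of P(next token | last n-1 tokens), counted on the prefix itself."""
    history = tuple(prefix[len(prefix) - (n - 1):])
    counts: Counter = Counter()
    for i in range(len(prefix) - n + 1):
        # Count what followed earlier occurrences of the current history.
        if tuple(prefix[i:i + n - 1]) == history:
            counts[prefix[i + n - 1]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def fsd_step(lm_probs: Dict[str, float], prefix: List[str],
             alpha: float = 0.4, n: int = 2) -> str:
    """Pick the token maximizing (1 - alpha) * p_LM - alpha * p_antiLM (assumed scoring)."""
    penalty = anti_lm_probs(prefix, n)
    return max(
        lm_probs,
        key=lambda tok: (1 - alpha) * lm_probs[tok] - alpha * penalty.get(tok, 0.0),
    )
```

With `alpha = 0` the step reduces to greedy search, and the only extra state is the n-gram count over the prefix, which matches the abstract's claim of no extra parameters and little overhead.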

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.12675

Country:

Europe (1.00)
North America > Haiti (0.93)
Asia > Middle East (0.93)
(3 more...)

Genre:

Research Report (1.00)
Personal > Interview (0.46)
Personal > Obituary (0.46)

Industry:

Transportation (1.00)
Media (1.00)
Materials > Chemicals > Industrial Gases > Liquified Gas (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Unified Text Structuralization with Instruction-tuned Language Models

Ni, Xuanfan, Li, Piji, Li, Huayang

arXiv.org Artificial Intelligence, Mar-30-2023

Text structuralization is an important field of natural language processing (NLP) that consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of manually annotated high-quality datasets from different domains and languages, since building them requires specialized professional knowledge. In addition, most IE methods are designed for a specific type of structured data, e.g., entities, relations, and events, making them hard to generalize to others. In this work, we propose a simple and efficient approach to instruct a large language model (LLM) to extract a variety of structures from texts. More concretely, we add a prefix and a suffix instruction to indicate the desired IE task and structure type, respectively, before feeding the text into an LLM. Experiments on two LLMs show that this approach enables language models to perform comparably to other state-of-the-art methods on datasets covering a variety of languages and knowledge, and to generalize to other IE sub-tasks by changing the content of the instruction. Another benefit of our approach is that it can help researchers build datasets in low-resource and domain-specific scenarios, e.g., finance and law, at low cost.
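The prefix-and-suffix instruction scheme described in this abstract amounts to wrapping the input text between two instructions. The sketch below is illustrative only: the function name `build_prompt` and the exact instruction wording are hypothetical, and the paper's actual templates are not reproduced here.

```python
def build_prompt(text: str, task: str, structure: str) -> str:
    """Wrap `text` with a prefix naming the IE task and a suffix naming the output structure."""
    prefix = f"Extract {task} from the following text."
    suffix = f"Return the result as {structure}."
    return f"{prefix}\n{text}\n{suffix}"
```

Switching to another IE sub-task only changes the `task` and `structure` arguments, for example `build_prompt(doc, "relations", "a list of (head, relation, tail) triples")`, while the input text and the model stay the same.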

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.14956

Country:

Africa (0.69)
Europe (0.47)
North America > United States (0.28)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Sports > Basketball (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)