AITopics | Thorne, James

Collaborating Authors

Thorne, James

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Hong, Jiwoo, Lee, Noah, Martínez-Castaño, Rodrigo, Rodríguez, César, Thorne, James

arXiv.org Artificial Intelligence, Oct-23-2024

Reinforcement learning from human feedback (RLHF) has been shown to benefit greatly from precise reward models (RMs). However, recent studies of reward modeling are skewed towards English, limiting the applicability of RLHF to multilingual alignment. In this work, we investigate the cross-lingual transfer of RMs trained in diverse languages, primarily from English. Our experimental results demonstrate strong cross-lingual transfer of English RMs, which outperform target-language RMs by 3 to 4% on average on Multilingual RewardBench. Furthermore, we analyze the cross-lingual transfer of RMs through representation shifts. Finally, we perform multilingual alignment to show how cross-lingual transfer in RMs propagates to improved multilingual instruction-following capability, along with extensive analyses of off-the-shelf RMs. We release the code, model, and data.
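Reward-model benchmarks in the RewardBench family typically report accuracy as the share of preference pairs in which the RM scores the chosen response above the rejected one. The sketch below assumes that metric and averages it over languages; the function names and toy scores are hypothetical and are not taken from the paper's released code.

```python
from typing import Dict, List, Sequence, Tuple


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the RM gives the chosen response a strictly higher score."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(c > r for c, r in zip(chosen, rejected))  # ties count as errors
    return wins / len(chosen)


def mean_accuracy_by_language(
    scores: Dict[str, Tuple[List[float], List[float]]]
) -> float:
    """Unweighted mean of per-language pairwise accuracies."""
    per_language = [pairwise_accuracy(c, r) for c, r in scores.values()]
    return sum(per_language) / len(per_language)


# Toy scores for two languages (hypothetical numbers, not benchmark results).
toy = {
    "ko": ([1.0, 2.0], [0.0, 3.0]),  # 1 of 2 pairs ranked correctly
    "de": ([1.0], [0.5]),            # 1 of 1 pair ranked correctly
}
print(mean_accuracy_by_language(toy))  # 0.75
```

Comparing an English-trained RM with a target-language RM under this metric amounts to scoring the same pairs with each model and comparing the two averages.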

large language model, machine learning, rms

arXiv.org Artificial Intelligence

2410.18027

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)


CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

Kim, Eunsu, Suk, Juyoung, Oh, Philhoon, Yoo, Haneul, Thorne, James, Oh, Alice

arXiv.org Artificial Intelligence, Jul-4-2024

Despite the rapid development of large language models (LLMs) for the Korean language, there is a clear lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from English counterparts through translation, they often overlook different cultural contexts. The few benchmark datasets that are sourced from Korean data capturing cultural knowledge offer only narrow tasks such as bias and hate speech detection. To address this gap, we introduce a benchmark of Cultural and Linguistic Intelligence in Korean (CLIcK), a dataset comprising 1,995 QA pairs. CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under two main categories: language and culture. For each instance in CLIcK, we provide fine-grained annotation of the cultural and linguistic knowledge required to answer the question correctly. Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performance across the categories, as well as the diverse factors affecting their comprehension. CLIcK offers the first large-scale, comprehensive, Korean-centric analysis of LLMs' proficiency in Korean culture and language.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.06412

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.14)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)


Block Transformer: Global-to-Local Language Modeling for Fast Inference

Ho, Namgyu, Bae, Sangmin, Kim, Taehyeon, Jo, Hyunjik, Kim, Yireun, Schuster, Tal, Fisch, Adam, Thorne, James, Yun, Se-Young

arXiv.org Artificial Intelligence, Jun-4-2024

This paper presents the Block Transformer architecture, which applies hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous tokens must be retrieved from memory at every decoding step. As a result, this KV cache IO becomes a significant bottleneck in batch inference. We observe that these costs stem from applying self-attention over the global context, so we isolate the expensive bottlenecks of global modeling to the lower layers and apply fast local modeling in the upper layers. To mitigate the remaining costs in the lower layers, we aggregate input tokens into fixed-size blocks and then apply self-attention at this coarse level. Context information is aggregated into a single embedding to enable the upper layers to decode the next block of tokens without global attention. Free of global attention bottlenecks, the upper layers can fully utilize the compute hardware to maximize inference throughput. By leveraging global and local modules, the Block Transformer architecture achieves 10 to 20x gains in inference throughput compared to vanilla transformers with equivalent perplexity. Our work introduces a new approach to optimizing language model inference through a novel application of global-to-local modeling.
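The block aggregation step can be illustrated with a toy sketch. It assumes mean pooling as the block aggregator and counts one attended position per block in the block-level layers; both are simplifications chosen for illustration, and the function names are hypothetical rather than the authors' implementation.

```python
from typing import List, Sequence


def group_into_blocks(token_ids: Sequence[int], block_size: int, pad_id: int = 0) -> List[List[int]]:
    """Split tokens into fixed-size blocks, right-padding the final block with pad_id."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for start in range(0, len(token_ids), block_size):
        block = list(token_ids[start:start + block_size])
        block += [pad_id] * (block_size - len(block))
        blocks.append(block)
    return blocks


def pool_block(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate one block's token vectors into a single vector (mean pooling, an assumed choice)."""
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / len(token_vectors) for i in range(dim)]


def cached_positions(seq_len: int, block_size: int) -> int:
    """Positions a block-level layer attends over: one per (possibly partial) block."""
    return -(-seq_len // block_size)  # ceiling division


print(group_into_blocks([11, 12, 13, 14, 15], block_size=2))  # [[11, 12], [13, 14], [15, 0]]
print(pool_block([[1.0, 2.0], [3.0, 4.0]]))                  # [2.0, 3.0]
print(cached_positions(2048, 4), "vs", 2048)                  # 512 vs 2048
```

With a block size of 4, a 2,048-token context yields 512 block-level positions, so under this toy model the lower layers read a quarter as many cached entries per decoding step.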

decoder, large language model, machine learning

arXiv.org Artificial Intelligence

2406.02657

Country:

North America (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Epistemology of Language Models: Do Language Models Have Holistic Knowledge?

Kim, Minsu, Thorne, James

arXiv.org Artificial Intelligence, Mar-19-2024

This paper investigates the inherent knowledge in language models from the perspective of epistemological holism. The purpose of this paper is to explore whether LLMs exhibit characteristics consistent with epistemological holism. Under holism, core knowledge, such as general scientific knowledge, plays a specific role: it serves as the foundation of our knowledge system and is difficult to revise. To assess these holistic traits, we created a scientific reasoning dataset and examined the epistemology of language models through three tasks: Abduction, Revision, and Argument Generation. In the abduction task, the language models explained situations while avoiding revising core knowledge. In the other tasks, however, the language models did not distinguish between core and peripheral knowledge, showing incomplete alignment with holistic knowledge principles.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.12862

Country:

Europe (0.46)
Asia (0.28)
North America > United States > Louisiana (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English

Shafayat, Sheikh, Hasan, H M Quamran, Mahim, Minhajur Rahman Chowdhury, Putri, Rifki Afina, Thorne, James, Oh, Alice

arXiv.org Artificial Intelligence, Mar-16-2024

In this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh. Our dataset consists of approximately 5K questions covering several science subjects, with different types of questions, including factual, application, and reasoning-based questions. We benchmark several Large Language Models (LLMs) on our parallel dataset and observe a notable performance gap between Bengali and English. We also investigate several prompting methods and find that Chain-of-Thought prompting benefits mainly reasoning questions and helps much less on factual ones. We also find that appending the English translation helps models answer questions in Bengali. Our findings point to promising future research directions for improving the performance of LLMs in Bengali and, more generally, in low-resource languages.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.109

Country:

Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > K-12 Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)


ORPO: Monolithic Preference Optimization without Reference Model

Hong, Jiwoo, Lee, Noah, Thorne, James

arXiv.org Artificial Intelligence, Mar-14-2024

While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce ORPO, a straightforward, reference-model-free, monolithic odds ratio preference optimization algorithm that eliminates the need for a separate preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across model sizes from 125M to 7B parameters. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters, achieving up to 12.20% on $\text{AlpacaEval}_{2.0}$ (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 on MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-$\alpha$ (7B) and Mistral-ORPO-$\beta$ (7B).
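The odds-ratio idea can be sketched as a loss on a single preference pair. This is a minimal sketch that assumes the following form: a length-normalized likelihood P for each response, odds P/(1 - P), a negative log-sigmoid of the log odds ratio between chosen and rejected responses, and that penalty added to the SFT negative log-likelihood with a weight lam. The exact formulation should be checked against the paper and its released code; the per-token log-probabilities below are hypothetical stand-ins for model outputs.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: average per-token log-probability."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(logp: float) -> float:
    """log(P / (1 - P)) computed from log P; requires log P < 0."""
    if logp >= 0:
        raise ValueError("log-probability must be negative")
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_style_loss(chosen: Sequence[float], rejected: Sequence[float], lam: float) -> float:
    """SFT negative log-likelihood on the chosen response plus a weighted odds-ratio penalty."""
    logp_w, logp_l = mean_logprob(chosen), mean_logprob(rejected)
    sft_loss = -logp_w
    log_odds_ratio = log_odds(logp_w) - log_odds(logp_l)
    odds_ratio_loss = -log_sigmoid(log_odds_ratio)  # small when chosen odds >> rejected odds
    return sft_loss + lam * odds_ratio_loss


# Hypothetical per-token log-probs for one (chosen, rejected) pair.
print(orpo_style_loss([-0.5, -0.5], [-3.0, -3.0], lam=0.1))
```

Because the penalty needs only the policy's own likelihoods, no frozen reference model appears anywhere in the computation, which is the property the abstract emphasizes.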

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.07691

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)


Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks

Seo, Minju, Baek, Jinheon, Thorne, James, Hwang, Sung Ju

arXiv.org Artificial Intelligence, Feb-20-2024

Despite the large successes of recent language models on diverse tasks, they suffer from severe performance degradation in low-resource settings where limited training data is available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on it, recently using Large Language Models (LLMs). However, in low-resource settings, the number of seed samples available for data augmentation is very small, which makes the generated samples suboptimal and less diverse. To tackle this challenge, we propose a novel method that augments training data by incorporating a wealth of examples from other datasets along with the given training data. Specifically, we first retrieve relevant instances from other datasets, such as their input-output pairs or contexts, based on their similarity to the given seed data, and then prompt LLMs to generate new samples using the contextual information within and across the original and retrieved samples. This approach helps ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone. We validate our proposed Retrieval-Augmented Data Augmentation (RADA) framework on multiple datasets in low-resource settings, under both training-time and test-time data augmentation scenarios, where it outperforms existing LLM-powered data augmentation baselines.
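The retrieve-then-prompt step can be sketched with a simple stand-in similarity. The abstract does not specify the similarity function or the prompt format, so this sketch assumes bag-of-words cosine similarity and a hypothetical prompt template; a dense embedding model could be substituted for bag_of_words without changing the rest of the flow.

```python
import math
from collections import Counter
from typing import List, Sequence


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(seed: Sequence[str], pool: Sequence[str], k: int) -> List[str]:
    """Return the k pool instances most similar to any seed example."""
    seed_vectors = [bag_of_words(s) for s in seed]
    scored = [(max(cosine(bag_of_words(p), s) for s in seed_vectors), p) for p in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(seed: Sequence[str], retrieved: Sequence[str]) -> str:
    """Hypothetical prompt that shows both seed and retrieved examples to an LLM."""
    lines = ["Examples from the target task:"]
    lines += [f"- {s}" for s in seed]
    lines.append("Related examples retrieved from other datasets:")
    lines += [f"- {r}" for r in retrieved]
    lines.append("Write one new, diverse example for the target task.")
    return "\n".join(lines)


seed = ["what causes fever in children"]
pool = ["what causes fever in adults", "stock prices fell today", "fever in children treatment"]
print(build_prompt(seed, retrieve(seed, pool, k=2)))
```

Scoring each candidate by its maximum similarity to any seed example keeps retrieved instances close to at least one seed, while drawing them from outside the seed set is what adds diversity.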

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2402.13482

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


eXplainable Bayesian Multi-Perspective Generative Retrieval

Song, EuiYul, Oh, Philhoon, Kim, Sangryul, Thorne, James

arXiv.org Artificial Intelligence, Feb-4-2024

Modern deterministic retrieval pipelines prioritize state-of-the-art performance but often lack interpretability in their decision-making. They also struggle to assess uncertainty, which leads to overconfident predictions. To overcome these limitations, we integrate uncertainty calibration and interpretability into a retrieval pipeline. Specifically, we introduce Bayesian methodologies and multi-perspective retrieval to calibrate uncertainty within the pipeline. We incorporate techniques such as LIME and SHAP to analyze the behavior of a black-box reranker model. The importance scores derived from these explanation methods serve as supplementary relevance scores that enhance the base reranker model. We evaluate the resulting performance gains from uncertainty calibration and interpretable reranking on Question Answering and Fact Checking tasks. Our methods demonstrate substantial performance improvements across three KILT datasets.
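Using explanation-derived importance as a supplementary relevance score can be sketched as a weighted interpolation with the base reranker score. The abstract does not say how the two scores are combined, so the min-max normalization and the mixing weight alpha here are assumptions for illustration, and the document scores are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(base: Dict[str, float], importance: Dict[str, float], alpha: float) -> List[str]:
    """Order documents by (1 - alpha) * base + alpha * importance, both min-max normalized."""
    b, imp = min_max(base), min_max(importance)
    combined = {doc: (1 - alpha) * b[doc] + alpha * imp.get(doc, 0.0) for doc in b}
    return sorted(combined, key=combined.get, reverse=True)


base_scores = {"d1": 2.0, "d2": 1.9, "d3": 0.0}        # hypothetical reranker scores
importance_scores = {"d1": 0.0, "d2": 1.0, "d3": 0.5}  # hypothetical LIME/SHAP-derived scores
print(rerank(base_scores, importance_scores, alpha=0.0))  # ['d1', 'd2', 'd3']
print(rerank(base_scores, importance_scores, alpha=0.5))  # ['d2', 'd1', 'd3']
```

Normalizing both score sets first keeps alpha interpretable as a share of influence, since raw reranker logits and attribution magnitudes are usually on different scales.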

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2402.02418

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Oregon (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)


HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning

Yang, Yongjin, Kim, Joonkee, Kim, Yujin, Ho, Namgyu, Thorne, James, Yun, Se-young

arXiv.org Artificial Intelligence, Nov-22-2023

With the proliferation of social media, accurate detection of hate speech has become critical for ensuring online safety. To combat nuanced forms of hate speech, it is important to identify and thoroughly explain hate speech to help users understand its harmful effects. Recent benchmarks have attempted to tackle this issue by training generative models on free-text annotations of implications in hateful text. However, we find significant reasoning gaps in the existing annotation schemes, which may hinder the supervision of detection models. In this paper, we introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill these gaps in explanations of hate speech, thus enabling effective supervision of detection models. Experiments on the SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines that use existing free-text human annotations. Analysis demonstrates that our method enhances the explanation quality of trained models and improves generalization to unseen datasets. Our code is available at https://github.com/joonkeekim/hare-hate-speech.git.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2311.00321

Country:

North America > United States (0.46)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Law Enforcement & Public Safety (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy

Hong, Jiwoo, Cho, Yejin, Jung, Jaemin, Han, Jiyoung, Thorne, James

arXiv.org Artificial Intelligence, Oct-27-2023

We address an important gap in detecting political bias in news articles. Previous works that perform document classification can be influenced by the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both sentence-level semantics and document-level rhetorical structure, resulting in a more robust and style-agnostic approach to detecting political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, writing style may vary by news outlet. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches in robustness and accuracy. Further analysis and human evaluation demonstrate our model's ability to capture common discourse structures in journalism. Our code is available at: https://github.com/xfactlab/emnlp2023-Document-Hierarchy
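Multi-head attention pooling over sentence vectors, one generic way to build a document representation from sentence-level encodings, can be sketched as follows. This illustrates the general mechanism only, not the paper's architecture: each head is reduced to a single query vector, fixed here but normally learned, and the toy sentence encodings are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sentences: Sequence[Vector], query: Vector) -> Vector:
    """Weight sentence vectors by softmax(query . sentence) and sum them."""
    weights = softmax([sum(q * s for q, s in zip(query, sent)) for sent in sentences])
    dim = len(sentences[0])
    return [sum(w * sent[i] for w, sent in zip(weights, sentences)) for i in range(dim)]


def multi_head_document_vector(sentences: Sequence[Vector], head_queries: Sequence[Vector]) -> Vector:
    """Concatenate one attention-pooled summary per head into a document vector."""
    doc: Vector = []
    for query in head_queries:
        doc.extend(attention_pool(sentences, query))
    return doc


sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]  # toy sentence encodings
heads = [[0.0, 0.0], [10.0, 0.0]]            # head 1 attends uniformly, head 2 favors sentence 1
print(multi_head_document_vector(sentence_vectors, heads))
```

Giving each head its own query lets different heads weight different sentences, which is the intuition behind using a diverse ensemble of heads to capture different parts of a document's structure.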

machine learning, natural language, news article

arXiv.org Artificial Intelligence

2304.02247

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > California (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.82)
Personal (0.67)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
