Seo, Minjoon
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Ye, Seonghyeon, Hwang, Hyeonbin, Yang, Sohee, Yun, Hyeongu, Kim, Yireun, Seo, Minjoon
In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP differs from canonical prompts for LLMs in that it is a fixed prompt prepended to every input, regardless of the target task, for zero-shot generalization. We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively. This implies that the instruction-following ability of LLMs can be improved at inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP helps language models better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, this ability seems to be insufficiently activated not only in base LLMs but also in many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.
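As a minimal sketch of the core idea (not the authors' actual prefix, which is constructed from fixed cross-task demonstrations in the repository above), a TAPP is simply one fixed string prepended to every query before decoding; the prefix and prompt template below are illustrative placeholders.

```python
# Minimal sketch of Task-Agnostic Prefix Prompting (TAPP).
# The prefix below is an illustrative placeholder, not the paper's prompt,
# which is built from fixed cross-task demonstrations (see the TAPP repo).

TAPP_PREFIX = (
    "Definition: Answer the following question truthfully.\n"
    "Input: What is the capital of France?\n"
    "Output: Paris\n\n"
)

def build_prompt(instruction: str, task_input: str) -> str:
    """Prepend the same fixed prefix to every query, regardless of the task."""
    return f"{TAPP_PREFIX}Definition: {instruction}\nInput: {task_input}\nOutput:"

if __name__ == "__main__":
    # The resulting string would be fed to any LLM's inference call as-is.
    print(build_prompt("Classify the sentiment as positive or negative.",
                       "I loved this movie!"))
```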
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Yoon, Soyoung, Kim, Chaeeun, Lee, Hyunji, Jang, Joel, Yang, Sohee, Seo, Minjoon
Benchmarking of information retrieval (IR) methods is mostly conducted with a fixed set of documents (static corpora); in realistic scenarios this is rarely the case, as the documents to be retrieved are constantly updated and added. In this paper, we conduct a comprehensive comparison between two categories of contemporary retrieval systems, Dual Encoders (DE) and Generative Retrieval (GR), in a dynamic scenario where the corpus to be retrieved is updated. We also conduct an extensive evaluation of computational and memory efficiency, crucial factors for real-world deployment of IR systems. Our results demonstrate that GR is more adaptable to evolving knowledge (+13-18% on the StreamingQA Benchmark), more robust in handling data with temporal information, and more efficient in terms of memory (×4), indexing time (×6), and inference FLOPs (×10). Our paper highlights GR's potential for future use in practical IR systems.
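To make the DE/GR contrast concrete, here is a toy sketch (illustrative only; real systems use learned encoders and a trained sequence-to-sequence identifier generator): a dual encoder must embed and index every newly added document, whereas a generative retriever maps a query directly to a document identifier.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 16) -> np.ndarray:
    """Deterministic stand-in for a learned encoder (illustration only;
    similarity here is not semantically meaningful)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

# Dual encoder (DE): every new document must be embedded and indexed.
corpus = {"d1": "seoul is the capital of korea"}
index = {doc_id: toy_embed(text) for doc_id, text in corpus.items()}

corpus["d2"] = "paris is the capital of france"  # corpus update arrives
index["d2"] = toy_embed(corpus["d2"])            # DE must re-index it

query = toy_embed("capital of france")
best = max(index, key=lambda d: float(query @ index[d]))
print("DE retrieved:", best)

# Generative retrieval (GR): a seq2seq model decodes a document identifier
# directly from the query; a dict lookup stands in for the trained model.
gr_model = {"capital of france": "d2", "capital of korea": "d1"}
print("GR generated id:", gr_model["capital of france"])
```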
KTRL+F: Knowledge-Augmented In-Document Search
Oh, Hanseok, Shin, Haebin, Ko, Miyoung, Lee, Hyunji, Seo, Minjoon
We introduce a new problem, KTRL+F, a knowledge-augmented in-document search task that requires real-time identification of all semantic targets within a document, with awareness of external sources, through a single natural query. This task addresses the following unique challenges of in-document search: 1) utilizing knowledge outside the document, so that additional information about the targets can bridge the semantic gap between the query and the targets, and 2) balancing real-time applicability with performance. We analyze various baselines for KTRL+F and find limitations of existing models, such as hallucinations, high latency, or difficulties in leveraging external knowledge. We therefore propose a Knowledge-Augmented Phrase Retrieval model that strikes a promising balance between speed and performance by simply augmenting external knowledge embeddings into phrase embeddings. We also conduct a user study to verify whether solving KTRL+F can enhance users' search experience. It demonstrates that even with our simple model, users can reduce their search time, issuing fewer queries and making fewer extra visits to other sources to collect evidence. We encourage the research community to work on KTRL+F to enable more efficient in-document information access.
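A minimal sketch of the augmentation idea, under the assumption that each phrase has a linked external-knowledge embedding (the mixing weight `alpha` and additive aggregation are hypothetical choices; the paper's exact aggregation may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Illustrative embeddings; a real system would use learned encoders.
phrase_embs = rng.standard_normal((5, dim))     # phrases in the document
knowledge_embs = rng.standard_normal((5, dim))  # external knowledge per phrase
query_emb = rng.standard_normal(dim)

# Core idea (sketch): enrich each phrase embedding with its external
# knowledge embedding, then score by inner product exactly as in
# ordinary phrase retrieval, preserving real-time applicability.
alpha = 0.5  # hypothetical mixing weight
augmented = phrase_embs + alpha * knowledge_embs

scores = augmented @ query_emb
print("top phrase index:", int(np.argmax(scores)))
```

Because the augmentation happens at indexing time, query-time cost stays at one inner-product search, which is where the speed/performance balance comes from.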
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders
Lee, Hyunji, Soldaini, Luca, Cohan, Arman, Seo, Minjoon, Lo, Kyle
Prevailing research practice today often relies on training dense retrievers on large existing datasets such as MSMARCO and then experimenting with ways to improve zero-shot generalization to unseen domains. While prior work has tackled this challenge through resource-intensive steps such as data augmentation, architectural modifications, increasing model size, or even further pretraining of the base model, comparatively little investigation has examined whether the training procedures themselves can be improved to yield better generalization in the resulting models. In this work, we recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods such as LoRA, and opt for in-batch negatives unless given well-constructed hard negatives. We validate these recommendations on the BEIR benchmark and find that the results are consistent across the choice of dense encoder and base model size, and are complementary to other resource-intensive strategies for out-of-domain generalization such as architectural modifications or additional pretraining. We hope that this thorough and impartial study of various training techniques, which complements other resource-intensive methods, offers practical insights for developing a dense retrieval model that generalizes effectively even when trained on a single dataset.

Dense neural retrieval methods have proven generally effective in many Information Retrieval (IR) tasks (Karpukhin et al., 2020; Izacard et al., 2021; Ni et al., 2021a). These methods use learned neural encoders to obtain dense vector representations of text, and the relevance of a passage to a given query is estimated by computing the dot product between their encodings. Dense approaches can outperform traditional retrieval techniques (e.g., BM25 (Robertson & Jones, 1976)), as they estimate similarity beyond syntactic matching (Lin et al., 2022). Neural retrieval models are effective rankers in domains for which large supervised datasets exist (e.g., MSMARCO (Campos et al., 2016) or Google NQ (Kwiatkowski et al., 2019)). Conversely, they may struggle to generalize to settings they have not been trained on, leading to challenges in handling out-of-domain tasks (Thakur et al., 2021a; Ren et al., 2022; Lupart et al., 2023). In most real-world applications supervision data is not available, yet retrieval models play a key role in the nascent field of augmented language models across many exciting new scenarios (Mialon et al., 2023).
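The in-batch negatives objective named in the recipe is the standard contrastive loss in which, for each query, the other passages in the batch serve as negatives. A minimal PyTorch sketch (encoder details omitted; in the recipe, LoRA would be applied to the encoder weights via a parameter-efficient tuning library):

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """q, p: [batch, dim] query/passage embeddings, where p[i] is the
    positive for q[i]; every other passage in the batch is a negative."""
    scores = q @ p.T                  # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0))  # positives sit on the diagonal
    return F.cross_entropy(scores, labels)

q = torch.randn(4, 16)
p = torch.randn(4, 16)
print(in_batch_negative_loss(q, p))
```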
How Well Do Large Language Models Truly Ground?
Lee, Hyunji, Joo, Sejune, Kim, Chaeeun, Jang, Joel, Kim, Doyoung, On, Kyoung-Woon, Seo, Minjoon
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge. To mitigate this, LLMs can be prompted to generate responses grounded in external context, often given as input (knowledge-augmented models). Yet, previous research often adopts a narrow view of the term "grounding", focusing only on whether the response contains the correct answer, which does not ensure the reliability of the entire response. To address this limitation, we introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize the necessary knowledge from the provided context, and (2) do not exceed the knowledge within the contexts. We introduce a new dataset and a grounding metric to assess this new definition and perform experiments across 13 LLMs of different sizes and training methods to provide insights into the factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
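A toy operationalisation of the two conditions, assuming responses and contexts have already been decomposed into atomic facts (the paper's actual metric is more sophisticated; this only illustrates the two-sided check):

```python
def grounding_check(response_facts: set, context_facts: set,
                    necessary_facts: set) -> bool:
    """Toy version of the strict grounding definition:
    (1) the response covers all necessary knowledge from the context, and
    (2) the response states nothing beyond the context."""
    covers_necessary = necessary_facts <= response_facts
    stays_within_context = response_facts <= context_facts
    return covers_necessary and stays_within_context

context = {"einstein born 1879", "einstein won nobel 1921"}
necessary = {"einstein born 1879"}

print(grounding_check({"einstein born 1879"}, context, necessary))  # True
print(grounding_check({"einstein born 1879", "einstein died 1955"},
                      context, necessary))  # False: exceeds the context
```

The second call fails even though every stated fact is correct, which is exactly the distinction the strict definition draws: correctness of the answer alone does not imply groundedness of the whole response.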
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
Lee, Seongyun, Park, Sue Hyun, Jo, Yongrae, Seo, Minjoon
Large multimodal models (LMMs) suffer from multimodal hallucination, where they provide incorrect responses misaligned with the given visual information. Recent works have conjectured that one reason behind multimodal hallucination might be the vision encoder failing to ground properly on the image. To mitigate this issue, we propose a novel approach that leverages self-feedback as visual cues. Building on this approach, we introduce Volcano, a multimodal self-feedback guided revision model. Volcano generates natural language feedback on its initial response based on the provided visual information and utilizes this feedback to self-revise its initial response. Volcano effectively reduces multimodal hallucination and achieves state-of-the-art results on MMHal-Bench, POPE, and GAVIE. It also improves general multimodal abilities, outperforming previous models on MM-Vet and MMBench. Through a qualitative analysis, we show that Volcano's feedback is more properly grounded in the image than its initial response, indicating that Volcano can provide itself with richer visual information and thereby help alleviate multimodal hallucination. We publicly release Volcano models of 7B and 13B sizes along with the data and code at https://github.com/kaistAI/Volcano.
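A condensed sketch of the critique-and-revise loop described above; `lmm` is a stub standing in for any multimodal model call (e.g., the released Volcano checkpoints), and the prompt wording and stopping rule are illustrative rather than the paper's exact templates.

```python
# Sketch of a Volcano-style self-feedback revision loop (illustrative).

def lmm(prompt: str, image) -> str:
    """Stub for a large multimodal model call; a real implementation
    would run inference conditioned on both the prompt and the image."""
    return f"<model output for: {prompt[:40]}...>"

def volcano_generate(question: str, image, max_rounds: int = 3) -> str:
    response = lmm(f"Question: {question}", image)
    for _ in range(max_rounds):
        # Step 1: generate natural-language feedback on the current answer,
        # grounded in the provided visual information.
        feedback = lmm(
            f"Question: {question}\nResponse: {response}\n"
            "Give feedback on whether the response matches the image.", image)
        # Step 2: revise the answer using that feedback as a visual cue.
        response = lmm(
            f"Question: {question}\nResponse: {response}\n"
            f"Feedback: {feedback}\nRevise the response.", image)
    return response

print(volcano_generate("What color is the cat?", image=None))
```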
A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
Nguyen, Elisa, Seo, Minjoon, Oh, Seong Joon
Training data attribution (TDA) techniques find influential training data for a model's prediction on test data of interest. They approximate the impact of down- or up-weighting a particular training sample. While conceptually useful, they are hardly applicable to deep models in practice, particularly because of their sensitivity to model initialisation. In this paper, we introduce a Bayesian perspective on the TDA task, where the learned model is treated as a Bayesian posterior and the TDA estimates as random variables. From this novel viewpoint, we observe that the influence of an individual training sample is often overshadowed by the noise stemming from model initialisation and SGD batch composition. Based on this observation, we argue that TDA can only be reliably used for explaining deep model predictions that are consistently influenced by certain training data, independent of other noise factors. Our experiments demonstrate the rarity of such noise-independent training-test data pairs but confirm their existence. We recommend that future researchers and practitioners trust TDA estimates only in such cases. Further, we find a disagreement between ground-truth and estimated TDA distributions and encourage future work to study this gap. Code is provided at https://github.com/ElisaNguyen/bayesian-tda.
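A numerical sketch of the Bayesian view: each (training, test) pair's influence score is re-estimated under several independent training runs and treated as a sample from a distribution, and an attribution is trusted only when the signal dominates the initialisation/SGD noise. The scores and the two-standard-error threshold below are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_pairs = 10, 5

# Synthetic TDA scores: rows are independent training runs (different
# initialisations / SGD batch orders), columns are (train, test) pairs.
tda_samples = rng.standard_normal((n_runs, n_pairs))
tda_samples[:, 0] += 3.0  # one genuinely, consistently influential pair

mean = tda_samples.mean(axis=0)
std = tda_samples.std(axis=0, ddof=1)

# Trust an attribution only when its mean influence exceeds the noise;
# the 2-standard-error threshold is an illustrative choice.
reliable = np.abs(mean) > 2 * std / np.sqrt(n_runs)
print("reliable pairs:", np.flatnonzero(reliable))
```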
Aligning Large Language Models through Synthetic Feedback
Kim, Sungdong, Bae, Sanghwan, Shin, Jamin, Kang, Soyoung, Kwak, Donghyun, Yoo, Kang Min, Seo, Minjoon
Aligning large language models (LLMs) with human values has become increasingly important, as it enables sophisticated steering of LLMs. However, it requires significant human demonstrations and feedback, or distillation from proprietary LLMs such as ChatGPT. In this work, we propose a novel alignment learning framework with synthetic feedback that does not depend on extensive human annotations or proprietary LLMs. First, we perform reward modeling (RM) with synthetic feedback by contrasting responses from vanilla LLMs of various sizes and with various prompts. Then, we use the RM to simulate high-quality demonstrations to train a supervised policy and further optimize the model with reinforcement learning. Our resulting model, the Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms recent open-source models trained on the outputs of InstructGPT or human-annotated demonstrations on alignment benchmarks. In human evaluation, our model is preferred to Alpaca and Dolly-v2 55.0% and 58.5% of the time, respectively. Further analyses demonstrate the efficacy and importance of synthetic feedback in our framework. The code is available at https://github.com/naver-ai/almost
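A sketch of how such synthetic comparisons can be produced, under the heuristic that responses from larger vanilla models with stronger prompting configurations tend to be preferable; `query_llm`, the model sizes, and the demonstration counts are hypothetical placeholders, not the paper's exact setup.

```python
# Sketch of ALMoST-style synthetic preference data generation.

def query_llm(model_size: str, n_demos: int, prompt: str) -> str:
    """Stub for sampling a response from a vanilla (non-aligned) LLM."""
    return f"<{model_size} response with {n_demos} demos to: {prompt}>"

def synthetic_comparison(prompt: str) -> dict:
    # Heuristic: the stronger configuration (bigger model, more
    # demonstrations) is assumed to produce the preferred response.
    chosen = query_llm("30B", n_demos=3, prompt=prompt)
    rejected = query_llm("7B", n_demos=1, prompt=prompt)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Pairs like this train the reward model, which then filters
# demonstrations for supervised training and guides RL fine-tuning.
print(synthetic_comparison("Explain why the sky is blue."))
```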
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Ye, Seonghyeon, Jang, Joel, Kim, Doyoung, Jo, Yongrae, Seo, Minjoon
Enhancing the zero-shot performance of instruction-following models requires heavy computation, either by scaling the total number of training datasets or the model size. In this work, we explore how retrieval of soft prompts obtained through prompt tuning can efficiently assist hard prompts in zero-shot task generalization. Specifically, we train soft prompt embeddings for each prompt through prompt tuning, store samples of the training instances mapped to their prompt embeddings, and, during inference, retrieve the prompt embedding corresponding to the stored training instance closest to the query instance. While adding only 0.007% additional parameters, retrieval of soft prompts enhances the performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39 percentage points. We also report an interesting finding: retrieving source embeddings trained on similar answer-choice formats matters more than retrieving those trained on similar task types.
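A minimal sketch of the retrieval step, assuming the store of training-instance embeddings and their tuned soft prompts already exists (all arrays below are random placeholders for the learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, prompt_len = 8, 4

# Store built during prompt tuning: each training-instance embedding is
# mapped to the soft prompt ([prompt_len, dim]) tuned for its source prompt.
instance_embs = rng.standard_normal((6, dim))             # stored instances
soft_prompts = rng.standard_normal((6, prompt_len, dim))  # their soft prompts

def retrieve_soft_prompt(query_emb: np.ndarray) -> np.ndarray:
    """Return the soft prompt of the stored instance nearest to the query."""
    sims = instance_embs @ query_emb
    return soft_prompts[int(np.argmax(sims))]

query_emb = rng.standard_normal(dim)
prompt = retrieve_soft_prompt(query_emb)
# The retrieved soft prompt would be prepended to the input token
# embeddings of the frozen backbone (e.g., T0) before decoding.
print(prompt.shape)  # (4, 8)
```

Only the soft prompts and the instance store are added on top of the frozen backbone, which is where the 0.007% parameter overhead comes from.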
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Kim, Seungone, Joo, Se June, Kim, Doyoung, Jang, Joel, Ye, Seonghyeon, Shin, Jamin, Seo, Minjoon
Language models (LMs) with fewer than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning, in contrast to large LMs, when solving unseen tasks. In this work, we aim to equip smaller LMs with step-by-step reasoning capability by instruction tuning with CoT rationales. To achieve this goal, we first introduce a new instruction-tuning dataset called the CoT Collection, which augments the existing Flan Collection (which includes only 9 CoT tasks) with an additional 1.84 million rationales across 1,060 tasks. We show that CoT fine-tuning Flan-T5 (3B & 11B) with the CoT Collection gives smaller LMs better CoT capabilities on unseen tasks. On the BIG-Bench-Hard (BBH) benchmark, we report average improvements of +4.34% (Flan-T5 3B) and +2.60% (Flan-T5 11B) in zero-shot task accuracy. Furthermore, we show that instruction tuning with the CoT Collection gives LMs stronger few-shot learning capabilities on 4 domain-specific tasks, resulting in improvements of +2.24% (Flan-T5 3B) and +2.37% (Flan-T5 11B), even outperforming ChatGPT, which utilizes demonstrations up to the maximum input length, by a +13.98% margin. Our code, the CoT Collection data, and model checkpoints are publicly available.
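A sketch of how a CoT instruction-tuning example might be serialised into a (source, target) pair for a seq2seq model such as Flan-T5; the field names, trigger phrase, and template are illustrative assumptions, not the exact CoT Collection format.

```python
# Sketch of CoT instruction-tuning data formatting (illustrative).

def format_cot_example(instruction: str, question: str,
                       rationale: str, answer: str) -> dict:
    """Pack an instruction, question, rationale, and answer into a
    seq2seq (input, label) pair: the model learns to emit the
    step-by-step rationale before the final answer."""
    source = f"{instruction}\n{question}\nLet's think step by step."
    target = f"{rationale} So the answer is {answer}."
    return {"source": source, "target": target}

example = format_cot_example(
    instruction="Answer the math word problem.",
    question="Tom has 3 apples and buys 2 more. How many apples does he have?",
    rationale="Tom starts with 3 apples and gains 2, and 3 + 2 = 5.",
    answer="5",
)
print(example["source"], "\n---\n", example["target"])
```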