
Collaborating Author: Bak, JinYeong


Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity

arXiv.org Artificial Intelligence

The recent leap in AI capabilities, driven by big generative models, has sparked the possibility of achieving Artificial General Intelligence (AGI) and further triggered discussions on Artificial Superintelligence (ASI), a system surpassing all humans across all domains. This raises a critical research question: if we realize ASI, how do we align it with human values so that it benefits rather than harms human society? This is known as the superalignment problem. Although many regard ASI as a purely hypothetical concept, in this paper we argue that superalignment is achievable and that research on it should advance immediately, through simultaneous and alternating optimization of task competence and value conformity. We posit that superalignment is not merely a safeguard for ASI but also necessary for its realization. To support this position, we first provide a formal definition of superalignment rooted in the gap between capability and capacity and elaborate on our argument. Then we review existing paradigms, explore their interconnections and limitations, and illustrate a potential path to superalignment centered on two fundamental principles. We hope this work sheds light on a practical approach for developing value-aligned next-generation AI, garnering greater benefits and reducing potential harms for humanity.
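A minimal sketch of the alternating optimization the paper advocates, assuming PyTorch-style models and two externally supplied loss functions; the even/odd schedule and every name here are our own illustration, not the paper's algorithm:

```python
from itertools import cycle

def train_alternating(model, optimizer, task_batches, value_batches,
                      competence_loss, conformity_loss, steps):
    """Alternate gradient steps on a task-competence objective and a
    value-conformity objective (illustrative sketch only)."""
    task_iter, value_iter = cycle(task_batches), cycle(value_batches)
    for step in range(steps):
        if step % 2 == 0:                      # even steps: sharpen competence
            loss = competence_loss(model, next(task_iter))
        else:                                  # odd steps: enforce conformity
            loss = conformity_loss(model, next(value_iter))
        optimizer.zero_grad()
        loss.backward()                        # assumes PyTorch-style tensors
        optimizer.step()
    return model
```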


Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions

arXiv.org Artificial Intelligence

Traditional text-based human-AI interactions often adhere to a strict turn-taking approach. In this research, we propose a novel approach that incorporates overlapping messages, mirroring natural human conversations. Through a formative study, we observed that even in text-based contexts, users instinctively engage in overlapping behaviors such as "A: Today I went to-" "B: yeah." To capitalize on these insights, we developed OverlapBot, a prototype chatbot in which both the AI and the user can initiate overlap. Our user study revealed that OverlapBot was perceived as more communicative and immersive than a traditional turn-taking chatbot, fostering faster and more natural interactions. Our findings contribute to the understanding of the design space for overlapping interactions. We also provide recommendations for implementing overlap-capable AI interactions to enhance the fluidity and engagement of text-based conversations.
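As a rough illustration of what overlap means mechanically, the sketch below replaces the usual blocking turn loop with two concurrent coroutines, so user text can arrive while the assistant is still streaming. The queue protocol and all names are assumptions, not OverlapBot's implementation:

```python
import asyncio

async def stream_reply(chunks, outbox):
    """Emit the assistant's reply chunk by chunk instead of as one turn."""
    for chunk in chunks:                       # e.g., tokens from an LLM stream
        await outbox.put(("assistant", chunk))
        await asyncio.sleep(0.2)               # simulated generation latency

async def listen_user(inbox, outbox):
    """Accept user messages at any time, even mid-reply."""
    while True:
        msg = await inbox.get()
        if msg is None:                        # sentinel: user is done typing
            break
        await outbox.put(("user", msg))        # user text overlaps the reply

async def overlapped_turn(chunks, inbox, outbox):
    # Both directions run concurrently, so either side can "talk over" the other.
    await asyncio.gather(stream_reply(chunks, outbox), listen_user(inbox, outbox))
```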


The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

arXiv.org Artificial Intelligence

The emergence of large language models (LLMs) has sparked discussion of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence. Though ASI is still hypothetical and far from current AI capabilities, existing alignment methods struggle to guide such advanced AI and to ensure its safety in the future, so it is essential to discuss the alignment of such AI now. Superalignment, the alignment of AI systems at superhuman levels of capability with human values and safety requirements, aims to address two primary goals: scalability in supervision, to provide high-quality guidance signals, and robust governance, to ensure alignment with human values. In this survey, we review the original scalable oversight problem and corresponding methods and potential solutions for superalignment. Specifically, we introduce the challenges and limitations of current alignment paradigms in addressing the superalignment problem, then review scalable oversight methods for superalignment, and finally discuss key challenges and propose pathways forward. (The paper's Figure 1 depicts challenges from the supervision and governance perspectives: the supervision perspective focuses on providing high-quality guidance signals to enhance system competence, while the governance perspective emphasizes aligning the behavior of advanced AI with human values to prevent harmful outcomes.)
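One concrete recipe that falls under the scalable-oversight umbrella such surveys cover is weak-to-strong generalization, where a weaker supervisor labels data for a stronger student; a hedged toy sketch with placeholder function names:

```python
def weak_to_strong(weak_model, strong_model, unlabeled_inputs, finetune):
    """Illustrative weak-to-strong setup: noisy labels from a weaker
    supervisor train a stronger student (placeholder names throughout)."""
    weak_labels = [weak_model(x) for x in unlabeled_inputs]   # noisy supervision
    dataset = list(zip(unlabeled_inputs, weak_labels))
    return finetune(strong_model, dataset)  # hope: student exceeds its teacher
```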


Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation

arXiv.org Artificial Intelligence

Self-training improves the reasoning abilities of large language models (LLMs) by training them on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers to a given question as appropriate for training. However, a single measure risks misjudging rationale quality, leading the models to learn flawed reasoning patterns. To address this issue, we propose CREST (Consistency-driven Rationale Evaluation for Self-Training), a self-training framework that further evaluates each rationale through follow-up questions and leverages this evaluation to guide training. Specifically, we introduce two methods: (1) filtering out rationales that frequently result in incorrect answers on follow-up questions and (2) preference learning based on mixed preferences from rationale evaluation results on both original and follow-up questions. Experiments on three question-answering datasets using open LLMs show that CREST not only improves the logical robustness and correctness of rationales but also improves reasoning abilities compared to previous self-training approaches.
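Our reading of CREST's first method, filtering by follow-up consistency, can be sketched as follows; the threshold, data layout, and helper names are assumptions, not the paper's exact procedure:

```python
def filter_rationales(rationales, followups, answer_with, threshold=0.5):
    """Keep rationales that answer the original question correctly AND are
    right on at least `threshold` of its follow-up questions.

    rationales : list of (question, rationale, gold_answer) triples
    followups  : dict mapping a question to its (follow-up, gold) pairs
    answer_with: callable (rationale, question) -> predicted answer
    """
    kept = []
    for question, rationale, gold in rationales:
        if answer_with(rationale, question) != gold:
            continue                           # wrong on the original question
        probes = followups[question]
        correct = sum(answer_with(rationale, q) == g for q, g in probes)
        if probes and correct / len(probes) >= threshold:
            kept.append((question, rationale))  # consistent enough to train on
    return kept
```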


KpopMT: Translation Dataset with Terminology for Kpop Fandom

arXiv.org Artificial Intelligence

While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems; this allows people to form unique language systems within social groups. Accordingly, we focus on a remaining gap in translation for social groups whose in-group members use unique terminologies. We propose the KpopMT dataset, which aims to fill this gap by enabling precise terminology translation, choosing Kpop fandom as an initial social group given its global popularity. Expert translators provide 1k English translations of Korean posts and comments, each annotated with the specific terminology of the group's language system. We evaluate existing translation systems, including GPT models, on KpopMT to identify their failure cases. Results show overall low scores, underscoring the challenge of reflecting group-specific terminologies and styles in translation. We make KpopMT publicly available.
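Based only on this description, a KpopMT record presumably pairs a Korean post with its expert English translation plus terminology annotations; a hypothetical schema whose field names are our own guesses:

```python
from dataclasses import dataclass, field

@dataclass
class KpopMTExample:
    """Hypothetical record layout inferred from the abstract."""
    source_ko: str                                   # Korean fan post or comment
    target_en: str                                   # expert English translation
    terminology: dict = field(default_factory=dict)  # in-group term -> rendering
```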


MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control

arXiv.org Artificial Intelligence

As mental health issues escalate globally, there is a tremendous need for advanced digital support systems. We introduce MentalAgora, a novel framework employing large language models enhanced by interaction among multiple agents for tailored mental health support. The framework operates through three stages: strategic debating, tailored counselor creation, and response generation, enabling the dynamic customization of responses based on individual user preferences and therapeutic needs. We conduct experiments on TherapyTalk, a high-quality evaluation dataset crafted with mental health professionals, showing that MentalAgora generates expert-aligned and user-preference-enhanced responses. Our evaluations, including experiments and user studies, demonstrate that MentalAgora aligns with professional standards and effectively meets user preferences, setting a new benchmark for digital mental health interventions.
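The three-stage pipeline can be pictured as the toy sketch below, where strategic debating is stood in for by a majority vote over agent-proposed strategies; the voting rule and all names are illustrative assumptions, not MentalAgora's actual prompts or protocol:

```python
from collections import Counter

def debate(agents, message, profile):
    """Toy stand-in for Stage 1 (strategic debating): majority vote over
    the counseling strategies proposed by each agent."""
    votes = [agent(message, profile) for agent in agents]
    return Counter(votes).most_common(1)[0][0]

def mentalagora_respond(message, profile, agents, build_counselor, llm):
    strategy = debate(agents, message, profile)             # Stage 1: debate
    counselor_prompt = build_counselor(strategy, profile)   # Stage 2: persona
    return llm(f"{counselor_prompt}\nUser: {message}")      # Stage 3: respond
```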


Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture

arXiv.org Artificial Intelligence

Despite their effectiveness in various fields, Transformer-based models still face the structural limitation of a fixed context length when processing long sequence input. Although various external memory techniques have been introduced, most fail to avoid fateful forgetting, where even the most important memories are inevitably forgotten after a sufficient number of time steps. We design Memoria, a memory system for artificial neural networks that draws inspiration from humans and applies various neuroscientific and psychological theories of memory. Experimentally, we demonstrate the effectiveness of Memoria on tasks such as sorting and language modeling, surpassing conventional techniques.
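The abstract does not spell out the mechanism, but the contrast with fateful forgetting can be illustrated with a toy memory whose entries are strengthened on retrieval and evicted by weakness rather than age, loosely echoing rehearsal effects in human memory; this is a motivating sketch, not Memoria's algorithm:

```python
class StrengthMemory:
    """Toy memory: retrieval reinforces entries, eviction removes the
    weakest entry rather than the oldest (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}                          # key -> [value, strength]

    def write(self, key, value):
        if len(self.store) >= self.capacity and key not in self.store:
            weakest = min(self.store, key=lambda k: self.store[k][1])
            del self.store[weakest]              # evict least-reinforced memory
        self.store[key] = [value, 1.0]

    def read(self, key):
        entry = self.store.get(key)
        if entry is not None:
            entry[1] += 1.0                      # retrieval strengthens memory
            return entry[0]
        return None
```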


Translating Hanja Historical Documents to Contemporary Korean and English

arXiv.org Artificial Intelligence

The Annals of the Joseon Dynasty (AJD) contain the daily records of the kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. That translation, however, was too literal and contained many archaic Korean words, so a new expert translation effort began in 2012; since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. We therefore propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja from both a full dataset of outdated Korean translations and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on the newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.
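In the standard multilingual-NMT setup this description suggests, a single model is trained on all three target languages, distinguished by a target-language tag prepended to the Hanja source; the tag format, language codes, and toy sentence pairs below are illustrative assumptions, not the paper's data:

```python
def tag_example(hanja_source, target_text, target_lang):
    """Prepend a target-language tag so one model serves all outputs.
    Tag format and language codes are illustrative."""
    return f"<2{target_lang}> {hanja_source}", target_text

# Toy stand-ins for the three corpora: the large outdated-Korean translation
# and the small contemporary Korean and English translations.
outdated_korean_corpus = [("上嘉納之", "상이 가납하였다")]
contemporary_korean_corpus = [("上嘉納之", "임금이 기꺼이 받아들였다")]
english_corpus = [("上嘉納之", "The King gladly accepted it.")]

training_pairs = (
    [tag_example(s, t, "ko_old") for s, t in outdated_korean_corpus]
    + [tag_example(s, t, "ko_new") for s, t in contemporary_korean_corpus]
    + [tag_example(s, t, "en") for s, t in english_corpus]
)
```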


PEMA: Plug-in External Memory Adaptation for Language Models

arXiv.org Artificial Intelligence

Pre-trained language models (PLMs) have demonstrated impressive performance across various downstream NLP tasks. Nevertheless, the memory and compute required to pre-train large language models pose significant challenges. Furthermore, because of the substantial resources involved, many PLM weights are kept confidential, so users are compelled to share their data with model owners to fine-tune on specific tasks. To overcome these limitations, we introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) approach designed for fine-tuning PLMs without access to all of their weights. PEMA can be integrated into the context representation of test data during inference to execute downstream tasks. It leverages an external memory to store context representations generated by a PLM, mapped to the desired target words. Our method trains LoRA-based weight matrices within the final layer of the PLM for enhanced efficiency. The resulting output probability is then interpolated with the PLM's next-word distribution to perform downstream tasks. To improve generation quality, we propose a novel interpolation strategy named Gradual Unrolling. We demonstrate the efficacy of PEMA on a syntactic dataset and assess its performance on machine translation and style transfer tasks using real datasets. PEMA outperforms other PEFT methods in memory and latency efficiency for training and inference, and it outperforms other baselines in preserving the meaning of sentences while generating appropriate language and styles.
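Our reading of the interpolation step: the memory-derived next-word distribution is mixed with the PLM's, and under Gradual Unrolling we assume the memory weight decays over generated positions; the schedule and all parameter values below are assumptions, not PEMA's published settings:

```python
import numpy as np

def interpolate(p_memory, p_plm, step, total_steps, lam_max=0.8):
    """Mix the external-memory distribution with the PLM's next-word
    distribution; the memory weight decays as generation proceeds."""
    lam = lam_max * max(0.0, 1.0 - step / total_steps)
    return lam * p_memory + (1.0 - lam) * p_plm

# Toy 4-token distributions at generation step 1 of 10.
p_mem = np.array([0.7, 0.1, 0.1, 0.1])
p_lm = np.array([0.25, 0.25, 0.25, 0.25])
print(interpolate(p_mem, p_lm, step=1, total_steps=10))
```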


From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models

arXiv.org Artificial Intelligence

Being able to predict people's opinions on issues and their behaviors in realistic scenarios can be helpful in various domains, such as politics and marketing. However, conducting large-scale surveys like the European Social Survey to solicit people's opinions on individual issues can incur prohibitive costs. Leveraging prior research showing the influence of core human values on individual decisions and actions, we propose using value-injected large language models (LLMs) to predict opinions and behaviors. To this end, we present the Value Injection Method (VIM), a collection of two methods, argument generation and question answering, designed to inject targeted value distributions into LLMs via fine-tuning. We then conduct a series of experiments on four tasks to test the effectiveness of VIM and the possibility of using value-injected LLMs to predict people's opinions and behaviors. We find that LLMs value-injected with variations of VIM substantially outperform the baselines, and the results suggest that opinions and behaviors can be better predicted with value-injected LLMs than with the baseline approaches.
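A hypothetical sketch of how VIM-style fine-tuning pairs might be constructed from a target value distribution, covering both methods named in the abstract; the data format, prompts, and scale mapping are our own assumptions, not the paper's templates:

```python
def make_vim_examples(value_distribution, issues, generate_argument):
    """Build (prompt, target) fine-tuning pairs for value injection.

    value_distribution: assumed format, e.g. {"tradition": 0.8, "hedonism": 0.2}
    generate_argument : placeholder for however value-consistent arguments
                        are produced (e.g., another LLM)
    """
    examples = []
    for issue in issues:
        for value, weight in value_distribution.items():
            # Method 1: argument generation conditioned on the target value.
            prompt = (f"Write an argument about '{issue}' from someone "
                      f"who values {value}.")
            examples.append((prompt, generate_argument(issue, value, weight)))
            # Method 2: question answering about the value itself.
            question = f"How much do you value {value}? Answer on a 1-6 scale."
            examples.append((question, str(round(1 + 5 * weight))))
    return examples
```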