AITopics | Park, Chanwoo

Collaborating Authors

Park, Chanwoo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning

Park, Chanwoo, Han, Seungju, Guo, Xingzhi, Ozdaglar, Asuman, Zhang, Kaiqing, Kim, Joo-Kyung

arXiv.org Artificial IntelligenceFeb-25-2025

Leveraging multiple large language models (LLMs) to build collaborative multi-agentic workflows has demonstrated significant potential. However, most previous studies focus on prompting the out-of-the-box LLMs, relying on their innate capability for collaboration, which may not improve LLMs' performance as shown recently. In this paper, we introduce a new post-training paradigm MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning), to explicitly elicit the collaborative behaviors and further unleash the power of multi-agentic LLM frameworks. In MAPoRL, multiple LLMs first generate their own responses independently and engage in a multi-turn discussion to collaboratively improve the final answer. In the end, a MAPoRL verifier evaluates both the answer and the discussion, by assigning a score that verifies the correctness of the answer, while adding incentives to encourage corrective and persuasive discussions. The score serves as the co-training reward, and is then maximized through multi-agent RL. Unlike existing LLM post-training paradigms, MAPoRL advocates the co-training of multiple LLMs together using RL for better generalization. Accompanied by analytical insights, our experiments demonstrate that training individual LLMs alone is insufficient to induce effective collaboration. In contrast, multi-agent co-training can boost the collaboration performance across benchmarks, with generalization to unseen domains.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.18439

Country:

North America > United States (0.14)
Asia > Thailand (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making

Kim, Yubin, Park, Chanwoo, Jeong, Hyewon, Grau-Vilchez, Cristina, Chan, Yik Siu, Xu, Xuhai, McDuff, Daniel, Lee, Hyeonhoon, Breazeal, Cynthia, Park, Hae Won

arXiv.org Artificial IntelligenceNov-19-2024

Medical Decision-Making (MDM) is a multi-faceted process that requires clinicians to assess complex multi-modal patient data patient, often collaboratively. Large Language Models (LLMs) promise to streamline this process by synthesizing vast medical knowledge and multi-modal health data. However, single-agent are often ill-suited for nuanced medical contexts requiring adaptable, collaborative problem-solving. Our MDAgents addresses this need by dynamically assigning collaboration structures to LLMs based on task complexity, mimicking real-world clinical collaboration and decision-making. This framework improves diagnostic accuracy and supports adaptive responses in complex, real-world medical scenarios, making it a valuable tool for clinicians in various healthcare settings, and at the same time, being more efficient in terms of computing cost than static multi-agent decision making methods.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2411.00248

Country: North America > United States > Massachusetts (0.17)

Genre: Research Report (0.65)

Industry:

Health & Medicine > Diagnostic Medicine (0.90)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Park, Chanwoo, Liu, Mingyang, Kong, Dingwen, Zhang, Kaiqing, Ozdaglar, Asuman

arXiv.org Artificial IntelligenceMay-27-2024

Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: personalization-based one and aggregation-based one. For the former, we propose two approaches based on representation learning and clustering, respectively, for learning multiple reward models that trades off the bias (due to preference heterogeneity) and variance (due to the use of fewer data for learning each model by personalization). We then establish sample complexity guarantees for both approaches. For the latter, we aim to adhere to the single-model framework, as already deployed in the current RLHF paradigm, by carefully aggregating diverse and truthful preferences from humans. We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarianism and Leximin approaches to aggregate individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions. Under the probabilistic-opinion-feedback model, we also develop an approach to handle strategic human labelers who may bias and manipulate the aggregated preferences with untruthful feedback. Based on the ideas in mechanism design, our approach ensures truthful preference reporting, with the induced aggregation rule maximizing social welfare functions.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.00254

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Leisure & Entertainment (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Park, Chanwoo, Liu, Xiangyu, Ozdaglar, Asuman, Zhang, Kaiqing

arXiv.org Artificial IntelligenceMay-26-2024

Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting when they interact with each other, a typical scenario in real-world LLM-agent applications. To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of \emph{regret}. We first empirically study the {no-regret} behaviors of LLMs in canonical (non-stationary) online learning problems, as well as the emergence of equilibria when LLM agents interact through playing repeated games. We then provide some theoretical insights into the no-regret behaviors of LLM agents, under certain assumptions on the supervised pre-training and the rationality model of human decision-makers who generate the data. Notably, we also identify (simple) cases where advanced LLMs such as GPT-4 fail to be no-regret. To promote the no-regret behaviors, we propose a novel \emph{unsupervised} training loss of \emph{regret-loss}, which, in contrast to the supervised pre-training loss, does not require the labels of (optimal) actions. We then establish the statistical guarantee of generalization bound for regret-loss minimization, followed by the optimization guarantee that minimizing such a loss may automatically lead to known no-regret learning algorithms. Our further experiments demonstrate the effectiveness of our regret-loss, especially in addressing the above ``regrettable'' cases.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2403.16843

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Maryland (0.14)
Asia > Middle East > UAE (0.13)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Online (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Collaboration Strategy for LLMs in Medical Decision Making

Kim, Yubin, Park, Chanwoo, Jeong, Hyewon, Chan, Yik Siu, Xu, Xuhai, McDuff, Daniel, Breazeal, Cynthia, Park, Hae Won

arXiv.org Artificial IntelligenceApr-22-2024

Foundation models have become invaluable in advancing the medical field. Despite their promise, the strategic deployment of LLMs for effective utility in complex medical tasks remains an open question. Our novel framework, Medical Decision-making Agents (MDAgents) aims to address this gap by automatically assigning the effective collaboration structure for LLMs. Assigned solo or group collaboration structure is tailored to the complexity of the medical task at hand, emulating real-world medical decision making processes. We evaluate our framework and baseline methods with state-of-the-art LLMs across a suite of challenging medical benchmarks: MedQA, MedMCQA, PubMedQA, DDXPlus, PMC-VQA, Path-VQA, and MedVidQA, achieving the best performance in 5 out of 7 benchmarks that require an understanding of multi-modal medical reasoning. Ablation studies reveal that MDAgents excels in adapting the number of collaborating agents to optimize efficiency and accuracy, showcasing its robustness in diverse scenarios. We also explore the dynamics of group consensus, offering insights into how collaborative agents could behave in complex clinical team dynamics. Our code can be found at https://github.com/mitmedialab/MDAgents.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2404.15155

Country:

Asia (0.46)
North America > United States > Massachusetts (0.14)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.94)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Hwang, Minyoung, Weihs, Luca, Park, Chanwoo, Lee, Kimin, Kembhavi, Aniruddha, Ehsani, Kiana

arXiv.org Artificial IntelligenceDec-14-2023

Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences in complex environments. We use multi-objective reinforcement learning to train a single policy adaptable to a broad spectrum of preferences. We introduce three distinct methods to infer human preferences by leveraging different types of interactions: (1) human demonstrations, (2) preference feedback on trajectory comparisons, and (3) language instructions. We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR, demonstrating the ability to prompt agent behaviors to satisfy human preferences in various scenarios. Project page: https://promptable-behaviors.github.io

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2312.09337

Country:

North America > United States (0.14)
Europe > Greece (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

Park, Chanwoo, Zhang, Kaiqing, Ozdaglar, Asuman

arXiv.org Artificial IntelligenceJul-13-2023

We study a new class of Markov games (MGs), \textit{Multi-player Zero-sum Markov Games} with {\it Networked separable interactions} (MZNMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define an MZNMG as a model where {the payoffs of the auxiliary games associated with each state are zero-sum and} have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as an MZNMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the {product of} per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted MZNMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for MZNMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2307.0947

Country:

North America > United States > Massachusetts (0.27)
North America > United States > Maryland (0.27)

Genre: Research Report (0.81)

Industry:

Banking & Finance (0.92)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Sampling-based Bayesian Inference with gradient uncertainty

Park, Chanwoo, Kim, Jae Myung, Ha, Seok Hyeon, Lee, Jungwoo

arXiv.org Artificial IntelligenceDec-8-2018

Deep neural networks(NNs) have achieved impressive performance, often exceed human performance on many computer vision tasks. However, one of the most challenging issues that still remains is that NNs are overconfident in their predictions, which can be very harmful when this arises in safety critical applications. In this paper, we show that predictive uncertainty can be efficiently estimated when we incorporate the concept of gradients uncertainty into posterior sampling. The proposed method is tested on two different datasets, MNIST for in-distribution confusing examples and notMNIST for out-of-distribution data. We show that our method is able to efficiently represent predictive uncertainty on both datasets.

bayesian inference, gradient uncertainty, neural network, (19 more...)

arXiv.org Artificial Intelligence

1812.03285

Country: North America > Canada (0.14)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
(2 more...)

Add feedback