AITopics | Wang, Zifeng

Collaborating Authors

Wang, Zifeng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Tan, Zhen, Yan, Jun, Hsu, I-Hung, Han, Rujun, Wang, Zifeng, Le, Long T., Song, Yiwen, Chen, Yanfei, Palangi, Hamid, Lee, George, Iyer, Anand, Chen, Tianlong, Liu, Huan, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial IntelligenceMar-11-2025

Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities-utterances, turns, and sessions-into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs' cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.

large language model, machine learning, reflective memory management, (17 more...)

arXiv.org Artificial Intelligence

2503.08026

Country:

Europe (0.68)
North America > United States > Hawaii (0.14)
North America > United States > Michigan (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation

Yin, Fan, Wang, Zifeng, Hsu, I-Hung, Yan, Jun, Jiang, Ke, Chen, Yanfei, Gu, Jindong, Le, Long T., Chang, Kai-Wei, Lee, Chen-Yu, Palangi, Hamid, Pfister, Tomas

arXiv.org Artificial IntelligenceMar-10-2025

Large language models (LLMs) have exhibited the ability to effectively utilize external tools to address user queries. However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. We model the complicated function interactions in multi-turn cases with graph and design novel node operations to build reliable signature paths. Motivated by context distillation, when guiding the generation of positive and negative trajectories using a teacher model, we provide reference function call sequences as positive hints in context and contrastive, incorrect function calls as negative hints. Experiments show that training with the positive trajectories with supervised fine-tuning and preference optimization against negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery, surpassing the performance of the teacher model Gemini-1.5-pro-002 by a large margin in function calling.

large language model, machine learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2503.07826

Country: North America > United States > California (0.46)

Genre: Research Report (0.65)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

STAR: Stability-Inducing Weight Perturbation for Continual Learning

Eskandar, Masih, Imtiaz, Tooba, Hill, Davin, Wang, Zifeng, Dy, Jennifer

arXiv.org Artificial IntelligenceMar-3-2025

Humans can naturally learn new and varying tasks in a sequential manner. Continual learning is a class of learning algorithms that updates its learned model as it sees new data (on potentially new tasks) in a sequence. A key challenge in continual learning is that as the model is updated to learn new tasks, it becomes susceptible to catastrophic forgetting, where knowledge of previously learned tasks is lost. A popular approach to mitigate forgetting during continual learning is to maintain a small buffer of previously-seen samples and to replay them during training. However, this approach is limited by the small buffer size, and while forgetting is reduced, it is still present. In this paper, we propose a novel loss function, STAR, that exploits the worst-case parameter perturbation that reduces the KL-divergence of model predictions with that of its local parameter neighborhood to promote stability and alleviate forgetting. STAR can be combined with almost any existing rehearsal-based method as a plug-and-play component. We empirically show that STAR consistently improves the performance of existing methods by up to 15% across varying baselines and achieves superior or competitive accuracy to that of state-of-the-art methods aimed at improving rehearsal-based continual learning.

artificial intelligence, machine learning, st ar, (16 more...)

arXiv.org Artificial Intelligence

2503.01595

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

Parmar, Mihir, Liu, Xin, Goyal, Palash, Chen, Yanfei, Le, Long, Mishra, Swaroop, Mobahi, Hossein, Gu, Jindong, Wang, Zifeng, Nakhost, Hootan, Baral, Chitta, Lee, Chen-Yu, Pfister, Tomas, Palangi, Hamid

arXiv.org Artificial IntelligenceFeb-22-2025

Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning and varying complexity of instances within a single task. Many existing methods for these tasks either perform task-level verification without considering constraints or apply inference-time algorithms without adapting to instance-level complexity. To address these limitations, we propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms--Best of N, Tree-of-Thought, and REBASE. In PlanGEN framework, the selection agent optimizes algorithm choice based on instance complexity, ensuring better adaptability to complex planning problems. Experimental results demonstrate significant improvements over the strongest baseline across multiple benchmarks, achieving state-of-the-art results on NATURAL PLAN ($\sim$8%$\uparrow$), OlympiadBench ($\sim$4%$\uparrow$), DocFinQA ($\sim$7%$\uparrow$), and GPQA ($\sim$1%$\uparrow$). Our key finding highlights that constraint-guided iterative verification improves inference-time algorithms, and adaptive selection further boosts performance on complex planning and reasoning problems.

large language model, machine learning, plangen, (17 more...)

arXiv.org Artificial Intelligence

2502.16111

Country:

Asia > Thailand (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.86)

Add feedback

Universal Model Routing for Efficient LLM Inference

Jitkrittum, Wittawat, Narasimhan, Harikrishna, Rawat, Ankit Singh, Juneja, Jeevesh, Wang, Zifeng, Lee, Chen-Yu, Shenoy, Pradeep, Panigrahy, Rina, Menon, Aditya Krishna, Kumar, Sanjiv

arXiv.org Artificial IntelligenceFeb-12-2025

Large language models' significant advances in capabilities are accompanied by significant increases in inference costs. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. Based on this, we detail two effective strategies, relying on cluster-based routing and a learned cluster map respectively. We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors. Experiments on a range of public benchmarks show the effectiveness of the proposed strategies in routing amongst more than 30 unseen LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.08773

Country:

Asia (0.92)
North America > Mexico > Mexico City (0.14)
North America > United States > California (0.14)
North America > United States > Arizona (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

When One LLM Drools, Multi-LLM Collaboration Rules

Feng, Shangbin, Ding, Wenxuan, Liu, Alisa, Wang, Zifeng, Shi, Weijia, Wang, Yike, Shen, Zejiang, Han, Xiaochuang, Lang, Hunter, Lee, Chen-Yu, Pfister, Tomas, Choi, Yejin, Tsvetkov, Yulia

arXiv.org Artificial IntelligenceFeb-6-2025

This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output. We challenge the status quo of relying solely on a single general-purpose LLM and argue for multi-LLM collaboration to better represent the extensive diversity of data, skills, and people. We first posit that a single LLM underrepresents real-world data distributions, heterogeneous skills, and pluralistic populations, and that such representation gaps cannot be trivially patched by further training a single LLM. We then organize existing multi-LLM collaboration methods into a hierarchy, based on the level of access and information exchange, ranging from API-level, text-level, logit-level, to weight-level collaboration. Based on these methods, we highlight how multi-LLM collaboration addresses challenges that a single LLM struggles with, such as reliability, democratization, and pluralism. Finally, we identify the limitations of existing multi-LLM methods and motivate future work. We envision multi-LLM collaboration as an essential path toward compositional intelligence and collaborative AI development.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.04506

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems

Feng, Shangbin, Wang, Zifeng, Goyal, Palash, Wang, Yike, Shi, Weijia, Xia, Huang, Palangi, Hamid, Zettlemoyer, Luke, Tsvetkov, Yulia, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial IntelligenceFeb-6-2025

We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. We represent multi-LLM systems as directed acyclic graphs (DAGs) of LLMs with topological message passing for collaborative generation. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as learning a DAG that specifies the flow of inputs and outputs between LLMs. Starting from a swarm of random continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order, evaluate on the utility function (e.g. accuracy on a task), and optimize the adjacency matrices with particle swarm optimization based on the utility score. For weight-step, we assess the contribution of individual LLMs in the multi-LLM systems and optimize model weights with swarm intelligence. We propose JFK-score to quantify the individual contribution of each LLM in the best-found DAG of the role-step, then optimize model weights with particle swarm optimization based on the JFK-score. Experiments demonstrate that Heterogeneous Swarms outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of language models.

evolutionary algorithm, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.0451

Genre: Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

A foundation model for human-AI collaboration in medical literature mining

Wang, Zifeng, Cao, Lang, Jin, Qiao, Chan, Joey, Wan, Nicholas, Afzali, Behdad, Cho, Hyun-Jin, Choi, Chang-In, Emamverdi, Mehdi, Gill, Manjot K., Kim, Sun-Hyung, Li, Yijia, Liu, Yi, Ong, Hanley, Rousseau, Justin, Sheikh, Irfan, Wei, Jenny J., Xu, Ziyang, Zallek, Christopher M., Kim, Kyungsang, Peng, Yifan, Lu, Zhiyong, Sun, Jimeng

arXiv.org Artificial IntelligenceJan-27-2025

Systematic literature review is essential for evidence-based medicine, requiring comprehensive analysis of clinical trial publications. However, the application of artificial intelligence (AI) models for medical literature mining has been limited by insufficient training and evaluation across broad therapeutic areas and diverse tasks. Here, we present LEADS, an AI foundation model for study search, screening, and data extraction from medical literature. The model is trained on 633,759 instruction data points in LEADSInstruct, curated from 21,335 systematic reviews, 453,625 clinical trial publications, and 27,015 clinical trial registries. We showed that LEADS demonstrates consistent improvements over four cutting-edge generic large language models (LLMs) on six tasks. Furthermore, LEADS enhances expert workflows by providing supportive references following expert requests, streamlining processes while maintaining high-quality results. A study with 16 clinicians and medical researchers from 14 different institutions revealed that experts collaborating with LEADS achieved a recall of 0.81 compared to 0.77 experts working alone in study selection, with a time savings of 22.6%. In data extraction tasks, experts using LEADS achieved an accuracy of 0.85 versus 0.80 without using LEADS, alongside a 26.9% time savings. These findings highlight the potential of specialized medical literature foundation models to outperform generic models, delivering significant quality and efficiency benefits when integrated into expert workflows for medical literature mining.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2501.16255

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Washington > King County > Seattle (0.14)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TableRAG: Million-Token Table Understanding with Language Models

Chen, Si-An, Miculicich, Lesly, Eisenschlos, Julian Martin, Wang, Zifeng, Wang, Zilong, Chen, Yanfei, Fujii, Yasuhisa, Lin, Hsuan-Tien, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial IntelligenceDec-26-2024

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding. The implementation and dataset will be available at https://github.com/

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.04739

Country: Asia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges

Wang, Zifeng, Wang, Hanyin, Danek, Benjamin, Li, Ying, Mack, Christina, Poon, Hoifung, Wang, Yajuan, Rajpurkar, Pranav, Sun, Jimeng

arXiv.org Artificial IntelligenceNov-29-2024

The integration of Large Language Models (LLMs) into medical applications has sparked widespread interest across the healthcare industry, from drug discovery and development to clinical decision support, assisting telemedicine, medical devices, and healthcare insurance applications. This perspective paper aims to discuss the inner workings of building LLM-powered medical AI applications and introduces a comprehensive framework for their development. We review existing literature and outline the unique challenges of applying LLMs in specialized medical contexts. Additionally, we introduce a three-step framework to organize medical LLM research activities: 1) Modeling: breaking down complex medical workflows into manageable steps for developing medical-specific models; 2) Optimization: optimizing the model performance with crafted prompts and integrating external knowledge and tools, and 3) System engineering: decomposing complex tasks into subtasks and leveraging human expertise for building medical AI applications. Furthermore, we offer a detailed use case playbook that describes various LLM-powered medical AI applications, such as optimizing clinical trial design, enhancing clinical decision support, and advancing medical imaging analysis. Finally, we discuss various challenges and considerations for building medical AI applications with LLMs, such as handling hallucination issues, data ownership and compliance, privacy, intellectual property considerations, compute cost, sustainability issues, and responsible AI requirements.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.00024

Country: North America > United States > Illinois > Champaign County (0.14)

Genre:

Workflow (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback