ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Wan, Ziyu, Li, Yunxiang, Song, Yan, Wang, Hanjing, Yang, Linyi, Schmidt, Mark, Wang, Jun, Zhang, Weinan, Hu, Shuyue, Wen, Ying

arXiv.org Artificial Intelligence

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.
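The two-level decomposition described in the abstract can be sketched as a simple rollout loop. Everything below is an illustrative stand-in, not the paper's implementation: `call_llm`, the prompt formats, and the agent functions are assumptions, and the RL training that aligns both agents is only indicated in a comment.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned text for the sketch."""
    if prompt.startswith("[META]"):
        return "Plan: restate the problem, decompose it, solve each part, verify."
    return "Followed the plan and produced a final answer."

def meta_thinking_agent(question: str) -> str:
    # High-level agent: strategic oversight and planning, not the answer itself.
    return call_llm(f"[META] Devise a high-level plan for: {question}")

def reasoning_agent(question: str, plan: str) -> str:
    # Low-level agent: detailed execution of the plan.
    return call_llm(f"Question: {question}\nFollow this plan: {plan}")

def rema_rollout(question: str) -> dict:
    plan = meta_thinking_agent(question)
    answer = reasoning_agent(question, plan)
    # During training, a shared task reward on `answer` would update both
    # agents with aligned objectives; inference just returns the rollout.
    return {"plan": plan, "answer": answer}

rollout = rema_rollout("Prove that the sum of two even integers is even.")
```

The key structural point is that the plan and the execution come from separately trained policies, which is what distinguishes this from single-agent chain-of-thought.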


NatureLM: Deciphering the Language of Nature for Scientific Discovery

Xia, Yingce, Jin, Peiran, Xie, Shufang, He, Liang, Cao, Chuan, Luo, Renqian, Liu, Guoqing, Wang, Yue, Liu, Zequn, Chen, Yuan-Jyue, Guo, Zekun, Bai, Yeqi, Deng, Pan, Min, Yaosen, Lu, Ziheng, Hao, Hongxia, Yang, Han, Li, Jielan, Liu, Chang, Zhang, Jia, Zhu, Jianwei, Wu, Kehan, Zhang, Wei, Gao, Kaiyuan, Pei, Qizhi, Wang, Qian, Liu, Xixian, Li, Yanting, Zhu, Houtian, Lu, Yeqing, Ma, Mingqian, Wang, Zun, Xie, Tian, Maziarz, Krzysztof, Segler, Marwin, Yang, Zhao, Chen, Zilong, Shi, Yu, Zheng, Shuxin, Wu, Lijun, Hu, Chen, Dai, Peggy, Liu, Tie-Yan, Liu, Haiguang, Qin, Tao

arXiv.org Artificial Intelligence

Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce Nature Language Model (briefly, NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) achieving state-of-the-art performance in tasks like SMILES-to-IUPAC translation and retrosynthesis on USPTO-50k. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.


MetaSC: Test-Time Safety Specification Optimization for Language Models

Gallego, Víctor

arXiv.org Artificial Intelligence

We propose a novel dynamic safety framework that optimizes language model (LM) safety reasoning at inference time without modifying model weights. Building on recent advances in self-critique methods, our approach leverages a meta-critique mechanism that iteratively updates safety prompts--termed specifications--to drive the critique and revision process adaptively. This test-time optimization not only improves performance against adversarial jailbreak requests but also in diverse general safety-related tasks, such as avoiding moral harm or pursuing honest responses. Our empirical evaluations across several language models demonstrate that dynamically optimized safety prompts yield significantly higher safety scores compared to fixed system prompts and static self-critique defenses.

Figure 1: Schematic overview of the proposed meta-critique process, MetaSC.
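The test-time loop can be sketched as below. All names and prompt formats are hypothetical stand-ins for real LM calls; the structural point from the abstract is that only the specification text changes across iterations, never model weights.

```python
def llm(prompt: str) -> str:
    """Stand-in for an LM inference call; deterministic canned output."""
    return f"response<{len(prompt)}>"

def metasc(request: str, spec: str, rounds: int = 3):
    response = llm(f"System: {spec}\nUser: {request}")
    for _ in range(rounds):
        # Critique the current response against the current specification.
        critique = llm(f"Critique against spec '{spec}': {response}")
        # Revise the response using that critique.
        response = llm(f"Revise '{response}' using critique: {critique}")
        # Meta-critique: rewrite the specification itself, so the next
        # critique/revision round is better targeted to this request.
        spec = llm(f"Improve spec '{spec}' given critique: {critique}")
    return response, spec

final_response, final_spec = metasc("Some user request", "Be safe and honest.")
```

A fixed-system-prompt baseline would skip the last line of the loop; the meta-critique update is the part being evaluated.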


Arvind Krishna Celebrates the Work of a Pioneer at the TIME100 AI Impact Awards

TIME - Tech

Arvind Krishna, CEO, chairman and president of IBM, used his acceptance speech at the TIME100 AI Impact Awards on Monday to acknowledge pioneering computer scientist and mathematician Claude Shannon, calling him one of the "unsung heroes of today." Krishna, who accepted his award at a ceremony in Dubai alongside musician Grimes, California Institute of Technology professor Anima Anandkumar, and artist Refik Anadol, said of Shannon, "He would come up with the ways that you can convey information, all of which has stood the test until today." In 1948, Shannon--now known as the father of the information age--published "A Mathematical Theory of Communication," a transformative paper that, by proposing a simplified way of quantifying information via bits, would go on to fundamentally shape the development of information technology--and thus, our modern era. In his speech, Krishna also pointed to Shannon's work building robotic mice that solved mazes as an example of his enjoyment of play within his research. Krishna, of course, has some familiarity with what it takes to be at the cutting edge.


RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Tang, Zhengyang, Li, Ziniu, Xiao, Zhenyang, Ding, Tian, Sun, Ruoyu, Wang, Benyou, Liu, Dayiheng, Huang, Fei, Liu, Tianyu, Yu, Bowen, Lin, Junyang

arXiv.org Artificial Intelligence

Critiques are important for enhancing the performance of Large Language Models (LLMs), enabling both self-improvement and constructive feedback for others by identifying flaws and suggesting improvements. However, evaluating the critique capabilities of LLMs presents a significant challenge due to the open-ended nature of the task. In this work, we introduce a new benchmark designed to assess the critique capabilities of LLMs. Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques. Moreover, the benchmark incorporates features such as self-critique, cross-critique, and iterative critique, which are crucial for distinguishing the abilities of advanced reasoning models from more classical ones. We implement this benchmark using eight challenging reasoning tasks. We have several interesting findings. First, despite demonstrating comparable performance in direct chain-of-thought generation, classical LLMs significantly lag behind the advanced reasoning-based model o1-mini across all critique scenarios. Second, in self-critique and iterative critique settings, classical LLMs may even underperform relative to their baseline capabilities. We hope that this benchmark will serve as a valuable resource to guide future advancements. The code and data are available at \url{https://github.com/tangzhy/RealCritic}.
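The closed-loop idea, scoring a critique by whether the correction it induces is right rather than by judging the critique text directly, can be sketched like this. The toy critic and corrector below are trivial stand-ins (they just recompute small arithmetic questions) and are not from the benchmark's code.

```python
def closed_loop_score(dataset, critic, corrector):
    """Fraction of examples where critique-guided correction matches gold."""
    hits = 0
    for ex in dataset:
        critique = critic(ex["question"], ex["draft"])
        corrected = corrector(ex["question"], ex["draft"], critique)
        hits += corrected == ex["gold"]
    return hits / len(dataset)

# Toy stand-ins: a critic that flags wrong drafts, and a corrector that
# fixes flagged drafts by recomputing the (arithmetic-only) question.
def toy_critic(question, draft):
    return "wrong" if str(eval(question)) != draft else "ok"

def toy_corrector(question, draft, critique):
    return str(eval(question)) if critique == "wrong" else draft

data = [
    {"question": "2+2", "draft": "5", "gold": "4"},
    {"question": "3*3", "draft": "9", "gold": "9"},
]
score = closed_loop_score(data, toy_critic, toy_corrector)  # 1.0 here
```

Self-critique, cross-critique, and iterative critique then correspond to choosing where `draft` comes from (the same model, another model, or a previous round's correction).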


FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Xu, Zhenran, Wang, Longyue, Wang, Jifang, Li, Zhouyi, Shi, Senbao, Yang, Xue, Wang, Yiyu, Hu, Baotian, Yu, Jun, Zhang, Min

arXiv.org Artificial Intelligence

Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in automated decision-making with language agent-based societies, this paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework for end-to-end film automation in our constructed 3D virtual spaces. FilmAgent simulates various crew roles, including directors, screenwriters, actors, and cinematographers, and covers key stages of a film production workflow: (1) idea development transforms brainstormed ideas into structured story outlines; (2) scriptwriting elaborates on dialogue and character actions for each scene; (3) cinematography determines the camera setups for each shot. A team of agents collaborates through iterative feedback and revisions, thereby verifying intermediate scripts and reducing hallucinations. We evaluate the generated videos on 15 ideas and 4 key aspects. Human evaluation shows that FilmAgent outperforms all baselines across all aspects and scores 3.98 out of 5 on average, showing the feasibility of multi-agent collaboration in filmmaking. Further analysis reveals that FilmAgent, despite using the less advanced GPT-4o model, surpasses the single-agent o1, showing the advantage of a well-coordinated multi-agent system. Lastly, we discuss the complementary strengths and weaknesses of OpenAI's text-to-video model Sora and our FilmAgent in filmmaking.
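The staged, multi-role workflow in the abstract can be sketched as a pipeline with a review loop at each stage. The stage names follow the abstract; the agent behavior is a trivial stand-in for LLM calls, and the two-round feedback count is an assumption for illustration.

```python
STAGES = ["idea development", "scriptwriting", "cinematography"]

def agent(role: str, stage: str, draft: str) -> str:
    """Stand-in agent: appends its contribution to the working draft."""
    return draft + f"\n[{role} @ {stage}]"

def film_pipeline(idea: str, feedback_rounds: int = 2) -> str:
    draft = idea
    for stage in STAGES:
        draft = agent("author", stage, draft)
        # Iterative feedback: a reviewing role checks and the draft is
        # revised, which is where intermediate scripts get verified and
        # hallucinations get caught in the framework.
        for _ in range(feedback_rounds):
            draft = agent("director", stage, draft)
    return draft

script = film_pipeline("A heist in a virtual 3D city")
```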


XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners

Luo, Yun, Yang, Zhen, Meng, Fandong, Li, Yingjie, Guo, Fang, Qi, Qinglin, Zhou, Jie, Zhang, Yue

arXiv.org Artificial Intelligence

Active learning aims to construct an effective training set by iteratively curating the most informative unlabeled data for annotation, which is practical in low-resource tasks. Most active learning techniques in classification rely on the model's uncertainty or disagreement to choose unlabeled data. However, previous work indicates that existing models are poor at quantifying predictive uncertainty, which can lead to over-confidence in superficial patterns and a lack of exploration. Inspired by the cognitive processes in which humans deduce and predict through causal information, we propose a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Specifically, besides using a pre-trained bi-directional encoder for classification, we employ a pre-trained uni-directional decoder to generate and score the explanation. A ranking loss is proposed to enhance the decoder's capability in scoring explanations. During the selection of unlabeled data, we combine the predictive uncertainty of the encoder and the explanation score of the decoder to acquire informative data for annotation. As XAL is a general framework for text classification, we test our methods on six different classification tasks. Extensive experiments show that XAL achieves substantial improvement on all six tasks over previous AL methods. Ablation studies demonstrate the effectiveness of each component, and human evaluation shows that the model trained in XAL performs surprisingly well in explaining its prediction.
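The acquisition rule, combining the encoder's predictive uncertainty with the decoder's explanation score, can be sketched as a weighted sum. The weighting `alpha`, the field names, and the exact combination are assumptions for illustration; the paper's actual scoring may differ.

```python
import math

def entropy(probs):
    """Predictive entropy of the encoder's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def xal_select(pool, k, alpha=0.5):
    """Rank unlabeled examples; higher combined score = more informative."""
    def score(ex):
        # `expl_score` in [0, 1]: how well the decoder can explain the
        # prediction. Poorly explained examples are preferred, per XAL.
        return alpha * entropy(ex["probs"]) + (1 - alpha) * (1 - ex["expl_score"])
    return sorted(pool, key=score, reverse=True)[:k]

pool = [
    {"id": "a", "probs": [0.5, 0.5], "expl_score": 0.2},    # uncertain, badly explained
    {"id": "b", "probs": [0.99, 0.01], "expl_score": 0.9},  # confident, well explained
    {"id": "c", "probs": [0.7, 0.3], "expl_score": 0.5},
]
picked = xal_select(pool, k=1)  # selects example "a"
```

An uncertainty-only baseline would set `alpha=1`, which is exactly the classical acquisition strategy the paper argues is insufficient.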


AI-powered cruise control can stop 'phantom traffic jams' before they start

FOX News

The only thing worse than being stuck in a traffic jam is being stuck in a traffic jam that shouldn't be there. "Phantom jams" are those backups that occur on highways for seemingly no reason, then dissipate as mysteriously as they appeared. They're usually started by drivers who suddenly brake or change lanes in dense traffic, which is followed by a wave of bad decisions made by the drivers behind. It escalates as more cars arrive at high speeds and have to slow down abruptly.


Beyond ChatGPT: The Future Of AI At Work

#artificialintelligence

ChatGPT's beta launch surpassed 1 million users in less than a week, attracting the attention of almost everyone in the tech ecosystem. I read articles about it in the New York Times, the Financial Times, and The Atlantic, three top media sources in my book. The AI has generated workplace buzz over the possibility that its output is effective enough to threaten human jobs such as copywriting, answering customer service inquiries, writing news reports, and drafting legal documents. The open question is how to bring Large Language Models (LLMs) and generative AI like ChatGPT into the workplace--especially where the reliability of information is paramount. I met with the executive team at Hebbia AI, a startup leading research efforts on LLMs, to dig in.

