AITopics

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neural Information Processing SystemsJun-13-2026, 23:03:07 GMT

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Multi-agent AI systems powered by large language models (LLMs) are increasingly applied to solve complex tasks. However, these systems often rely on fragile, manually designed prompts and heuristics, making optimization difficult. A key challenge in optimizing multi-agent systems is acquiring suitable training data for specialized agents. We introduce SiriuS, a self-improving, reasoning-driven optimization framework for multi-agent systems. Central to our approach is the construction of an experience library: a repository of high-quality reasoning trajectories. The library is built by retaining reasoning steps that lead to successful outcomes, providing a robust training set for optimizing multi-agent system. Additionally, we introduce a library augmentation procedure that refines unsuccessful trajectories, further enriching the library. SiriuS boosts performance by 2.86% to 21.88% on reasoning and biomedical QA and enhances agent negotiation in competitive settings. Our results show that SiriuS enhances multi-agent performance while generating reusable data for self-correction and self-play enhancement in the future.

artificial intelligence, name change, proceedings, (5 more...)

Genre: Research Report > New Finding (0.61)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Neural Information Processing SystemsFeb-10-2026, 02:59:53 GMT

2ae6b2bdf3a179e3e24129e2c54bd871-Paper-Conference.pdf

original performance 0, performance 0, performance aal, (16 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Neural Information Processing SystemsDec-24-2025, 15:01:12 GMT

SIRIUS : Contexual Sparisty with Correction for Efficient LLMs

With the blossom of large language models (LLM), inference efficiency becomes increasingly important. Various approximate methods are proposed to reduce the cost at inference time. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a higher compression ratio seemingly without significant performance degradation. However, after a comprehensive evaluation of contextual sparsity methods on various complex generation tasks, we find that although CS succeeds in prompt-understanding tasks, it significantly degrades the model performance for reasoning, deduction, and knowledge-based tasks. Despite the gap in end-to-end accuracy, we observed that sparse models and original models often share the general problem-solving logic and require only a few token corrections to recover the original model performance. This paper introduces SIRIUS, an efficient correction mechanism, which significantly boosts CS models on reasoning tasks while maintaining its efficiency gain. SIRIUS is evaluated on 6 models with 8 difficult generation tasks in reasoning, deduction, and coding and shows consistent effectiveness and efficiency. Also, we carefully develop a system implementation for SIRIUS and show that SIRIUS delivers theoretical latency reduction with roughly a 20% reduction in latency for 8B model on-chip and a 35% reduction in latency for 70B model offloading.

contexual sparisty, large language model, natural language, (13 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Neural Information Processing SystemsOct-9-2025, 21:48:34 GMT

2ae6b2bdf3a179e3e24129e2c54bd871-Paper-Conference.pdf

original performance 0, performance 0, performance aal, (15 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

arXiv.org Artificial IntelligenceJul-21-2025

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?

Mei, Zhiting, Zhang, Christina, Yin, Tenny, Lidard, Justin, Shorinwa, Ola, Majumdar, Anirudha

Reasoning language models have set state-of-the-art (SOTA) records on many challenging benchmarks, enabled by multi-step reasoning induced using reinforcement learning. However, like previous language models, reasoning models are prone to generating confident, plausible responses that are incorrect (hallucinations). Knowing when and how much to trust these models is critical to the safe deployment of reasoning models in real-world applications. To this end, we explore uncertainty quantification of reasoning models in this work. Specifically, we ask three fundamental questions: First, are reasoning models well-calibrated? Second, does deeper reasoning improve model calibration? Finally, inspired by humans' innate ability to double-check their thought processes to verify the validity of their answers and their confidence, we ask: can reasoning models improve their calibration by explicitly reasoning about their chain-of-thought traces? We introduce introspective uncertainty quantification (UQ) to explore this direction. In extensive evaluations on SOTA reasoning models across a broad range of benchmarks, we find that reasoning models: (i) are typically overconfident, with self-verbalized confidence estimates often greater than 85% particularly for incorrect responses, (ii) become even more overconfident with deeper reasoning, and (iii) can become better calibrated through introspection (e.g., o3-Mini and DeepSeek R1) but not uniformly (e.g., Claude 3.7 Sonnet becomes more poorly calibrated). Lastly, we conclude with important research directions to design necessary UQ benchmarks and improve the calibration of reasoning models.

large language model, machine learning, natural language, (19 more...)

2506.18183

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)

Neural Information Processing SystemsMay-26-2025, 19:44:05 GMT

SIRIUS : Contexual Sparisty with Correction for Efficient LLMs

With the blossom of large language models (LLM), inference efficiency becomes increasingly important. Various approximate methods are proposed to reduce the cost at inference time. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a higher compression ratio seemingly without significant performance degradation. However, after a comprehensive evaluation of contextual sparsity methods on various complex generation tasks, we find that although CS succeeds in prompt-understanding tasks, it significantly degrades the model performance for reasoning, deduction, and knowledge-based tasks. Despite the gap in end-to-end accuracy, we observed that sparse models and original models often share the general problem-solving logic and require only a few token corrections to recover the original model performance. This paper introduces SIRIUS, an efficient correction mechanism, which significantly boosts CS models on reasoning tasks while maintaining its efficiency gain.

artificial intelligence, large language model, natural language, (12 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

arXiv.org Artificial IntelligenceFeb-7-2025

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Zhao, Wanjia, Yuksekgonul, Mert, Wu, Shirley, Zou, James

Multi-agent AI systems powered by large language models (LLMs) are increasingly applied to solve complex tasks. However, these systems often rely on fragile, manually designed prompts and heuristics, making optimization difficult. A key challenge in optimizing multi-agent systems is acquiring suitable training data for specialized agents. We introduce SiriuS, a self-improving, reasoning-driven optimization framework for multi-agent systems. Central to our approach is the construction of an experience library: a repository of high-quality reasoning trajectories. The library is built by retaining reasoning steps that lead to successful outcomes, providing a robust training set for optimizing multi-agent system. Additionally, we introduce a library augmentation procedure that refines unsuccessful trajectories, further enriching the library. SiriuS boosts performance by 2.86\% to 21.88\% on reasoning and biomedical QA and enhances agent negotiation in competitive settings. Our results show that SiriuS enhances multi-agent performance while generating reusable data for self-correction and self-play enhancement in the future.

artificial intelligence, deep learning, machine learning, (14 more...)

2502.0478

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

arXiv.org Artificial IntelligenceSep-5-2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs

Zhou, Yang, Chen, Zhuoming, Xu, Zhaozhuo, Lin, Victoria, Chen, Beidi

With the blossom of large language models (LLMs), inference efficiency becomes increasingly important. Various approximation methods are proposed to reduce the cost at inference time. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a higher compression ratio seemingly without quality degradation. However, after a comprehensive evaluation of contextual sparsity methods on various complex generation tasks, we find that although CS succeeds in prompt-understanding tasks, CS significantly degrades the model performance for reasoning, deduction, and knowledge-based tasks. Despite the gap in end-to-end accuracy, we observed that sparse models often share general problem-solving logic and require only a few token corrections to recover the original model performance. This paper introduces Sirius, an efficient correction mechanism, which significantly recovers CS models quality on reasoning tasks while maintaining its efficiency gain. Sirius is evaluated on 6 models with 8 difficult generation tasks in reasoning, math, and coding and shows consistent effectiveness and efficiency. Also, we carefully develop a system implementation for Sirius and show that Sirius achieves roughly 20% reduction in latency for 8B model on-chip and 35% reduction for 70B model offloading. We open-source our implementation of Sirius at https://github.com/Infini-AI-Lab/Sirius.git.

contextual sparsity, correction, efficient llm, (1 more...)

2409.03856

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

arXiv.org Artificial IntelligenceSep-12-2022

GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Feng, Gilbert, Zhang, Hongbo, Li, Zhongyu, Peng, Xue Bin, Basireddy, Bhuvan, Yue, Linzhu, Song, Zhitao, Yang, Lizhi, Liu, Yunhui, Sreenath, Koushil, Levine, Sergey

Recent years have seen a surge in commercially-available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that needs to be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a diverse set of simulated robots for training. We show that by training a controller on this large set of simulated robots, our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots with diverse morphologies, which were not observed during training.

artificial intelligence, machine learning, robot, (16 more...)