hexagon
WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
Cherian, Anoop, Doyle, River, Ben-Dov, Eyal, Lohit, Suhas, Peng, Kuan-Chuan
Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for solving vision-and-language reasoning problems. Our setup enables generalizing the debate protocol with heterogeneous experts that possess single- and multi-modal capabilities. To this end, we present Weighted Iterative Society-of-Experts (WISE), a generalized and modular MAD framework that partitions the agents into Solvers, that generate solutions, and Reflectors, that verify correctness, assign weights, and provide natural language feedback. To aggregate the agents' solutions across debate rounds, while accounting for variance in their responses and the feedback weights, we present a modified Dawid-Skene algorithm for post-processing that integrates our two-stage debate model. We evaluate WISE on SMART-840, VisualPuzzles, EvoChart-QA, and a new SMART-840++ dataset with programmatically generated problem instances of controlled difficulty. Our results show that WISE consistently improves accuracy by 2-7% over the state-of-the-art MAD setups and aggregation methods across diverse multimodal tasks and LLM configurations.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations
Black, Scotty, Darken, Christian
In the domain of combat simulations in support of wargaming, the development of intelligent agents has predominantly been characterized by rule-based, scripted methodologies with deep reinforcement learning (RL) approaches only recently being introduced. While scripted agents offer predictability and consistency in controlled environments, they fall short in dynamic, complex scenarios due to their inherent inflexibility. Conversely, RL agents excel in adaptability and learning, offering potential improvements in handling unforeseen situations, but suffer from significant challenges such as black-box decision-making processes and scalability issues in larger simulation environments. This paper introduces a novel hierarchical hybrid artificial intelligence (AI) approach that synergizes the reliability and predictability of scripted agents with the dynamic, adaptive learning capabilities of RL. By structuring the AI system hierarchically, the proposed approach aims to utilize scripted agents for routine, tactical-level decisions and RL agents for higher-level, strategic decision-making, thus addressing the limitations of each method while leveraging their individual strengths. This integration is shown to significantly improve overall performance, providing a robust, adaptable, and effective solution for developing and training intelligent agents in complex simulation environments.
- North America > United States > California > Monterey County > Monterey (0.04)
- North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
- North America > United States > Florida > Duval County > Jacksonville (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Infinite folds
But her passion is for paper--with no scissors. Today, she's a tessellation expert who teaches, invents new designs, and writes papers on the underlying math. Madonna Yoder '17 photographed in her studio Ross Mantle When Madonna Yoder '17 was eight years old, she learned how to fold a square piece of paper over and over and over again. After about 16 folds, she held a bird in her hands. The first time she pulled the tail of a flapping crane, she says, she realized: . That first piece was an origami classic, folded by kids at summer camp for generations and many people's first foray into the art form.
- North America > United States > Virginia (0.05)
- South America > Peru (0.05)
- North America > United States > Massachusetts (0.04)
- (4 more...)
Fold your own tessellation
Yoder recommends printing the pattern on paper in between normal printer paper and cardstock in weight, making sure it folds in straight lines (not too thick), folds back and forth easily on the same line (not too thin), and is crisp enough to make a satisfying snapping noise when you shake it. Her favorite paper isSkytone, which is commonly used to print certificates and fancy envelopes. Once you have your crease pattern on a sheet of paper, cut out the hexagon that contains the pattern. Yoder recommends using a straightedge and blade on a cutting mat instead of scissors, whether that means an X-Acto knife and a ruler on a sheet of cardboard or a quilting ruler and rotary cutter on a fabric cutting mat. The next step is folding the background grid of black lines that the pattern uses as references. Assuming you've cut out your hexagon precisely, you can use the edge of the hexagon and the printed lines to make your creases, or you can fold as if there were no lines printed by folding the hexagon in half (edge to opposite edge) and then folding those edges in to the center to make quarter lines, first in one direction and then in the other two.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.32)
Thinker: Learning to Think Fast and Slow
Chung, Stephen, Du, Wenyu, Fu, Jie
Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting in long, redundant responses and highlighting deficiencies in intuition and verification. Inspired by the Dual Process Theory in psychology, we introduce a simple modification to the QA task that includes four stages: Fast Thinking, where the LLM must answer within a strict token budget; Verification, where the model evaluates its initial response; Slow Thinking, where it refines the initial response with more deliberation; and Summarization, where it distills the refinement from the previous stage into precise steps. Our proposed task improves average accuracy from 25.6% to 27.3% for Qwen2.5-1.5B, and from 45.9% to 51.0% for DeepSeek-R1-Qwen-1.5B. Notably, for Qwen2.5-1.5B, the Fast Thinking mode alone achieves 25.2% accuracy using fewer than 1000 tokens, demonstrating substantial inference efficiency gains. These findings suggest that intuition and deliberative reasoning are distinct, complementary systems benefiting from targeted training. Additionally, we have open-sourced both the trained models and the source code.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Galichin, Andrey, Dontsov, Alexey, Druzhinina, Polina, Razzhigaev, Anton, Rogov, Oleg Y., Tutubalina, Elena, Oseledets, Ivan
Recent LLMs like DeepSeek-R1 have demonstrated state-of-the-art performance by integrating deep thinking and complex reasoning during generation. However, the internal mechanisms behind these reasoning processes remain unexplored. We observe reasoning LLMs consistently use vocabulary associated with human reasoning processes. We hypothesize these words correspond to specific reasoning moments within the models' internal mechanisms. To test this hypothesis, we employ Sparse Autoencoders (SAEs), a technique for sparse decomposition of neural network activations into human-interpretable features. We introduce ReasonScore, an automatic metric to identify active SAE features during these reasoning moments. We perform manual and automatic interpretation of the features detected by our metric, and find those with activation patterns matching uncertainty, exploratory thinking, and reflection. Through steering experiments, we demonstrate that amplifying these features increases performance on reasoning-intensive benchmarks (+2.2%) while producing longer reasoning traces (+20.5%). Using the model diffing technique, we provide evidence that these features are present only in models with reasoning capabilities. Our work provides the first step towards a mechanistic understanding of reasoning in LLMs. Code available at https://github.com/AIRI-Institute/SAE-Reasoning
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Liu, Yule, Zheng, Jingyi, Sun, Zhen, Peng, Zifan, Dong, Wenhan, Sha, Zeyang, Cui, Shiwen, Wang, Weiqiang, He, Xinlei
Recent advancements in large reasoning models (LRMs) have demonstrated the effectiveness of scaling test-time computation to enhance reasoning capabilities on various tasks. However, LRMs often suffer from an ``overthinking'' problem, where the model generates excessively redundant reasoning steps with limited performance gains. In this work, we empirically reveal an important characteristic of LRM behaviors that placing external CoTs generated by smaller models between the thinking token (\texttt{
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Job-killing robot learns at work, and it's coming to the factory floor
Industries can rethink how work gets done, raising the bar for productivity and workplace safety. Across industries, companies are feeling the squeeze from labor shortages, rising costs and nonstop pressure to boost efficiency. Robots are quickly becoming real-life solutions, and their promise has never felt more relevant. With factories and warehouses scrambling to fill essential roles, the search for fresh ideas is heating up. That's where AEON comes in.
- Information Technology (0.51)
- Media > News (0.32)
Validating remotely sensed biomass estimates with forest inventory data in the western US
Cao, Xiuyu, Sexton, Joseph O., Wang, Panshi, Gounaridis, Dimitrios, Carter, Neil H., Zhu, Kai
Monitoring aboveground biomass (AGB) and its density (AGBD) at high resolution is essential for carbon accounting and ecosystem management. While NASA's spaceborne Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission provides globally distributed reference measurements for AGBD estimation, the majority of commercial remote sensing products based on GEDI remain without rigorous or independent validation. Here, we present an independent regional validation of an AGBD dataset offered by terraPulse, Inc., based on independent reference data from the US Forest Service Forest Inventory and Analysis (FIA) program. Aggregated to 64,000-hectare hexagons and US counties across the US states of Utah, Nevada, and Washington, we found very strong agreement between terraPulse and FIA estimates. At the hexagon scale, we report R2 = 0.88, RMSE = 26.68 Mg/ha, and a correlation coefficient (r) of 0.94. At the county scale, agreement improves to R2 = 0.90, RMSE =32.62 Mg/ha, slope = 1.07, and r = 0.95. Spatial and statistical analyses indicated that terraPulse AGBD values tended to exceed FIA estimates in non-forest areas, likely due to FIA's limited sampling of non-forest vegetation. The terraPulse AGBD estimates also exhibited lower values in high-biomass forests, likely due to saturation effects in its optical remote-sensing covariates. This study advances operational carbon monitoring by delivering a scalable framework for comprehensive AGBD validation using independent FIA data, as well as a benchmark validation of a new commercial dataset for global biomass monitoring.
- North America > United States > Nevada (0.26)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Europe > Austria > Vienna (0.14)
- (10 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
Why do so many AI company logos look like buttholes?
Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com The past few years have seen the emergence of a great many AI companies. This is extremely exciting/alarming (delete according to whether you bought shares early), but it has also had a secondary consequence. Along with the proliferation of AI companies has come a proliferation of AI company logos.
- Media > Music (0.75)
- Leisure & Entertainment (0.75)