perimeter
- Education (1.00)
- Information Technology > Security & Privacy (0.94)
- Law (0.93)
WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
Cherian, Anoop, Doyle, River, Ben-Dov, Eyal, Lohit, Suhas, Peng, Kuan-Chuan
Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for solving vision-and-language reasoning problems. Our setup enables generalizing the debate protocol with heterogeneous experts that possess single- and multi-modal capabilities. To this end, we present Weighted Iterative Society-of-Experts (WISE), a generalized and modular MAD framework that partitions the agents into Solvers, that generate solutions, and Reflectors, that verify correctness, assign weights, and provide natural language feedback. To aggregate the agents' solutions across debate rounds, while accounting for variance in their responses and the feedback weights, we present a modified Dawid-Skene algorithm for post-processing that integrates our two-stage debate model. We evaluate WISE on SMART-840, VisualPuzzles, EvoChart-QA, and a new SMART-840++ dataset with programmatically generated problem instances of controlled difficulty. Our results show that WISE consistently improves accuracy by 2-7% over the state-of-the-art MAD setups and aggregation methods across diverse multimodal tasks and LLM configurations.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
MFiSP: A Multimodal Fire Spread Prediction Framework
Sathiyamoorthy, Alec, Zhou, Wenhao, Zhou, Xiangmin, Li, Xiaodong, Gondal, Iqbal
The 2019-2020 Black Summer bushfires in Australia devastated 19 million hectares, destroyed 3,000 homes, and lasted seven months, demonstrating the escalating scale and urgency of wildfire threats requiring better forecasting for effective response. Traditional fire modeling relies on manual interpretation by Fire Behaviour Analysts (FBAns) and static environmental data, often leading to inaccuracies and operational limitations. Emerging data sources, such as NASA's FIRMS satellite imagery and Volunteered Geographic Information, offer potential improvements by enabling dynamic fire spread prediction. This study proposes a Multimodal Fire Spread Prediction Framework (MFiSP) that integrates social media data and remote sensing observations to enhance forecast accuracy. By adapting fuel map manipulation strategies between assimilation cycles, the framework dynamically adjusts fire behavior predictions to align with the observed rate of spread. We evaluate the efficacy of MFiSP using synthetically generated fire event polygons across multiple scenarios, analyzing individual and combined impacts on forecast perimeters. Results suggest that our MFiSP integrating multimodal data can improve fire spread prediction beyond conventional methods reliant on FBAn expertise and static inputs.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Colorado (0.04)
- North America > United States > California (0.04)
- (2 more...)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.58)
- Government (0.55)
- Information Technology (0.46)
Fire-EnSF: Wildfire Spread Data Assimilation using Ensemble Score Filter
Shi, Hongzheng, Wang, Yuhang, Liu, Xiao
As wildfires become increasingly destructive and expensive to control, effective management of active wildfires requires accurate, real-time fire spread predictions. To enhance the forecasting accuracy of active fires, data assimilation plays a vital role by integrating observations (such as remote-sensing data) and fire predictions generated from numerical models. This paper provides a comprehensive investigation on the application of a recently proposed diffusion-model-based filtering algorithm -- the Ensemble Score Filter (EnSF) -- to the data assimilation problem for real-time active wildfire spread predictions. Leveraging a score-based generative diffusion model, EnSF has been shown to have superior accuracy for high-dimensional nonlinear filtering problems, making it an ideal candidate for the filtering problems of wildfire spread models. Technical details are provided, and our numerical investigations demonstrate that EnSF provides superior accuracy, stability, and computational efficiency, establishing it as a robust and practical method for wildfire data assimilation. Our code has been made publicly available.
- North America > United States > Rocky Mountains (0.04)
- North America > Canada > Rocky Mountains (0.04)
- North America > United States > Nevada (0.04)
- (3 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Food & Agriculture > Agriculture (0.68)
- Energy (0.66)
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Li, Yifan, Chen, Zhenghao, Wu, Ziheng, Zhou, Kun, Luo, Ruipu, Zhang, Can, He, Zhentao, Zhan, Yufei, Zhao, Wayne Xin, Qiu, Minghui
Recent advances in inference-time scaling, particularly those leveraging reinforcement learning with verifiable rewards, have substantially enhanced the reasoning capabilities of Large Vision-Language Models (LVLMs). Inspired by this success, similar strategies have been applied to multimodal reasoning, yet their impact on visual perception remains unclear. To investigate this gap, we introduce DisTANCE, a perception-centric benchmark for visual estimation tasks. Evaluation results show that LVLMs exhibit limited estimation precision, and inference-time scaling offers only marginal gains. We attribute this to the fast perception paradigm of current LVLMs, where visual understanding is treated as a one-shot output without modeling the underlying perceptual process. To address this, we propose Perception-Time Scaling (PTS), a novel paradigm that encourages token-rich perception and decomposes complex perception problems into intermediate tractable sub-problems, thereby enabling perception to align with and benefit from inference-time scaling. Combined with reinforcement learning techniques, PTS significantly improves perception accuracy, raising high-precision performance on DisTANCE from 8.0% to 64.7%, and generalizes well to out-of-domain tasks. Surprisingly, even though PTS data are purely synthetic, combining them with math reasoning data yields consistent gains in both reasoning and real-world perception benchmarks. Further analysis reveals that PTS introduces more perception-related tokens and increases the model's attention to image tokens. Our code and data will be publicly released.
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Galichin, Andrey, Dontsov, Alexey, Druzhinina, Polina, Razzhigaev, Anton, Rogov, Oleg Y., Tutubalina, Elena, Oseledets, Ivan
Recent LLMs like DeepSeek-R1 have demonstrated state-of-the-art performance by integrating deep thinking and complex reasoning during generation. However, the internal mechanisms behind these reasoning processes remain unexplored. We observe reasoning LLMs consistently use vocabulary associated with human reasoning processes. We hypothesize these words correspond to specific reasoning moments within the models' internal mechanisms. To test this hypothesis, we employ Sparse Autoencoders (SAEs), a technique for sparse decomposition of neural network activations into human-interpretable features. We introduce ReasonScore, an automatic metric to identify active SAE features during these reasoning moments. We perform manual and automatic interpretation of the features detected by our metric, and find those with activation patterns matching uncertainty, exploratory thinking, and reflection. Through steering experiments, we demonstrate that amplifying these features increases performance on reasoning-intensive benchmarks (+2.2%) while producing longer reasoning traces (+20.5%). Using the model diffing technique, we provide evidence that these features are present only in models with reasoning capabilities. Our work provides the first step towards a mechanistic understanding of reasoning in LLMs. Code available at https://github.com/AIRI-Institute/SAE-Reasoning
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Liu, Yule, Zheng, Jingyi, Sun, Zhen, Peng, Zifan, Dong, Wenhan, Sha, Zeyang, Cui, Shiwen, Wang, Weiqiang, He, Xinlei
Recent advancements in large reasoning models (LRMs) have demonstrated the effectiveness of scaling test-time computation to enhance reasoning capabilities on various tasks. However, LRMs often suffer from an ``overthinking'' problem, where the model generates excessively redundant reasoning steps with limited performance gains. In this work, we empirically reveal an important characteristic of LRM behaviors that placing external CoTs generated by smaller models between the thinking token (\texttt{
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Foundation's new season has dramatic potential – but sadly falls flat
Mel Brooks and Carl Reiner used to spend every evening watching movies. Their favourites were cheesy – the type of film where someone says, "Secure the perimeter!" Why do I mention this in the context of Foundation? Because this adaptation of Isaac Asimov's novels started out as a thought-provoking series, but is now a "Secure the perimeter!" It has been two years since Foundation last aired, so if you have forgotten where we left off, that is understandable.
- Media > Film (0.38)
- Leisure & Entertainment (0.38)
Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height
Shaddy, Bryan, Binder, Brianna, Dasgupta, Agnimitra, Qin, Haitong, Haley, James, Farguell, Angel, Hilburn, Kyle, Mallia, Derek V., Kochanski, Adam, Mandel, Jan, Oberai, Assad
Increasing wildfire occurrence has spurred growing interest in wildfire spread prediction. However, even the most complex wildfire models diverge from observed progression during multi-day simulations, motivating need for data assimilation. A useful approach to assimilating measurement data into complex coupled atmosphere-wildfire models is to estimate wildfire progression from measurements and use this progression to develop a matching atmospheric state. In this study, an approach is developed for estimating fire progression from VIIRS active fire measurements, GOES-derived ignition times, and terrain height data. A conditional Generative Adversarial Network is trained with simulations of historic wildfires from the atmosphere-wildfire model WRF-SFIRE, thus allowing incorporation of WRF-SFIRE physics into estimates. Fire progression is succinctly represented by fire arrival time, and measurements for training are obtained by applying an approximate observation operator to WRF-SFIRE solutions, eliminating need for satellite data during training. The model is trained on tuples of fire arrival times, measurements, and terrain, and once trained leverages measurements of real fires and corresponding terrain data to generate samples of fire arrival times. The approach is validated on five Pacific US wildfires, with results compared against high-resolution perimeters measured via aircraft, finding an average Sorensen-Dice coefficient of 0.81. The influence of terrain height on the arrival time inference is also evaluated and it is observed that terrain has minimal influence when the inference is conditioned on satellite measurements.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Colorado > Denver County > Denver (0.14)
- Europe > Finland (0.04)
- (4 more...)
Reinforce LLM Reasoning through Multi-Agent Reflection
Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.45)