Energy
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Tang, Zinan, Gao, Xin, Pei, Qizhi, Pan, Zhuoshi, Cai, Mengzhang, Wu, Jiang, He, Conghui, Wu, Lijun
Supervised Fine-Tuning (SFT) Large Language Models (LLM) fundamentally rely on high-quality training data. While data selection and data synthesis are two common strategies to improve data quality, existing approaches often face limitations in static dataset curation that fail to adapt to evolving model capabilities. In this paper, we introduce Middo, a self-evolving Model-informed dynamic data optimization framework that uses model-aware data selection and context-preserving data refinement. Unlike conventional one-off filtering/synthesis methods, our framework establishes a closed-loop optimization system: (1) A self-referential diagnostic module proactively identifies suboptimal samples through tri-axial model signals - loss patterns (complexity), embedding cluster dynamics (diversity), and self-alignment scores (quality); (2) An adaptive optimization engine then transforms suboptimal samples into pedagogically valuable training points while preserving semantic integrity; (3) This optimization process continuously evolves with model capability through dynamic learning principles. Experiments on multiple benchmarks demonstrate that our Middo consistently enhances the quality of seed data and boosts LLM's performance with improving accuracy by 7.15% on average while maintaining the original dataset scale. This work establishes a new paradigm for sustainable LLM training through dynamic human-AI co-evolution of data and models. Our datasets, models, and code are publicly available at https://github.com/Word2VecT/Middo.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Wang, Fali, Liu, Hui, Dai, Zhenwei, Zeng, Jingying, Zhang, Zhiwei, Wu, Zongyu, Luo, Chen, Li, Zhen, Tang, Xianfeng, He, Qi, Wang, Suhang
Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks; while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks with each subtask requires LLM of specific capability. Therefore, we study a novel problem: the test-time compute-optimal scaling in multi-stage complex tasks, aiming to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) The combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical. (ii) The optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights characterizing the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes and enhanced interpretability.
Comparing Uniform Price and Discriminatory Multi-Unit Auctions through Regret Minimization
Potfer, Marius, Perchet, Vianney
Repeated multi-unit auctions, where a seller allocates multiple identical items over many rounds, are common mechanisms in electricity markets and treasury auctions. We compare the two predominant formats: uniform-price and discriminatory auctions, focusing on the perspective of a single bidder learning to bid against stochastic adversaries. We characterize the learning difficulty in each format, showing that the regret scales similarly for both auction formats under both full-information and bandit feedback, as $\tildeฮ ( \sqrt{T} )$ and $\tildeฮ ( T^{2/3} )$, respectively. However, analysis beyond worst-case regret reveals structural differences: uniform-price auctions may admit faster learning rates, with regret scaling as $\tildeฮ ( \sqrt{T} )$ in settings where discriminatory auctions remain at $\tildeฮ ( T^{2/3} )$. Finally, we provide a specific analysis for auctions in which the other participants are symmetric and have unit-demand, and show that in these instances, a similar regret rate separation appears.
Essential Gear for an Emergency Kit--for Cars or Go-Bags
What Should Be in Your Emergency Kit Before Disaster Strikes? We consulted preparedness experts and WIRED's team of testers on the essential bug-out gear to keep your family safe during an unplanned exit. All products featured on WIRED are independently selected by our editors. However, we may receive compensation from retailers and/or from purchases of products through these links. You never know when you're going to have to bug out on short notice.
New Report Finds Efforts to Slow Climate Change Are Working--Just Not Fast Enough
By virtually every key metric, efforts to fight climate change are going too slowly, according to findings by a coalition of climate groups. In some cases, things are moving in the wrong direction. An eroded iceberg is seen is seen floating near Horseshoe Island, Antarctica. In the 10 years since the signing of the Paris Agreement, the backbone of international climate action, humanity has made impressive progress. Renewable energy is increasingly cheap and reliable, while electric vehicles are becoming better every year.
Joint Optimization of Cooperation Efficiency and Communication Covertness for Target Detection with AUVs
Zhang, Xueyao, Yang, Bo, Yu, Zhiwen, Cao, Xuelin, Xiang, Wei, Guo, Bin, Wang, Liang, Lau, Billy Pik Lik, Alexandropoulos, George C., Luo, Jun, Debbah, Mรฉrouane, Han, Zhu, Yuen, Chau
This paper investigates underwater cooperative target detection using autonomous underwater vehicles (AUVs), with a focus on the critical trade-off between cooperation efficiency and communication covertness. To tackle this challenge, we first formulate a joint trajectory and power control optimization problem, and then present an innovative hierarchical action management framework to solve it. According to the hierarchical formulation, at the macro level, the master AUV models the agent selection process as a Markov decision process and deploys the proximal policy optimization algorithm for strategic task allocation. At the micro level, each selected agent's decentralized decision-making is modeled as a partially observable Markov decision process, and a multi-agent proximal policy optimization algorithm is used to dynamically adjust its trajectory and transmission power based on its local observations. Under the centralized training and decentralized execution paradigm, our target detection framework enables adaptive covert cooperation while satisfying both energy and mobility constraints. By comprehensively modeling the considered system, the involved signals and tasks, as well as energy consumption, theoretical insights and practical solutions for the efficient and secure operation of multiple AUVs are provided, offering significant implications for the execution of underwater covert communication tasks.
Automated urban waterlogging assessment and early warning through a mixture of foundation models
Zhang, Chenxu, Huang, Fuxiang, Zhang, Lei
With climate change intensifying, urban waterlogging poses an increasingly severe threat to global public safety and infrastructure. However, existing monitoring approaches rely heavily on manual reporting and fail to provide timely and comprehensive assessments. In this study, we present Urban Waterlogging Assessment (UWAssess), a foundation model-driven framework that automatically identif ies waterlogged areas in surveillance images and generates structured assessment reports. To address the scarcity of labeled data, we design a semi-supervised fine-tuning strategy and a chain-of-thought (CoT) prompting strategy to unleash the potential of the foundation model for data-scarce downstream tasks. Evaluations on challenging visual benchmarks demonstrate substantial improvements in perception performance . GPT-based evaluations confirm the ability of UWAssess to generate reliable textual reports that accurately describe waterlogging extent, depth, risk and impact. This dual capability enables a shift of waterlogging monitoring from perception to generation, while the collaborative framework of multiple foundation models lays the groundwork for intelligent and scalable systems, supporting urban management, disaster response and climate resilience.
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
Shen, Yingli, Lai, Wen, Wang, Shuo, Gao, Ge, Luo, Kangyang, Fraser, Alexander, Sun, Maosong
Continued pretraining and instruction tuning on large-scale multilingual data have proven to be effective in scaling large language models (LLMs) to low-resource languages. However, the unaligned nature of such data limits its ability to effectively capture cross-lingual semantics. In contrast, multi-way parallel data, where identical content is aligned across multiple languages, provides stronger cross-lingual consistency and offers greater potential for improving multilingual performance. In this paper, we introduce a large-scale, high-quality multi-way parallel corpus, TED2025, based on TED Talks. The corpus spans 113 languages, with up to 50 languages aligned in parallel, ensuring extensive multilingual coverage. Using this dataset, we investigate best practices for leveraging multi-way parallel data to enhance LLMs, including strategies for continued pretraining, instruction tuning, and the analysis of key influencing factors. Experiments on six multilingual benchmarks show that models trained on multiway parallel data consistently outperform those trained on unaligned multilingual data.
MTRE: Multi-Token Reliability Estimation for Hallucination Detection in VLMs
Zollicoffer, Geigh, Vu, Minh, Bhattarai, Manish
Vision-language models (VLMs) now rival human performance on many multimodal tasks, yet they still hallucinate objects or generate unsafe text. Current hallucination detectors, e.g., single-token linear probing (LP) and PTrue, typically analyze only the logit of the first generated token or just its highest-scoring component, overlooking richer signals embedded within earlier token distributions. We demonstrate that analyzing the complete sequence of early logits potentially provides substantially more diagnostic information. We emphasize that hallucinations may only emerge after several tokens, as subtle inconsistencies accumulate over time. By analyzing the Kullback-Leibler (KL) divergence between logits corresponding to hallucinated and non-hallucinated tokens, we underscore the importance of incorporating later-token logits to more accurately capture the reliability dynamics of VLMs. In response, we introduce Multi-Token Reliability Estimation (MTRE), a lightweight, white-box method that aggregates logits from the first ten tokens using multi-token log-likelihood ratios and self-attention. Despite the challenges posed by large vocabulary sizes and long logit sequences, MTRE remains efficient and tractable. Across MAD-Bench, MM-SafetyBench, MathVista, and four compositional-geometry benchmarks, MTRE achieves a 9.4% gain in accuracy and a 14.8% gain in AUROC over standard detection methods, establishing a new state of the art in hallucination detection for open-source VLMs.
Predicting butterfly species presence from satellite imagery using soft contrastive regularisation
van der Plas, Thijs L, Law, Stephen, Pocock, Michael JO
The growing demand for scalable biodiversity monitoring methods has fuelled interest in remote sensing data, due to its widespread availability and extensive coverage. Traditionally, the application of remote sensing to biodiversity research has focused on mapping and monitoring habitats, but with increasing availability of large-scale citizen-science wildlife observation data, recent methods have started to explore predicting multi-species presence directly from satellite images. This paper presents a new data set for predicting butterfly species presence from satellite data in the United Kingdom. W e experimentally optimise a Resnet-based model to predict multi-species presence from 4-band satellite images, and find that this model especially outperforms the mean rate baseline for locations with high species biodiversity. T o improve performance, we develop a soft, supervised contrastive regularisation loss that is tailored to probabilistic labels (such as species-presence data), and demonstrate that this improves prediction accuracy. In summary, our new data set and contrastive regularisation method contribute to the open challenge of accurately predicting species biodiversity from remote sensing data, which is key for efficient biodiversity monitoring.