Optimal bounds for dissatisfaction in perpetual voting
Kozachinskiy, Alexander, Shen, Alexander, Steifer, Tomasz
In perpetual voting, multiple decisions are made at different moments in time. Taking the history of previous decisions into account allows us to satisfy properties such as proportionality over periods of time. In this paper, we consider the following question: is there a perpetual approval voting method that guarantees that no voter is dissatisfied too many times? We identify a sufficient condition on voter behavior -- which we call the 'bounded conflicts' condition -- under which a sublinear growth of dissatisfaction is possible. We provide a tight upper bound on the growth of dissatisfaction under bounded conflicts, using techniques from Kolmogorov complexity. We also observe that approval voting with binary choices mimics the machine learning setting of prediction with expert advice. This allows us to present a voting method with sublinear guarantees on dissatisfaction under bounded conflicts, based on the standard techniques from prediction with expert advice.
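The link to prediction with expert advice can be illustrated with a small multiplicative-weights sketch. This is not the method from the paper: the function name `perpetual_weighted_vote`, the update factor `eta`, and the tie-breaking rule are all assumptions made for illustration. Each voter approves one of two options per round, the option with the larger total weight wins, and every voter who was overruled has their weight increased so that they carry more influence in later rounds.

```python
def perpetual_weighted_vote(ballots, eta=0.5):
    """Run a weighted binary vote over several rounds.

    ballots: list of rounds; each round is a list of 0/1 votes, one per voter.
    eta: hypothetical update rate; an overruled voter's weight is multiplied by (1 + eta).
    Returns (decisions, dissatisfaction), where dissatisfaction[i] counts the
    rounds in which voter i's choice lost.
    """
    n_voters = len(ballots[0])
    weights = [1.0] * n_voters
    dissatisfaction = [0] * n_voters
    decisions = []

    for votes in ballots:
        weight_for_one = sum(w for w, v in zip(weights, votes) if v == 1)
        weight_for_zero = sum(w for w, v in zip(weights, votes) if v == 0)
        # Ties go to option 0 (an arbitrary choice for this sketch).
        decision = 1 if weight_for_one > weight_for_zero else 0
        decisions.append(decision)

        # Overruled voters gain weight, so they are harder to overrule next time.
        for i, vote in enumerate(votes):
            if vote != decision:
                dissatisfaction[i] += 1
                weights[i] *= 1.0 + eta

    return decisions, dissatisfaction


# Two voters always approve option 1, one voter always approves option 0.
decisions, dissatisfaction = perpetual_weighted_vote([[1, 1, 0]] * 4)
print(decisions)        # [1, 1, 0, 1]
print(dissatisfaction)  # [1, 1, 3]
```

Under plain majority the third voter would lose all four rounds; with the weight update they win the third round. The sketch makes no claim about the sublinear guarantees discussed in the abstract.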
Interview with Andrews Ata Kangah: Localising illegal mining sites using machine learning and geospatial data
Andrews Ata Kangah is a team leader and researcher working on democratizing AI and AI solutions for environmental problems. We spoke to him about his research, attending the AfriClimate AI workshop at the Deep Learning Indaba, and what inspired him to work in AI and on climate-related projects. My name is Andrews Ata Kangah. I also double as a researcher at Armtos, which is a non-profit. At Armtos, our current goal is to build a solution to solve the illegal mining problem that's going on in Ghana. The mining is destroying the lands that are within mining areas.
Surrogate-assisted multi-objective design of complex multibody systems
Amakor, Augustina C., Berkemeier, Manuel B., Wohlleben, Meike, Sextro, Walter, Peitz, Sebastian
The optimization of large-scale multibody systems is a numerically challenging task, in particular when considering multiple conflicting criteria at the same time. In this situation, we need to approximate the Pareto set of optimal compromises, which is significantly more expensive than finding a single optimum in single-objective optimization. To prevent large costs, the usage of surrogate models, constructed from a small but informative number of expensive model evaluations, is a very popular and widely studied approach. The central challenge then is to ensure a high quality (that is, near-optimality) of the solutions that were obtained using the surrogate model, which can be hard to guarantee with a single pre-computed surrogate. We present a back-and-forth approach between surrogate modeling and multi-objective optimization to improve the quality of the obtained solutions. Using the example of an expensive-to-evaluate multibody system, we compare different strategies regarding multi-objective optimization, sampling and also surrogate modeling, to identify the most promising approach in terms of computational efficiency and solution quality.
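The notion of a Pareto set of optimal compromises can be made concrete with a short non-dominated filter. This is a generic sketch, assuming all objectives are minimized and that candidate designs have already been evaluated (by the expensive model or by a surrogate); the function names `dominates` and `pareto_front` are illustrative and not taken from the paper.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b under minimization:
    a is no worse in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Five candidate designs, each scored on two conflicting objectives.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

When the scores come from a surrogate, the filtered points are only candidates for the true Pareto set, which is why the quality of surrogate-based solutions has to be checked against the expensive model.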
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models
Pipatanakul, Kunat, Manakul, Potsawee, Nitarach, Natapong, Sirichotedumrong, Warit, Nonesung, Surapon, Jaknamon, Teetouch, Pengpun, Parinthapat, Taveekitworachai, Pittawat, Na-Thalang, Adisai, Sripaisarnmongkol, Sittipong, Jirayoot, Krisanapong, Tharnpipitchai, Kasima
This paper introduces Typhoon 2, a series of text and multimodal large language models optimized for the Thai language. The series includes models for text, vision, and audio. Typhoon2-Text builds on state-of-the-art open models, such as Llama 3 and Qwen2, and we perform continual pre-training on a mixture of English and Thai data. We employ post-training techniques to enhance Thai language performance while preserving the base models' original capabilities. We release text models across a range of sizes, from 1 to 70 billion parameters, available in both base and instruction-tuned variants. To guardrail text generation, we release Typhoon2-Safety, a classifier enhanced for Thai cultures and language. Typhoon2-Vision improves Thai document understanding while retaining general visual capabilities, such as image captioning. Typhoon2-Audio introduces an end-to-end speech-to-speech model architecture capable of processing audio, speech, and text inputs and generating both text and speech outputs.
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Wang, Ziteng, Chen, Jianfei, Zhu, Jun
Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget. However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for the conventional TopK+Softmax routing, utilizing ReLU as the router instead. We further propose methods to regulate the router's sparsity while balancing the load among experts. ReMoE's continuous nature enables efficient dynamic allocation of computation across tokens and layers, while also exhibiting domain specialization. Our experiments demonstrate that ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity. Furthermore, ReMoE exhibits superior scalability with respect to the number of experts, surpassing traditional MoE architectures. The implementation based on Megatron-LM is available at https://github.com/thu-ml/ReMoE. Transformer models (Vaswani, 2017) consistently improve performance as the number of parameters increases (Kaplan et al., 2020). However, scaling these models is constrained by computation resources. Sparsely activated Mixture-of-Experts (MoE) (Shazeer et al., 2017) mitigates this challenge by employing a sparse architecture that selectively activates a subset of parameters during both training and inference.
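The difference between the two routing styles can be sketched in plain Python for a single token. This is a simplified illustration, not the ReMoE implementation: the function names are hypothetical, the order of softmax and TopK selection varies between MoE implementations (softmax first is assumed here), and the sparsity regulation and load balancing mentioned in the abstract are omitted.

```python
import math


def topk_softmax_gates(logits, k):
    """Softmax over all experts, then keep the k largest gates and zero the rest.
    The hard selection of k experts is the discontinuous step."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]


def relu_gates(logits):
    """ReLU routing: an expert is active exactly when its router logit is positive.
    The gate is a continuous function of the logit, and the number of active
    experts can differ from token to token."""
    return [max(0.0, x) for x in logits]


router_logits = [1.5, -0.3, 0.2, -2.0]  # made-up logits for four experts
print(topk_softmax_gates(router_logits, k=1))
print(relu_gates(router_logits))  # [1.5, 0.0, 0.2, 0.0]
```

In this example TopK with k=1 always activates exactly one expert, while ReLU routing activates two because two logits are positive.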
Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI
Krsek, Isadora, Kabra, Anubha, Dou, Yao, Naous, Tarek, Dabbish, Laura A., Ritter, Alan, Xu, Wei, Das, Sauvik
In pseudonymous online fora like Reddit, the benefits of self-disclosure are often apparent to users (e.g., I can vent about my in-laws to understanding strangers), but the privacy risks are more abstract (e.g., will my partner be able to tell that this is me?). Prior work has sought to develop natural language processing (NLP) tools that help users identify potentially risky self-disclosures in their text, but none have been designed for or evaluated with the users they hope to protect. Absent this assessment, these tools will be limited by the social-technical gap: users need assistive tools that help them make informed decisions, not paternalistic tools that tell them to avoid self-disclosure altogether. To bridge this gap, we conducted a study with N = 21 Reddit users; we had them use a state-of-the-art NLP disclosure detection model on two of their authored posts and asked them questions to understand if and how the model helped, where it fell short, and how it could be improved to help them make more informed decisions. Despite its imperfections, users responded positively to the model and highlighted its use as a tool that can help them catch mistakes, inform them of risks they were unaware of, and encourage self-reflection. However, our work also shows how, to be useful and usable, AI for supporting privacy decision-making must account for posting context, disclosure norms, and users' lived threat models, and provide explanations that help contextualize detected risks.
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Wong, Hallee E., Ortiz, Jose Javier Gonzalez, Guttag, John, Dalca, Adrian V.
Medical researchers and clinicians often need to perform novel segmentation tasks on a set of related images. Existing methods for segmenting a new dataset are either interactive, requiring substantial human effort for each image, or require an existing set of manually labeled images. We introduce a system, MultiverSeg, that enables practitioners to rapidly segment an entire new dataset without requiring access to any existing labeled data from that task or domain. Along with the image to segment, the model takes user interactions such as clicks, bounding boxes or scribbles as input, and predicts a segmentation. As the user segments more images, those images and segmentations become additional inputs to the model, providing context. As the context set of labeled images grows, the number of interactions required to segment each new image decreases. We demonstrate that MultiverSeg enables users to interactively segment new datasets efficiently, by amortizing the number of interactions per image to achieve an accurate segmentation. Compared to using a state-of-the-art interactive segmentation method, using MultiverSeg reduced the total number of scribble steps by 53% and clicks by 36% to achieve 90% Dice on sets of images from unseen tasks. We release code and model weights at https://multiverseg.csail.mit.edu
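The 90% Dice target refers to the Dice overlap between a predicted and a reference segmentation mask. A minimal sketch of that score for binary masks follows, assuming the masks are flattened to equal-length sequences of 0/1 labels; the function name `dice_score` and the convention of returning 1.0 for two empty masks are choices made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement here
    return 2.0 * intersection / size_sum


predicted = [1, 1, 1, 0]
reference = [1, 1, 1, 1]
print(round(dice_score(predicted, reference), 3))  # 0.857
```

In an interactive setting, a user would keep adding clicks or scribbles until this score, measured against the intended segmentation, reaches the chosen threshold.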
Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning
Frieder, Simon, Bayer, Jonas, Collins, Katherine M., Berner, Julius, Loader, Jacob, Juhász, András, Ruehle, Fabian, Welleck, Sean, Poesia, Gabriel, Griffiths, Ryan-Rhys, Weller, Adrian, Goyal, Anirudh, Lukasiewicz, Thomas, Gowers, Timothy
The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibits several shortcomings. These limitations include a restricted scope of mathematical complexity, typically not exceeding lower undergraduate-level mathematics, binary rating protocols and other issues, which makes comprehensive proof-based evaluation suites difficult. We systematically explore these limitations and contend that enhancing the capabilities of large language models, or any forthcoming advancements in AI-based mathematical assistants (copilots or "thought partners"), necessitates a paradigm shift in the design of mathematical datasets and the evaluation criteria of mathematical ability: It is necessary to move away from result-based datasets (theorem statement to theorem proof) and convert the rich facets of mathematical research practice to data LLMs can train on. Examples of these are mathematical workflows (sequences of atomic, potentially subfield-dependent tasks that are often performed when creating new mathematics), which are an important part of the proof-discovery process. Additionally, we advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal, alleviating some of the mentioned limitations. Lastly, we introduce math datasheets for datasets, extending the general, dataset-agnostic variants of datasheets: We provide a questionnaire designed specifically for math datasets that we urge dataset creators to include with their datasets. This will make creators aware of potential limitations of their datasets while at the same time making it easy for readers to assess them from the point of view of training and evaluating mathematical copilots.
Progressive Multimodal Reasoning via Active Retrieval
Dong, Guanting, Zhang, Chenghao, Deng, Mengjie, Zhu, Yutao, Dou, Zhicheng, Wen, Ji-Rong
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs), and finding effective ways to enhance their performance in such scenarios remains an unresolved issue. In this paper, we propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs through Active Retrieval (AR) and Monte Carlo Tree Search (MCTS). Our approach begins with the development of a unified retrieval module that retrieves key supporting insights for solving complex reasoning problems from a hybrid-modal retrieval corpus. To bridge the gap in automated multimodal reasoning verification, we employ the MCTS algorithm combined with an active retrieval mechanism, which enables the automatic generation of step-wise annotations. This strategy dynamically retrieves key insights for each reasoning step, moving beyond traditional beam search sampling to improve the diversity and reliability of the reasoning space. Additionally, we introduce a process reward model that aligns progressively to support the automatic verification of multimodal reasoning tasks. Experimental results across three complex multimodal reasoning benchmarks confirm the effectiveness of the AR-MCTS framework in enhancing the performance of various multimodal models. Further analysis demonstrates that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
A Comprehensive Forecasting Framework based on Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment
Yang, Zhengchao, Ghosh, Mithun, Saha, Anish, Xu, Dong, Shmakov, Konstantin, Lee, Kuang-chih
Ads demand forecasting for Walmart's ad products plays a critical role in enabling effective resource planning, allocation, and management of ads performance. In this paper, we introduce a comprehensive demand forecasting system that tackles hierarchical time series forecasting in business settings. Though traditional hierarchical reconciliation methods ensure forecasting coherence, they often trade off accuracy for coherence, especially at lower levels, and fail to capture the seasonality unique to each time series in the hierarchy. Thus, we propose a novel framework "Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment (Multi-Stage HiFoReAd)" to address the challenges of preserving seasonality, ensuring coherence, and improving accuracy. Our system first utilizes diverse models, ensembled through Bayesian Optimization (BO), to produce base forecasts. The generated base forecasts are then passed into the Multi-Stage HiFoReAd framework. The initial stage refines the hierarchy using Top-Down forecasts and "harmonic alignment." The second stage aligns the higher levels' forecasts using the MinTrace algorithm, following which the last two levels undergo "harmonic alignment" and "stratified scaling", to eventually achieve accurate and coherent forecasts across the whole hierarchy. Our experiments on Walmart's internal Ads-demand dataset and 3 other public datasets, each with 4 hierarchical levels, demonstrate that the average Absolute Percentage Error on the cross-validation sets improves by 3% to 40% across levels against the BO-ensemble of models (LGBM, MSTL+ETS, Prophet) and by 1.2% to 92.9% against State-Of-The-Art models. In addition, the forecasts at all hierarchical levels are proved to be coherent. The proposed framework has been deployed and leveraged by Walmart's ads, sales and operations teams to track future demands, make informed decisions and plan resources.
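Coherence in a forecast hierarchy means that the forecasts of the child series add up to the forecast of their parent. The sketch below shows a simple proportional top-down adjustment on a two-level hierarchy. It is not the Multi-Stage HiFoReAd procedure, MinTrace, or "harmonic alignment"; the function names, the segment names, and the numbers are all made up for illustration.

```python
def is_coherent(total_forecast, child_forecasts, tol=1e-9):
    """A two-level hierarchy is coherent when the children sum to the total."""
    return abs(total_forecast - sum(child_forecasts.values())) <= tol


def top_down_reconcile(total_forecast, child_forecasts):
    """Split the parent forecast among children in proportion to their base forecasts."""
    base_sum = sum(child_forecasts.values())
    if base_sum == 0:
        raise ValueError("cannot derive proportions from all-zero child forecasts")
    return {name: total_forecast * value / base_sum
            for name, value in child_forecasts.items()}


# Independently produced base forecasts: the children sum to 80, the parent says 100.
total = 100.0
children = {"segment_a": 30.0, "segment_b": 50.0}
print(is_coherent(total, children))           # False

reconciled = top_down_reconcile(total, children)
print(reconciled)                             # {'segment_a': 37.5, 'segment_b': 62.5}
print(is_coherent(total, reconciled))         # True
```

A purely proportional split like this forces every child to follow the parent's pattern, which illustrates the trade-off the abstract describes: coherence is guaranteed, but accuracy and series-specific seasonality at lower levels can suffer.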