Optimization
A Smooth Transition Between Induction and Deduction: Fast Abductive Learning Based on Probabilistic Symbol Perception
Jia, Lin-Han, Han, Si-Yu, Guo, Lan-Zhe, Zhou, Zhi, Li, Zhao-Long, Li, Yu-Feng, Zhou, Zhi-Hua
Abductive learning (ABL) that integrates strengths of machine learning and logical reasoning to improve the learning generalization, has been recently shown effective. However, its efficiency is affected by the transition between numerical induction and symbolical deduction, leading to high computational costs in the worst-case scenario. Efforts on this issue remain to be limited. In this paper, we identified three reasons why previous optimization algorithms for ABL were not effective: insufficient utilization of prediction, symbol relationships, and accumulated experience in successful abductive processes, resulting in redundant calculations to the knowledge base. To address these challenges, we introduce an optimization algorithm named as Probabilistic Symbol Perception (PSP), which makes a smooth transition between induction and deduction and keeps the correctness of ABL unchanged. We leverage probability as a bridge and present an efficient data structure, achieving the transfer from a continuous probability sequence to discrete Boolean sequences with low computational complexity. Experiments demonstrate the promising results.
Eager Updates For Overlapped Communication and Computation in DiLoCo
Kale, Satyen, Douillard, Arthur, Donchev, Yanislav
Distributed optimization methods such as DiLoCo have been shown to be effective in training very large models across multiple distributed workers, such as datacenters. These methods split updates into two parts: an inner optimization phase, where the workers independently execute multiple optimization steps on their own local data, and an outer optimization step, where the inner updates are synchronized. While such approaches require orders of magnitude less communication than standard data-parallel training, in settings where the workers are datacenters, even the limited communication requirements of these approaches can still cause significant slow downs due to the blocking necessary at each outer optimization step. In this paper, we investigate techniques to mitigate this issue by overlapping communication with computation in a manner that allows the outer optimization step to fully overlap with the inner optimization phase. We show that a particular variant, dubbed eager updates, provides competitive performance with standard DiLoCo in settings with low bandwidth between workers.
Edge-Colored Clustering in Hypergraphs: Beyond Minimizing Unsatisfied Edges
Crane, Alex, Stanley, Thomas, Sullivan, Blair D., Veldt, Nate
Edge-colored clustering (ECC) is an optimization framework for clustering datasets characterized by categorical relationships among data points. The problem is formally encoded as an edge-colored hypergraph (Figure 1), where each edge represents an interaction between data objects (the nodes) and the color of the edge indicates the type or category of that interaction. The goal is to assign colors to nodes in such a way that edges of a color tend to include nodes of that color, by minimizing or maximizing some objective function relating edge colors and node colors. ECC algorithms have been applied to various clustering tasks where cluster labels naturally match with interaction types. For example, if nodes are researchers, edges are author lists for publications, and colors indicate publication field (computer science, biology, etc.), then ECC provides a framework for inferring researchers' fields based on publications. ECC has also been used for temporal hypergraph clustering [Amburg et al., 2020], where edge colors encode time windows in which interactions occur. ECC then clusters nodes into time windows in which they are especially active. Variants of ECC have also been used for team formation [Amburg et al., 2022], in which case nodes are people, edges represent team tasks, and colors indicate task type. In this setting, ECC corresponds to assigning tasks based on prior team experiences.
tn4ml: Tensor Network Training and Customization for Machine Learning
Puljak, Ema, Sanchez-Ramirez, Sergio, Masot-Llima, Sergi, Vallรจs-Muns, Jofre, Garcia-Saez, Artur, Pierini, Maurizio
Tensor Networks have emerged as a prominent alternative to neural networks for addressing Machine Learning challenges in foundational sciences, paving the way for their applications to real-life problems. This paper introduces tn4ml, a novel library designed to seamlessly integrate Tensor Networks into optimization pipelines for Machine Learning tasks. Inspired by existing Machine Learning frameworks, the library offers a user-friendly structure with modules for data embedding, objective function definition, and model training using diverse optimization strategies. We demonstrate its versatility through two examples: supervised learning on tabular data and unsupervised learning on an image dataset. Additionally, we analyze how customizing the parts of the Machine Learning pipeline for Tensor Networks influences performance metrics.
AIDE: AI-Driven Exploration in the Space of Code
Jiang, Zhengyao, Schmidt, Dominik, Srikanth, Dhruv, Xu, Dixing, Kaplan, Ian, Jacenko, Deniss, Wu, Yuxiang
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem, and formulates trial-anderror as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench and METR's RE-Bench. The implementation of AIDE is publicly available at https://github.com/WecoAI/aideml.
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
Aoyama, Tatsuya, Yang, Hanting, Hanada, Hiroyuki, Akahane, Satoshi, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
Reducing the amount of training data while preserving model performance remains a fundamental challenge in machine learning. Dataset distillation seeks to generate synthetic instances that encapsulate the essential information of the original data [31]. This synthetic approach often proves more flexible and can potentially achieve greater data reduction than simply retaining subsets of actual instances. Such distilled datasets can also serve broader applications, for example by enabling efficient continual learning with reduced storage demands [14, 23, 3], and offering privacy safeguards through data corruption [2, 12]. Existing dataset distillation methods are essentially formulated as a bi-level optimization problem. This is because generating synthetic instances requires retraining the model with those instances as training data. Specifically, synthetic instances are created in the outer loop, and the model is trained in the inner loop, leading to high computational costs. A promising approach to avoid bi-level optimization is a method called Kernel Inducing Point (KIP) [18]. The KIP method avoids bi-level optimization by obtaining an analytical solution in its inner loop, effectively leveraging the fact that its loss function is a variant of squared loss.
Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts
Joshi, Sunay, Kiyani, Shayan, Pappas, George, Dobriban, Edgar, Hassani, Hamed
We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, and an image classification task from the WILDS repository.
A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models
Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. However, their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content. In this survey, we provide a structured overview of concept erasure, categorizing existing methods based on their optimization strategies and the architectural components they modify. We categorize concept erasure methods into fine-tuning for parameter updates, closed-form solutions for efficient edits, and inference-time interventions for content restriction without weight modification. Additionally, we explore adversarial attacks that bypass erasure techniques and discuss emerging defenses. To support further research, we consolidate key datasets, evaluation metrics, and benchmarks for assessing erasure effectiveness and model robustness. This survey serves as a comprehensive resource, offering insights into the evolving landscape of concept erasure, its challenges, and future directions.
Generative Topology Optimization: Exploring Diverse Solutions in Structural Design
Radler, Andreas, Volkmann, Eric, Brandstetter, Johannes, Berzins, Arturs
Topology optimization (TO) is a family of computational methods that derive near-optimal geometries from formal problem descriptions. Despite their success, established TO methods are limited to generating single solutions, restricting the exploration of alternative designs. To address this limitation, we introduce Generative Topology Optimization (GenTO) - a data-free method that trains a neural network to generate structurally compliant shapes and explores diverse solutions through an explicit diversity constraint. The network is trained with a solver-in-the-loop, optimizing the material distribution in each iteration. The trained model produces diverse shapes that closely adhere to the design requirements. We validate GenTO on 2D and 3D TO problems. Our results demonstrate that GenTO produces more diverse solutions than any prior method while maintaining near-optimality and being an order of magnitude faster due to inherent parallelism. These findings open new avenues for engineering and design, offering enhanced flexibility and innovation in structural optimization.
BalanceBenchmark: A Survey for Imbalanced Learning
Xu, Shaoxuan, Cui, Menglu, Huang, Chengxiang, Wang, Hongfa, DiHu, null
Multimodal learning has gained attention for its capacity to integrate information from different modalities. However, it is often hindered by the multimodal imbalance problem, where certain modality dominates while others remain underutilized. Although recent studies have proposed various methods to alleviate this problem, they lack comprehensive and fair comparisons. In this paper, we systematically categorize various mainstream multimodal imbalance algorithms into four groups based on the strategies they employ to mitigate imbalance. To facilitate a comprehensive evaluation of these methods, we introduce BalanceBenchmark, a benchmark including multiple widely used multidimensional datasets and evaluation metrics from three perspectives: performance, imbalance degree, and complexity. To ensure fair comparisons, we have developed a modular and extensible toolkit that standardizes the experimental workflow across different methods. Based on the experiments using BalanceBenchmark, we have identified several key insights into the characteristics and advantages of different method groups in terms of performance, balance degree and computational complexity. We expect such analysis could inspire more efficient approaches to address the imbalance problem in the future, as well as foundation models. The code of the toolkit is available at https://github.com/GeWu-Lab/BalanceBenchmark.