AITopics

From implicit differentiation to probabilistic modeling, Jacobians and Hessians have many potential use cases in Machine Learning (ML), but conventional wisdom views them as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to significantly speed up the process of Automatic Differentiation (AD). This paper presents advances in Automatic Sparse Differentiation (ASD), starting with a new perspective on sparsity detection. Our refreshed exposition is based on operator overloading, able to detect both local and global sparsity patterns, and naturally avoids dead ends in the control flow graph. We also describe a novel ASD pipeline in Julia, consisting of independent software packages for sparsity detection, matrix coloring, and differentiation, which together enable ASD based on arbitrary AD backends. Our pipeline is fully automatic and requires no modification of existing code, making it compatible with existing ML codebases. We demonstrate that this pipeline unlocks Jacobian and Hessian matrices at scales where they were considered too expensive to compute. On real-world problems from scientific ML and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, our ASD pipeline often outperforms standard AD for one-off computations, once thought impractical due to slower sparsity detection methods.

machine learning, programming language, sparsity pattern, (19 more...)

2501.17737

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Cao, Tianyu, Chen, Xiaokai, Scutari, Gesualdo

DCatalyst: A Unified Accelerated Framework for Decentralized Optimization

We study decentralized optimization over a network of agents, modeled as graphs, with no central server. The goal is to minimize $f+r$, where $f$ represents a (strongly) convex function averaging the local agents' losses, and $r$ is a convex, extended-value function. We introduce DCatalyst, a unified black-box framework that integrates Nesterov acceleration into decentralized optimization algorithms. %, enhancing their performance. At its core, DCatalyst operates as an \textit{inexact}, \textit{momentum-accelerated} proximal method (forming the outer loop) that seamlessly incorporates any selected decentralized algorithm (as the inner loop). We demonstrate that DCatalyst achieves optimal communication and computational complexity (up to log-factors) across various decentralized algorithms and problem instances. Notably, it extends acceleration capabilities to problem classes previously lacking accelerated solution methods, thereby broadening the effectiveness of decentralized methods. On the technical side, our framework introduce the {\it inexact estimating sequences}--a novel extension of the well-known Nesterov's estimating sequences, tailored for the minimization of composite losses in decentralized settings. This method adeptly handles consensus errors and inexact solutions of agents' subproblems, challenges not addressed by existing models.

algorithm, artificial intelligence, machine learning, (18 more...)

2501.18114

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Differentiable Projection-based Learn to Optimize in Wireless Network-Part I: Convex Constrained (Non-)Convex Programming

Wang, Xiucheng, Zhao, Xuan, Cheng, Nan

This paper addresses a class of (non-)convex optimization problems subject to general convex constraints, which pose significant challenges for traditional methods due to their inherent non-convexity and diversity. Conventional convex optimization-based solvers often struggle to efficiently handle these problems in their most general form. While neural network (NN)-based approaches offer a promising alternative, ensuring the feasibility of NN-generated solutions and effectively training the NN remain key hurdles, largely because finite-capacity networks can produce infeasible outputs. To overcome these issues, we propose a projection-based method that projects any infeasible NN output onto the feasible domain, thus guaranteeing strict adherence to the constraints without compromising the NN's optimization capability. Furthermore, we derive the objective function values for both the raw NN outputs and their projected counterparts, along with the gradients of these values with respect to the NN parameters. This derivation enables label-free (unsupervised) training, reducing reliance on labeled data and improving scalability. Experimental results demonstrate that the proposed projection-based method consistently ensures feasibility.

artificial intelligence, machine learning, optimization problem, (19 more...)

2502.00053

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Energy (0.69)
Telecommunications (0.68)
Information Technology (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kübler, Jonas M, Wang, Yu-Xiang, Sabach, Shoham, Ansari, Navid, Kleindessner, Matthäus, Budhathoki, Kailash, Cevher, Volkan, Karypis, George

A Proximal Operator for Inducing 2:4-Sparsity

Recent hardware advancements in AI Accelerators and GPUs allow to efficiently compute sparse matrix multiplications, especially when 2 out of 4 consecutive weights are set to zero. However, this so-called 2:4 sparsity usually comes at a decreased accuracy of the model. We derive a regularizer that exploits the local correlation of features to find better sparsity masks in trained models. We minimize the regularizer jointly with a local squared loss by deriving the proximal operator for which we show that it has an efficient solution in the 2:4-sparse case. After optimizing the mask, we use maskedgradient updates to further minimize the local squared loss. We illustrate our method on toy problems and apply it to pruning entire large language models up to 70B parameters. On models up to 13B we improve over previous state of the art algorithms, whilst on 70B models we match their performance.

large language model, machine learning, natural language, (16 more...)

2501.18015

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Vu, Minh Nhat, Grander, Florian, Nguyen, Anh

Online Trajectory Replanner for Dynamically Grasping Irregular Objects

This paper presents a new trajectory replanner for grasping irregular objects. Unlike conventional grasping tasks where the object's geometry is assumed simple, we aim to achieve a "dynamic grasp" of the irregular objects, which requires continuous adjustment during the grasping process. To effectively handle irregular objects, we propose a trajectory optimization framework that comprises two phases. Firstly, in a specified time limit of 10s, initial offline trajectories are computed for a seamless motion from an initial configuration of the robot to grasp the object and deliver it to a pre-defined target location. Secondly, fast online trajectory optimization is implemented to update robot trajectories in real-time within 100 ms. This helps to mitigate pose estimation errors from the vision system. To account for model inaccuracies, disturbances, and other non-modeled effects, trajectory tracking controllers for both the robot and the gripper are implemented to execute the optimal trajectories from the proposed framework. The intensive experimental results effectively demonstrate the performance of our trajectory planning framework in both simulation and real-world scenarios.

artificial intelligence, optimization problem, trajectory, (17 more...)

2501.17968

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Piazza, Marco, Spinelli, Andrea, Maggioni, Francesca, Bedoni, Marzia, Messina, Enza

A Robust Support Vector Machine Approach for Raman COVID-19 Data Classification

Recent advances in healthcare technologies have led to the availability of large amounts of biological samples across several techniques and applications. In particular, in the last few years, Raman spectroscopy analysis of biological samples has been successfully applied for early-stage diagnosis. However, spectra' inherent complexity and variability make the manual analysis challenging, even for domain experts. For the same reason, the use of traditional Statistical and Machine Learning (ML) techniques could not guarantee for accurate and reliable results. ML models, combined with robust optimization techniques, offer the possibility to improve the classification accuracy and enhance the resilience of predictive models. In this paper, we investigate the performance of a novel robust formulation for Support Vector Machine (SVM) in classifying COVID-19 samples obtained from Raman Spectroscopy. Given the noisy and perturbed nature of biological samples, we protect the classification process against uncertainty through the application of robust optimization techniques. Specifically, we derive robust counterpart models of deterministic formulations using bounded-by-norm uncertainty sets around each observation. We explore the cases of both linear and kernel-induced classifiers to address binary and multiclass classification tasks. The effectiveness of our approach is validated on real-world COVID-19 datasets provided by Italian hospitals by comparing the results of our simulations with a state-of-the-art classifier.

artificial intelligence, machine learning, optimization problem, (18 more...)

2501.17904

Country:

South America > Uruguay > Maldonado > Maldonado (0.04)
Europe > Switzerland (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Cenikj, Gjorgjina, Petelin, Gašper, Seiler, Moritz, Cenikj, Nikola, Eftimov, Tome

Landscape Features in Single-Objective Continuous Optimization: Have We Hit a Wall in Algorithm Selection Generalization?

Motivated by the potential to capitalize on the varied performance of different algorithms across sets of different problem instances, the algorithm selection (AS) task targets the automated identification of a preferred optimization algorithm to solve a particular problem instance Kotthoff [2016], Kerschke et al. [2019]. Conventionally, AS is performed by taking into account the properties of the problem instance, which are typically described in the form of a numerical vector representation, also referred to as problem landscape features. Once a problem instance is represented in a vector form, Machine Learning (ML) models can be used to capture the relation between problem landscape features and algorithm performance, and further identify the best algorithm for a problem instance. In the field of single-objective continuous optimization, the most common choice of problem landscape features used to represent problem instances are the Exploratory Landscape Analysis (ELA) [Mersmann et al., 2011] features.

algorithm, artificial intelligence, machine learning, (17 more...)

2501.17663

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
Europe > Croatia > Primorje-Gorski Kotar County > Rijeka (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jabs, Christoph, Berg, Jeremias, Bogaerts, Bart, Järvisalo, Matti

Certifying Pareto-Optimality in Multi-Objective Maximum Satisfiability

Due to the wide employment of automated reasoning in the analysis and construction of correct systems, the results reported by automated reasoning engines must be trustworthy. For Boolean satisfiability (SAT) solvers - and more recently SAT-based maximum satisfiability (MaxSAT) solvers - trustworthiness is obtained by integrating proof logging into solvers, making solvers capable of emitting machine-verifiable proofs to certify correctness of the reasoning steps performed. In this work, we enable for the first time proof logging based on the VeriPB proof format for multi-objective MaxSAT (MO-MaxSAT) optimization techniques. Although VeriPB does not offer direct support for multi-objective problems, we detail how preorders in VeriPB can be used to provide certificates for MO-MaxSAT algorithms computing a representative solution for each element in the non-dominated set of the search space under Pareto-optimality, without extending the VeriPB format or the proof checker. By implementing VeriPB proof logging into a state-of-the-art multi-objective MaxSAT solver, we show empirically that proof logging can be made scalable for MO-MaxSAT with reasonable overhead.

artificial intelligence, constraint, logic & formal reasoning, (18 more...)

2501.17493

Country:

Europe > Sweden > Uppsala County > Uppsala (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
(22 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)

Hölscher, Nils, Hakert, Christian, von der Brüggen, Georg, Chen, Jian-Jia, Chen, Kuan-Hsun, Reineke, Jan

WCDT: Systematic WCET Optimization for Decision Tree Implementations

Machine-learning models are increasingly deployed on resource-constrained embedded systems with strict timing constraints. In such scenarios, the worst-case execution time (WCET) of the models is required to ensure safe operation. Specifically, decision trees are a prominent class of machine-learning models and the main building blocks of tree-based ensemble models (e.g., random forests), which are commonly employed in resource-constrained embedded systems. In this paper, we develop a systematic approach for WCET optimization of decision tree implementations. To this end, we introduce a linear surrogate model that estimates the execution time of individual paths through a decision tree based on the path's length and the number of taken branches. We provide an optimization algorithm that constructively builds a WCET-optimal implementation of a given decision tree with respect to this surrogate model. We experimentally evaluate both the surrogate model and the WCET-optimization algorithm. The evaluation shows that the optimization algorithm improves analytically determined WCET by up to $17\%$ compared to an unoptimized implementation.

artificial intelligence, machine learning, surrogate model, (18 more...)

2501.17428

Country:

Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Ebihara, Akinori F., Miyagawa, Taiki, Sakurai, Kazuyuki, Imaoka, Hitoshi

Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test

Time-sensitive machine learning benefits from Sequential Probability Ratio Test (SPRT), which provides an optimal stopping time for early classification of time series. However, in finite horizon scenarios, where input lengths are finite, determining the optimal stopping rule becomes computationally intensive due to the need for backward induction, limiting practical applicability. We thus introduce FIRMBOUND, an SPRT-based framework that efficiently estimates the solution to backward induction from training data, bridging the gap between optimal stopping theory and real-world deployment. It employs density ratio estimation and convex function learning to provide statistically consistent estimators for sufficient statistic and conditional expectation, both essential for solving backward induction; consequently, FIRMBOUND minimizes Bayes risk to reach optimality. Additionally, we present a faster alternative using Gaussian process regression, which significantly reduces training time while retaining low deployment overhead, albeit with potential compromise in statistical consistency. Experiments across independent and identically distributed (i.i.d.), non-i.i.d., binary, multiclass, synthetic, and real-world datasets show that FIRMBOUND achieves optimalities in the sense of Bayes risk and speed-accuracy tradeoff. Furthermore, it advances the tradeoff boundary toward optimality when possible and reduces decision-time variance, ensuring reliable decision-making. Code is publicly available at https://github.com/Akinori-F-Ebihara/FIRMBOUND

artificial intelligence, machine learning, reinforcement learning, (21 more...)

2501.18059

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)