Goto

Collaborating Authors

 Gradient Descent


GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent

arXiv.org Artificial Intelligence

Ensuring that the outputs of neural networks satisfy specific constraints is crucial for applying neural networks to real-life decision-making problems. In this paper, we consider making a batch of neural network outputs satisfy bounded and general linear constraints. We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem. We show that such a problem can be equivalently transformed into an unconstrained convex optimization problem with Lipschitz continuous gradient according to the duality theorem. Then, based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem. To the best of our knowledge, this is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free. Despite the fact that we can explicitly perform backpropagation based on automatic differentiation mechanism, we also provide an alternative approach in GLinSAT to calculate the derivatives based on implicit differentiation of the optimality condition. Experimental results on constrained traveling salesman problems, partial graph matching with outliers, predictive portfolio allocation and power system unit commitment demonstrate the advantages of GLinSAT over existing satisfiability layers. Our implementation is available at \url{https://github.com/HunterTracer/GLinSAT}.


Are Neuromorphic Architectures Inherently Privacy-preserving? An Exploratory Study

arXiv.org Artificial Intelligence

While machine learning (ML) models are becoming mainstream, especially in sensitive application areas, the risk of data leakage has become a growing concern. Attacks like membership inference (MIA) have shown that trained models can reveal sensitive data, jeopardizing confidentiality. While traditional Artificial Neural Networks (ANNs) dominate ML applications, neuromorphic architectures, specifically Spiking Neural Networks (SNNs), are emerging as promising alternatives due to their low power consumption and event-driven processing, akin to biological neurons. Privacy in ANNs is well-studied; however, little work has explored the privacy-preserving properties of SNNs. This paper examines whether SNNs inherently offer better privacy. Using MIAs, we assess the privacy resilience of SNNs versus ANNs across diverse datasets. We analyze the impact of learning algorithms (surrogate gradient and evolutionary), frameworks (snnTorch, TENNLab, LAVA), and parameters on SNN privacy. Our findings show that SNNs consistently outperform ANNs in privacy preservation, with evolutionary algorithms offering additional resilience. For instance, on CIFAR-10, SNNs achieve an AUC of 0.59, significantly lower than ANNs' 0.82, and on CIFAR-100, SNNs maintain an AUC of 0.58 compared to ANNs' 0.88. Additionally, we explore the privacy-utility trade-off with Differentially Private Stochastic Gradient Descent (DPSGD), finding that SNNs sustain less accuracy loss than ANNs under similar privacy constraints.


An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV method

arXiv.org Machine Learning

Optimizing the learning rate remains a critical challenge in machine learning, essential for achieving model stability and efficient convergence. The Vector Auxiliary Variable (VAV) algorithm introduces a novel energy-based self-adjustable learning rate optimization method designed for unconstrained optimization problems. It incorporates an auxiliary variable $r$ to facilitate efficient energy approximation without backtracking while adhering to the unconditional energy dissipation law. Notably, VAV demonstrates superior stability with larger learning rates and achieves faster convergence in the early stage of the training process. Comparative analyses demonstrate that VAV outperforms Stochastic Gradient Descent (SGD) across various tasks. This paper also provides rigorous proof of the energy dissipation law and establishes the convergence of the algorithm under reasonable assumptions. Additionally, $r$ acts as an empirical lower bound of the training loss in practice, offering a novel scheduling approach that further enhances algorithm performance.


Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

arXiv.org Artificial Intelligence

Online multi-task learning (OMTL) enhances streaming data processing by leveraging the inherent relations among multiple tasks. It can be described as an optimization problem in which a single loss function is defined for multiple tasks. Existing gradient-descent-based methods for this problem might suffer from gradient vanishing and poor conditioning issues. Furthermore, the centralized setting hinders their application to online parallel optimization, which is vital to big data analytics. Therefore, this study proposes a novel OMTL framework based on the alternating direction multiplier method (ADMM), a recent breakthrough in optimization suitable for the distributed computing environment because of its decomposable and easy-to-implement nature. The relations among multiple tasks are modeled dynamically to fit the constant changes in an online scenario. In a classical distributed computing architecture with a central server, the proposed OMTL algorithm with the ADMM optimizer outperforms SGD-based approaches in terms of accuracy and efficiency. Because the central server might become a bottleneck when the data scale grows, we further tailor the algorithm to a decentralized setting, so that each node can work by only exchanging information with local neighbors. Experimental results on a synthetic and several real-world datasets demonstrate the efficiency of our methods.


A Comparative Analysis of Machine Learning Models for DDoS Detection in IoT Networks

arXiv.org Artificial Intelligence

This paper presents the detection of DDoS attacks in IoT networks using machine learning models. Their rapid growth has made them highly susceptible to various forms of cyberattacks, many of whose security procedures are implemented in an irregular manner. It evaluates the efficacy of different machine learning models, such as XGBoost, K-Nearest Neighbours, Stochastic Gradient Descent, and Na\"ive Bayes, in detecting DDoS attacks from normal network traffic. Each model has been explained on several performance metrics, such as accuracy, precision, recall, and F1-score to understand the suitability of each model in real-time detection and response against DDoS threats. This comparative analysis will, therefore, enumerate the unique strengths and weaknesses of each model with respect to the IoT environments that are dynamic and hence moving in nature. The effectiveness of these models is analyzed, showing how machine learning can greatly enhance IoT security frameworks, offering adaptive, efficient, and reliable DDoS detection capabilities. These findings have shown the potential of machine learning in addressing the pressing need for robust IoT security solutions that can mitigate modern cyber threats and assure network integrity.


Learning Mixtures of Experts with EM

arXiv.org Machine Learning

Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition. Recently, MoE have become popular as components in today's large language models as a means to reduce training and inference costs. There, the partitioning function and the experts are both learnt jointly via gradient descent on the log-likelihood. In this paper we focus on studying the efficiency of the Expectation Maximization (EM) algorithm for the training of MoE models. We first rigorously analyze EM for the cases of linear or logistic experts, where we show that EM is equivalent to Mirror Descent with unit step size and a Kullback-Leibler Divergence regularizer. This perspective allows us to derive new convergence results and identify conditions for local linear convergence based on the signal-to-noise ratio (SNR). Experiments on synthetic and (small-scale) real-world data show that EM outperforms the gradient descent algorithm both in terms of convergence rate and the achieved accuracy.


Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

arXiv.org Artificial Intelligence

Bilevel Optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel optimization. However, a common limitation in these algorithms is the presumption of independent sampling, which can lead to increased computational costs due to the complicated hyper-gradient formulation of bilevel problems. To address this challenge, we study the example-selection strategy for bilevel optimization in this work. More specifically, we introduce a without-replacement sampling based algorithm which achieves a faster convergence rate compared to its counterparts that rely on independent sampling. Beyond the standard bilevel optimization formulation, we extend our discussion to conditional bilevel optimization and also two special cases: minimax and compositional optimization.


Differentiable Calibration of Inexact Stochastic Simulation Models via Kernel Score Minimization

arXiv.org Artificial Intelligence

Stochastic simulation models are generative models that mimic complex systems to help with decision-making. The reliability of these models heavily depends on well-calibrated input model parameters. However, in many practical scenarios, only output-level data are available to learn the input model parameters, which is challenging due to the often intractable likelihood of the stochastic simulation model. Moreover, stochastic simulation models are frequently inexact, with discrepancies between the model and the target system. No existing methods can effectively learn and quantify the uncertainties of input parameters using only output-level data. In this paper, we propose to learn differentiable input parameters of stochastic simulation models using output-level data via kernel score minimization with stochastic gradient descent. We quantify the uncertainties of the learned input parameters using a frequentist confidence set procedure based on a new asymptotic normality result that accounts for model inexactness. The proposed method is evaluated on exact and inexact G/G/1 queueing models.


SPGD: Steepest Perturbed Gradient Descent Optimization

arXiv.org Artificial Intelligence

Optimization algorithms are pivotal in advancing various scientific and industrial fields but often encounter obstacles such as trapping in local minima, saddle points, and plateaus (flat regions), which makes the convergence to reasonable or near-optimal solutions particularly challenging. This paper presents the Steepest Perturbed Gradient Descent (SPGD), a novel algorithm that innovatively combines the principles of the gradient descent method with periodic uniform perturbation sampling to effectively circumvent these impediments and lead to better solutions whenever possible. SPGD is distinctively designed to generate a set of candidate solutions and select the one exhibiting the steepest loss difference relative to the current solution. It enhances the traditional gradient descent approach by integrating a strategic exploration mechanism that significantly increases the likelihood of escaping sub-optimal local minima and navigating complex optimization landscapes effectively. Our approach not only retains the directed efficiency of gradient descent but also leverages the exploratory benefits of stochastic perturbations, thus enabling a more comprehensive search for global optima across diverse problem spaces. We demonstrate the efficacy of SPGD in solving the 3D component packing problem, an NP-hard challenge. Preliminary results show a substantial improvement over four established methods, particularly on response surfaces with complex topographies and in multidimensional non-convex continuous optimization problems. Comparative analyses with established 2D benchmark functions highlight SPGD's superior performance, showcasing its ability to navigate complex optimization landscapes. These results emphasize SPGD's potential as a versatile tool for a wide range of optimization problems.


Impact of Label Noise on Learning Complex Features

arXiv.org Artificial Intelligence

Neural networks trained with stochastic gradient descent exhibit an inductive bias towards simpler decision boundaries, typically converging to a narrow family of functions, and often fail to capture more complex features. This phenomenon raises concerns about the capacity of deep models to adequately learn and represent real-world datasets. Traditional approaches such as explicit regularization, data augmentation, architectural modifications, etc., have largely proven ineffective in encouraging the models to learn diverse features. In this work, we investigate the impact of pre-training models with noisy labels on the dynamics of SGD across various architectures and datasets. We show that pretraining promotes learning complex functions and diverse features in the presence of noise. Our experiments demonstrate that pre-training with noisy labels encourages gradient descent to find alternate minima that do not solely depend upon simple features, rather learns more complex and broader set of features, without hurting performance.