Goto

Collaborating Authors

 continuation method


Principled Curriculum Learning using Parameter Continuation Methods

arXiv.org Artificial Intelligence

In this work, we propose a parameter continuation method for the optimization of neural networks. There is a close connection between parameter continuation, homotopies, and curriculum learning. The methods we propose here are theoretically justified and practically effective for several problems in deep neural networks. In particular, we demonstrate better generalization performance than state-of-the-art optimization techniques such as ADAM for supervised and unsupervised learning tasks.


Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers

arXiv.org Artificial Intelligence

Parameter efficient fine tuning (PEFT) is a versatile and extensible approach for adapting a Large Language Model (LLM) for newer tasks. One of the most prominent PEFT approaches, Low Rank Adaptation (LoRA), primarily focuses on adjusting the attention weight matrices within individual decoder blocks of a Generative Pre trained Transformer (GPT2). In contrast, we introduce Solo Connection a novel method that adapts the representation at the decoder-block level rather than modifying individual weight matrices. Not only does Solo Connection outperform LoRA on E2E natural language generation benchmarks, but it also reduces the number of trainable parameters by 59% relative to LoRA and by more than 99% compared to full fine-tuning of GPT2, an early version of Large Language Models (LLMs). Solo Connection is also motivated by homotopy theory: we introduce a trainable linear transformation that gradually interpolates between a zero vector and the task-specific representation, enabling smooth and stable adaptation over time. While skip connections in the original 12 layer GPT2 are typically confined to individual decoder blocks, subsequent GPT2 variants scale up to 48 layers, and even larger language models can include 128 or more decoder blocks. These expanded architectures underscore the need to revisit how skip connections are employed during fine-tuning. This paper focuses on long skip connections that link outputs of different decoder blocks, potentially enhancing the model's ability to adapt to new tasks while leveraging pre-trained knowledge.


Stable and Convexified Information Bottleneck Optimization via Symbolic Continuation and Entropy-Regularized Trajectories

arXiv.org Artificial Intelligence

The Information Bottleneck (IB) method frequently suffers from unstable optimization, characterized by abrupt representation shifts near critical points of the IB trade-off parameter, beta. In this paper, I introduce a novel approach to achieve stable and convex IB optimization through symbolic continuation and entropy-regularized trajectories. I analytically prove convexity and uniqueness of the IB solution path when an entropy regularization term is included, and demonstrate how this stabilizes representation learning across a wide range of \b{eta} values. Additionally, I provide extensive sensitivity analyses around critical points (beta) with statistically robust uncertainty quantification (95% confidence intervals). The open-source implementation, experimental results, and reproducibility framework included in this work offer a clear path for practical deployment and future extension of my proposed method.


Semialgebraic Neural Networks: From roots to representations

arXiv.org Artificial Intelligence

Many numerical algorithms in scientific computing -- particularly in areas like numerical linear algebra, PDE simulation, and inverse problems -- produce outputs that can be represented by semialgebraic functions; that is, the graph of the computed function can be described by finitely many polynomial equalities and inequalities. In this work, we introduce Semialgebraic Neural Networks (SANNs), a neural network architecture capable of representing any bounded semialgebraic function, and computing such functions up to the accuracy of a numerical ODE solver chosen by the programmer. Conceptually, we encode the graph of the learned function as the kernel of a piecewise polynomial selected from a class of functions whose roots can be evaluated using a particular homotopy continuation method. We show by construction that the SANN architecture is able to execute this continuation method, thus evaluating the learned semialgebraic function. Furthermore, the architecture can exactly represent even discontinuous semialgebraic functions by executing a continuation method on each connected component of the target function. Lastly, we provide example applications of these networks and show they can be trained with traditional deep-learning techniques.


Error-free Training for Artificial Neural Network

arXiv.org Artificial Intelligence

Abstract: Conventional training methods for artificial neural network (ANN) models never achieve zero error rate systematically for large data. A new training method consists of three steps: first create an auxiliary data from conventionally trained parameters which correspond exactly to a global minimum for the loss function of the cloned data; second create a one-parameter homotopy (hybrid) of the auxiliary data and the original data; and third train the model for the hybrid data iteratively from the auxiliary data end of the homotopy parameter to the original data end while maintaining the zero-error training rate at every iteration. This continuation method is guaranteed to converge numerically by a theorem which converts the ANN training problem into a continuation problem for fixed points of a parameterized transformation in the training parameter space to which the Uniform Contraction Mapping Theorem from dynamical systems applies. By definition, to train an ANN model is to find the global minimum of its loss function with the 100% accuracy rate. The theoretical solution to this problem was established by [3, 7] in 1989 for finite points classification with sufficiently many parameters.


A multiobjective continuation method to compute the regularization path of deep neural networks

arXiv.org Machine Learning

Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.


An Approach for Generating Families of Energetically Optimal Gaits from Passive Dynamic Walking Gaits

arXiv.org Artificial Intelligence

For a class of biped robots with impulsive dynamics and a non-empty set of passive gaits (unactuated, periodic motions of the biped model), we present a method for computing continuous families of locally optimal gaits with respect to a class of commonly used energetic cost functions (e.g., the integral of torque-squared). We compute these families using only the passive gaits of the biped, which are globally optimal gaits with respect to these cost functions. Our approach fills in an important gap in the literature when computing a library of locally optimal gaits, which often do not make use of these globally optimal solutions as seed values. We demonstrate our approach on a well-studied two-link biped model.


Research Papers based on Constrained Optimization Problems part2(Artificial Intelligence)

#artificialintelligence

Abstract: This work proposes an accelerated primal-dual dynamical system for affine constrained convex optimization and presents a class of primal-dual methods with nonergodic convergence rates. In continuous level, exponential decay of a novel Lyapunov function is established and in discrete level, implicit, semi-implicit and explicit numerical discretizations for the continuous model are considered sequentially and lead to new accelerated primal-dual methods for solving linearly constrained optimization problems. Special structures of the subproblems in those schemes are utilized to develop efficient inner solvers. In addition, nonergodic convergence rates in terms of primal-dual gap, primal objective residual and feasibility violation are proved via a tailored discrete Lyapunov function. Moreover, our method has also been applied to decentralized distributed optimization for fast and efficient solution.


An End-to-End Differentiable Framework for Contact-Aware Robot Design

arXiv.org Artificial Intelligence

The current dominant paradigm for robotic manipulation involves two separate stages: manipulator design and control. Because the robot's morphology and how it can be controlled are intimately linked, joint optimization of design and control can significantly improve performance. Existing methods for co-optimization are limited and fail to explore a rich space of designs. The primary reason is the trade-off between the complexity of designs that is necessary for contact-rich tasks against the practical constraints of manufacturing, optimization, contact handling, etc. We overcome several of these challenges by building an end-to-end differentiable framework for contact-aware robot design. The two key components of this framework are: a novel deformation-based parameterization that allows for the design of articulated rigid robots with arbitrary, complex geometry, and a differentiable rigid body simulator that can handle contact-rich scenarios and computes analytical gradients for a full spectrum of kinematic and dynamic parameters. On multiple manipulation tasks, our framework outperforms existing methods that either only optimize for control or for design using alternate representations or co-optimize using gradient-free methods.


A Theoretical Analysis of Optimization by Gaussian Continuation

AAAI Conferences

Optimization via continuation method is a widely used approach for solving nonconvex minimization problems. While this method generally does not provide a global minimum, empirically it often achieves a superior local minimum compared to alternative approaches such as gradient descent. However, theoretical analysis of this method is largely unavailable. Here, we provide a theoretical analysis that provides a bound on the endpoint solution of the continuation method. The derived bound depends on a problem specific characteristic that we refer to as optimization complexity. We show that this characteristic can be analytically computed when the objective function is expressed in some suitable basis functions. Our analysis combines elements of scale-space theory, regularization and differential equations.