AITopics | continuation method

Collaborating Authors

continuation method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Principled Curriculum Learning using Parameter Continuation Methods

Pathak, Harsh Nilesh, Paffenroth, Randy

arXiv.org Artificial IntelligenceJul-31-2025

In this work, we propose a parameter continuation method for the optimization of neural networks. There is a close connection between parameter continuation, homotopies, and curriculum learning. The methods we propose here are theoretically justified and practically effective for several problems in deep neural networks. In particular, we demonstrate better generalization performance than state-of-the-art optimization techniques such as ADAM for supervised and unsupervised learning tasks.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2507.22089

Genre: Research Report (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers

Pathak, Harsh Nilesh, Paffenroth, Randy

arXiv.org Artificial IntelligenceJul-22-2025

Parameter efficient fine tuning (PEFT) is a versatile and extensible approach for adapting a Large Language Model (LLM) for newer tasks. One of the most prominent PEFT approaches, Low Rank Adaptation (LoRA), primarily focuses on adjusting the attention weight matrices within individual decoder blocks of a Generative Pre trained Transformer (GPT2). In contrast, we introduce Solo Connection a novel method that adapts the representation at the decoder-block level rather than modifying individual weight matrices. Not only does Solo Connection outperform LoRA on E2E natural language generation benchmarks, but it also reduces the number of trainable parameters by 59% relative to LoRA and by more than 99% compared to full fine-tuning of GPT2, an early version of Large Language Models (LLMs). Solo Connection is also motivated by homotopy theory: we introduce a trainable linear transformation that gradually interpolates between a zero vector and the task-specific representation, enabling smooth and stable adaptation over time. While skip connections in the original 12 layer GPT2 are typically confined to individual decoder blocks, subsequent GPT2 variants scale up to 48 layers, and even larger language models can include 128 or more decoder blocks. These expanded architectures underscore the need to revisit how skip connections are employed during fine-tuning. This paper focuses on long skip connections that link outputs of different decoder blocks, potentially enhancing the model's ability to adapt to new tasks while leveraging pre-trained knowledge.

large language model, machine learning, solo connection, (20 more...)

arXiv.org Artificial Intelligence

2507.14353

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Stable and Convexified Information Bottleneck Optimization via Symbolic Continuation and Entropy-Regularized Trajectories

Alpay, Faruk

arXiv.org Artificial IntelligenceMay-15-2025

The Information Bottleneck (IB) method frequently suffers from unstable optimization, characterized by abrupt representation shifts near critical points of the IB trade-off parameter, beta. In this paper, I introduce a novel approach to achieve stable and convex IB optimization through symbolic continuation and entropy-regularized trajectories. I analytically prove convexity and uniqueness of the IB solution path when an entropy regularization term is included, and demonstrate how this stabilizes representation learning across a wide range of \b{eta} values. Additionally, I provide extensive sensitivity analyses around critical points (beta) with statistically robust uncertainty quantification (95% confidence intervals). The open-source implementation, experimental results, and reproducibility framework included in this work offer a clear path for practical deployment and future extension of my proposed method.

artificial intelligence, machine learning, transition, (18 more...)

arXiv.org Artificial Intelligence

2505.09239

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Semialgebraic Neural Networks: From roots to representations

Mis, S. David, Lassas, Matti, de Hoop, Maarten V.

arXiv.org Artificial IntelligenceJan-2-2025

Many numerical algorithms in scientific computing -- particularly in areas like numerical linear algebra, PDE simulation, and inverse problems -- produce outputs that can be represented by semialgebraic functions; that is, the graph of the computed function can be described by finitely many polynomial equalities and inequalities. In this work, we introduce Semialgebraic Neural Networks (SANNs), a neural network architecture capable of representing any bounded semialgebraic function, and computing such functions up to the accuracy of a numerical ODE solver chosen by the programmer. Conceptually, we encode the graph of the learned function as the kernel of a piecewise polynomial selected from a class of functions whose roots can be evaluated using a particular homotopy continuation method. We show by construction that the SANN architecture is able to execute this continuation method, thus evaluating the learned semialgebraic function. Furthermore, the architecture can exactly represent even discontinuous semialgebraic functions by executing a continuation method on each connected component of the target function. Lastly, we provide example applications of these networks and show they can be trained with traditional deep-learning techniques.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.01564

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(9 more...)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.68)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Error-free Training for Artificial Neural Network

Deng, Bo

arXiv.org Artificial IntelligenceDec-26-2023

Abstract: Conventional training methods for artificial neural network (ANN) models never achieve zero error rate systematically for large data. A new training method consists of three steps: first create an auxiliary data from conventionally trained parameters which correspond exactly to a global minimum for the loss function of the cloned data; second create a one-parameter homotopy (hybrid) of the auxiliary data and the original data; and third train the model for the hybrid data iteratively from the auxiliary data end of the homotopy parameter to the original data end while maintaining the zero-error training rate at every iteration. This continuation method is guaranteed to converge numerically by a theorem which converts the ANN training problem into a continuation problem for fixed points of a parameterized transformation in the training parameter space to which the Uniform Contraction Mapping Theorem from dynamical systems applies. By definition, to train an ANN model is to find the global minimum of its loss function with the 100% accuracy rate. The theoretical solution to this problem was established by [3, 7] in 1989 for finite points classification with sufficiently many parameters.

ann model, global minimum, loss function, (16 more...)

arXiv.org Artificial Intelligence

2312.1606

Country:

North America > United States > Nebraska > Lancaster County > Lincoln (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A multiobjective continuation method to compute the regularization path of deep neural networks

Amakor, Augustina C., Sonntag, Konstantin, Peitz, Sebastian

arXiv.org Machine LearningSep-19-2023

Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.

artificial intelligence, machine learning, regularization path, (18 more...)

arXiv.org Machine Learning

2308.12044

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

An Approach for Generating Families of Energetically Optimal Gaits from Passive Dynamic Walking Gaits

Rosa, Nelson, Katamish, Bassel, Raff, Maximilian, Remy, C. David

arXiv.org Artificial IntelligenceMar-26-2023

For a class of biped robots with impulsive dynamics and a non-empty set of passive gaits (unactuated, periodic motions of the biped model), we present a method for computing continuous families of locally optimal gaits with respect to a class of commonly used energetic cost functions (e.g., the integral of torque-squared). We compute these families using only the passive gaits of the biped, which are globally optimal gaits with respect to these cost functions. Our approach fills in an important gap in the literature when computing a library of locally optimal gaits, which often do not make use of these globally optimal solutions as seed values. We demonstrate our approach on a well-studied two-link biped model.

artificial intelligence, gait, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2303.1475

Country:

Europe > Germany (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.46)

Add feedback

Research Papers based on Constrained Optimization Problems part2(Artificial Intelligence)

#artificialintelligenceMay-27-2022, 00:10:02 GMT

Abstract: This work proposes an accelerated primal-dual dynamical system for affine constrained convex optimization and presents a class of primal-dual methods with nonergodic convergence rates. In continuous level, exponential decay of a novel Lyapunov function is established and in discrete level, implicit, semi-implicit and explicit numerical discretizations for the continuous model are considered sequentially and lead to new accelerated primal-dual methods for solving linearly constrained optimization problems. Special structures of the subproblems in those schemes are utilized to develop efficient inner solvers. In addition, nonergodic convergence rates in terms of primal-dual gap, primal objective residual and feasibility violation are proved via a tailored discrete Lyapunov function. Moreover, our method has also been applied to decentralized distributed optimization for fast and efficient solution.

constrained optimization problem part2, continuation method, optimization problem, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

An End-to-End Differentiable Framework for Contact-Aware Robot Design

Xu, Jie, Chen, Tao, Zlokapa, Lara, Foshey, Michael, Matusik, Wojciech, Sueda, Shinjiro, Agrawal, Pulkit

arXiv.org Artificial IntelligenceJul-15-2021

The current dominant paradigm for robotic manipulation involves two separate stages: manipulator design and control. Because the robot's morphology and how it can be controlled are intimately linked, joint optimization of design and control can significantly improve performance. Existing methods for co-optimization are limited and fail to explore a rich space of designs. The primary reason is the trade-off between the complexity of designs that is necessary for contact-rich tasks against the practical constraints of manufacturing, optimization, contact handling, etc. We overcome several of these challenges by building an end-to-end differentiable framework for contact-aware robot design. The two key components of this framework are: a novel deformation-based parameterization that allows for the design of articulated rigid robots with arbitrary, complex geometry, and a differentiable rigid body simulator that can handle contact-rich scenarios and computes analytical gradients for a full spectrum of kinematic and dynamic parameters. On multiple manipulation tasks, our framework outperforms existing methods that either only optimize for control or for design using alternate representations or co-optimize using gradient-free methods.

morphology, optimization, parameterization, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.15607/RSS.2021.XVII.008

2107.07501

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Texas (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)

Add feedback

A Theoretical Analysis of Optimization by Gaussian Continuation

Mobahi, Hossein (Massachusetts Institute of Technology) | III, John W. Fisher (Massachusetts Institute of Technology)

AAAI ConferencesMar-6-2015

Optimization via continuation method is a widely used approach for solving nonconvex minimization problems. While this method generally does not provide a global minimum, empirically it often achieves a superior local minimum compared to alternative approaches such as gradient descent. However, theoretical analysis of this method is largely unavailable. Here, we provide a theoretical analysis that provides a bound on the endpoint solution of the continuation method. The derived bound depends on a problem specific characteristic that we refer to as optimization complexity. We show that this characteristic can be analytically computed when the objective function is expressed in some suitable basis functions. Our analysis combines elements of scale-space theory, regularization and differential equations.

Add feedback