 nullspace






On the Implicit Adversariality of Catastrophic Forgetting in Deep Continual Learning

arXiv.org Artificial Intelligence

Continual learning seeks the human-like ability to accumulate new skills in machine intelligence. Its central challenge is catastrophic forgetting, whose underlying cause has not been fully understood for deep networks. In this paper, we demystify catastrophic forgetting by revealing that the new-task training is implicitly an adversarial attack against the old-task knowledge. Specifically, the new-task gradients automatically and accurately align with the sharp directions of the old-task loss landscape, rapidly increasing the old-task loss. This adversarial alignment is intriguingly counter-intuitive because the sharp directions are too sparsely distributed to align with by chance. To understand it, we theoretically show that it arises from training's low-rank bias, which, through forward and backward propagation, confines the two directions into the same low-dimensional subspace, facilitating alignment. Gradient projection (GP) methods, a representative family of forgetting-mitigating methods, reduce adversarial alignment caused by forward propagation, but cannot address the alignment due to backward propagation. We propose backGP to address it, which reduces forgetting by 10.8% and improves accuracy by 12.7% on average over GP methods.
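The gradient projection idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single linear layer and a generic projection scheme that removes the part of the new-task gradient lying in the subspace spanned by old-task layer inputs. The function names `old_task_basis` and `project_gradient` and the `energy` threshold are hypothetical, and this is not the paper's backGP method.

```python
import numpy as np

def old_task_basis(features, energy=0.99):
    """Orthonormal basis of the dominant subspace of old-task layer inputs.

    features: array of shape (n_samples, d), inputs seen by a linear layer.
    energy:   assumed fraction of the squared spectrum to keep.
    """
    # Right singular vectors (rows of vt) span the input space.
    _, s, vt = np.linalg.svd(features, full_matrices=False)
    # Keep the fewest directions that hold `energy` of the squared spectrum.
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(ratio, energy) + 1)
    return vt[:k].T  # shape (d, k)

def project_gradient(grad, basis):
    """Remove the component of a weight gradient inside the old-task input subspace.

    grad: array of shape (out_dim, d), gradient of the layer weight.
    """
    return grad - grad @ basis @ basis.T
```

Under these assumptions, a weight update along the projected gradient leaves the layer's outputs unchanged for any input inside the stored subspace, which is the forward-propagation side of the alignment the abstract describes.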


Beyond Anthropomorphism: Enhancing Grasping and Eliminating a Degree of Freedom by Fusing the Abduction of Digits Four and Five

arXiv.org Artificial Intelligence

This paper presents the SABD hand, a 16-degree-of-freedom (DoF) robotic hand that departs from purely anthropomorphic designs to achieve an expanded grasp envelope, enable manipulation poses beyond human capability, and reduce the required number of actuators. This is achieved by combining the adduction/abduction (Add/Abd) joint of digits four and five into a single joint with a large range of motion. The combined joint increases the workspace of the digits by 400% and reduces the required DoFs while retaining dexterity. Experimental results demonstrate that the combined Add/Abd joint enables the hand to grasp objects with a side distance of up to 200 mm. Reinforcement learning-based investigations show that the design enables grasping policies that are effective not only for handling larger objects but also for achieving enhanced grasp stability. In teleoperated trials, the hand successfully performed 86% of attempted grasps on suitable YCB objects, including challenging non-anthropomorphic configurations. These findings validate the design's ability to enhance grasp stability, flexibility, and dexterous manipulation without added complexity, making it well-suited for a wide range of applications.

A. Motivation

Robust grasping for robotic manipulation is one of the key issues preventing the usage of robots in many applications [1]. The difficulty herein can be attributed to both software [2] and hardware challenges [3]. No robotic manipulator has been able to fully match the dexterity, power-to-weight ratio, and exteroception of the human hand [4]. Commercially available solutions, such as robotic grippers [5], the Shadow Robotic Hand [6], the Allegro Hand [7] and the Leap Hand [8], tend to be expensive or overly limited in their capabilities.


A Study on Variants of Conventional, Fuzzy, and Nullspace-Based Independence Criteria for Improving Supervised and Unsupervised Learning

arXiv.org Machine Learning

Unsupervised and supervised learning methods conventionally use kernels to capture nonlinearities inherent in the data structure. However, experts have to ensure that their proposed nonlinearity maximizes variability and captures the inherent diversity of the data. We reviewed all independence criteria to design unsupervised learners. Then we proposed 3 independence criteria and used them to design unsupervised and supervised dimensionality reduction methods. We evaluated the contrast, accuracy, and interpretability of these methods in both linear and neural nonlinear settings. The results show that the methods outperformed the baselines (tSNE, PCA, regularized LDA, VAE with (un)supervised learner and layer sharing) and opened a new line of interpretable machine learning (ML) for researchers.

Little research has been conducted on the role and nature of statistical independence in machine learning. Independence criteria are mainly used in the context of Independent Component Analysis (ICA). However, learning more about their capabilities provides a wide variety of tools for processing and interpreting supervised and unsupervised learning. As uncorrelatedness is a specific type of independence (linear independence), most PCA-based approaches reduce to a special case of independence. Another insight about independence is that the mechanisms of Linear Discriminant Analysis (LDA) [15], Independent Component Analysis (ICA) [1], and the Variational Autoencoder (VAE) [13] are based on independence criteria. LDA seeks a linear projection with the least between-class and highest within-class linear dependence. ICA seeks an unmixing matrix with the least statistical dependency between projected components. Finally, VAE seeks a nonlinear projection to mixtures with minimum correlation (linear independence), minimum mean, and agreed variance. Yet, despite many proposed variations of Kernel PCA [1, 19] (least between-sample dependency criterion), there is no publication in the literature with a neural version of PCA and LDA.


An Efficient Numerical Function Optimization Framework for Constrained Nonlinear Robotic Problems

arXiv.org Artificial Intelligence

This paper presents a numerical function optimization framework designed for constrained optimization problems in robotics. The tool is designed with real-time considerations and is suitable for online trajectory and control input optimization problems. The proposed framework does not require any analytical representation of the problem and works with constrained black-box optimization functions. The method combines first-order gradient-based line search algorithms with constraint prioritization through nullspace projections onto constraint Jacobian space. The tool is implemented in C++ and provided online for community use, along with some numerical and robotic example implementations presented in this paper.
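The combination of gradient steps and nullspace projection described above can be sketched roughly as follows. This is an illustrative Python sketch, not the paper's C++ tool: it assumes a single equality constraint level, forward-difference derivatives of black-box functions, and a fixed step size `alpha` in place of a line search. The names `finite_diff_jacobian` and `prioritized_step` are hypothetical.

```python
import numpy as np

def finite_diff_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of a black-box function fun: R^n -> R^m."""
    f0 = np.atleast_1d(fun(x))
    jac = np.zeros((f0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        jac[:, i] = (np.atleast_1d(fun(x + step)) - f0) / eps
    return jac

def prioritized_step(cost, constraint, x, alpha=0.1):
    """One update: correct the constraint first, then descend in its nullspace.

    cost:       black-box scalar function of x
    constraint: black-box function whose zero set is the feasible set
    alpha:      assumed fixed step size (a line search would choose it instead)
    """
    c = np.atleast_1d(constraint(x))
    jc = finite_diff_jacobian(constraint, x)
    grad = finite_diff_jacobian(cost, x)[0]
    jc_pinv = np.linalg.pinv(jc)
    # Projector onto the nullspace of the constraint Jacobian.
    nullspace_proj = np.eye(x.size) - jc_pinv @ jc
    # The cost gradient only acts in directions that do not disturb the constraint.
    return x - jc_pinv @ c - alpha * nullspace_proj @ grad
```

For example, minimizing the squared norm of `x` subject to `x[0] + x[1] = 1` by repeating this step from `x = [2, 0]` moves toward `[0.5, 0.5]`: the constraint correction has priority, and the cost is reduced only along directions the constraint leaves free.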


Hessian-Informed Flow Matching

arXiv.org Artificial Intelligence

Modeling complex systems that evolve toward equilibrium distributions is important in various physical applications, including molecular dynamics and robotic control. These systems often follow the stochastic gradient descent of an underlying energy function, converging to stationary distributions around energy minima. The local covariance of these distributions is shaped by the energy landscape's curvature, often resulting in anisotropic characteristics. While flow-based generative models have gained traction in generating samples from equilibrium distributions in such applications, they predominately employ isotropic conditional probability paths, limiting their ability to capture such covariance structures. In this paper, we introduce Hessian-Informed Flow Matching (HI-FM), a novel approach that integrates the Hessian of an energy function into conditional flows within the flow matching framework. This integration allows HI-FM to account for local curvature and anisotropic covariance structures. Our approach leverages the linearization theorem from dynamical systems and incorporates additional considerations such as time transformations and equivariance. Empirical evaluations on the MNIST and Lennard-Jones particles datasets demonstrate that HI-FM improves the likelihood of test samples.
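The link between curvature and local covariance noted in the abstract can be made concrete with a standard second-order argument. It assumes, beyond what the abstract states, a Boltzmann-type stationary density at temperature $T$ and a positive-definite Hessian $H$ at a minimum $x^*$; it is not the paper's construction of the conditional flows.

```latex
p(x) \propto \exp\!\left(-\frac{E(x)}{T}\right), \qquad
E(x) \approx E(x^*) + \tfrac{1}{2}\,(x - x^*)^\top H\,(x - x^*), \quad H = \nabla^2 E(x^*)
\;\Longrightarrow\;
p(x) \approx \mathcal{N}\!\left(x^*,\; T\,H^{-1}\right)
```

Under these assumptions the covariance is isotropic only when all eigenvalues of $H$ are equal, which is why an isotropic conditional path cannot represent the local structure in general.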


Linearly constrained Gaussian processes

Neural Information Processing Systems

We consider a modification of the covariance function in Gaussian processes to correctly account for known linear operator constraints. By modeling the target function as a transformation of an underlying function, the constraints are explicitly incorporated in the model such that they are guaranteed to be fulfilled by any sample drawn or prediction made. We also propose a constructive procedure for designing the transformation operator and illustrate the result on both simulated and real-data examples.
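The construction described above can be illustrated in the simplest finite-dimensional case. The sketch assumes a two-output function f = (f1, f2) with the algebraic constraint f1 + f2 = 0, written as F f = 0 with F = [1, 1], and models f(x) = G g(x) for a scalar Gaussian process g with a squared-exponential kernel, where G is chosen so that F G = 0. The specific F, G, kernel, and function names are illustrative assumptions; the paper treats general linear operators.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def constrained_covariance(x, G):
    """Covariance of f(x) = G g(x), with outputs stacked point by point."""
    k_g = rbf_kernel(x, x)
    # cov(f(x_i), f(x_j)) = G k_g(x_i, x_j) G^T, which is a Kronecker product.
    return np.kron(k_g, G @ G.T)

def sample_constrained(x, G, rng, jitter=1e-9):
    """Draw one sample of f at inputs x by sampling g and applying G."""
    k_g = rbf_kernel(x, x) + jitter * np.eye(x.size)
    g = np.linalg.cholesky(k_g) @ rng.standard_normal(x.size)
    return g[:, None] @ G.T  # shape (n_points, n_outputs)

x = np.linspace(0.0, 4.0, 5)
F = np.array([[1.0, 1.0]])      # constraint operator: f1 + f2 = 0
G = np.array([[1.0], [-1.0]])   # transformation with F @ G = 0
f = sample_constrained(x, G, np.random.default_rng(0))
```

Because F G = 0, every row of `f` satisfies the constraint by construction, which mirrors the abstract's claim that the constraint holds for any sample drawn rather than being enforced approximately through extra observations.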