AITopics

2410.22258

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gillespie, Patrick, Maroulas, Vasileios, Schizas, Ioannis

Bayesian Sheaf Neural Networks

arXiv.org Artificial IntelligenceOct-12-2024

Equipping graph neural networks with a convolution operation defined in terms of a cellular sheaf offers advantages for learning expressive representations of heterophilic graph data. The most flexible approach to constructing the sheaf is to learn it as part of the network as a function of the node features. However, this leaves the network potentially overly sensitive to the learned sheaf. As a counter-measure, we propose a variational approach to learning cellular sheaves within sheaf neural networks, yielding an architecture we refer to as a Bayesian sheaf neural network. As part of this work, we define a novel family of reparameterizable probability distributions on the rotation group $SO(n)$ using the Cayley transform. We evaluate the Bayesian sheaf neural network on several graph datasets, and show that our Bayesian sheaf models outperform deterministic sheaf models when training data is limited, and are less sensitive to the choice of hyperparameters.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2410.0959

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Campos, Cédric M., de Diego, David Martín, Torrente, José

Momentum-based gradient descent methods for Lie groups

arXiv.org Artificial IntelligenceApr-14-2024

Classical Momentum, and Nesterov's Accelerated Gradient (NAG; Nesterov, 1983) are well know examples of momentum-descent methods for optimization. While the latter outperforms the former, solely generalizations of PHB-like methods to nonlinear spaces have been described in the literature. We propose here a generalization of NAG-like methods for Lie group optimization based on the variational one-to-one correspondence between classical and accelerated momentum methods (Campos et al., 2023).

equation, lie group, momentum-based gradient descent method, (12 more...)

2404.09363

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.42)

Melikechi, Omar, Dunson, David B.

Ellipsoid fitting with the Cayley transform

arXiv.org Machine LearningSep-27-2023

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.

artificial intelligence, ellipsoid, machine learning, (17 more...)

2304.1063

Country:

North America > United States > North Carolina > Durham County > Durham (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > Indiana (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Verhoek, Chris, Wang, Ruigang, Tóth, Roland

Learning Stable and Robust Linear Parameter-Varying State-Space Models

arXiv.org Artificial IntelligenceSep-26-2023

This paper presents two direct parameterizations of stable and robust linear parameter-varying state-space (LPV-SS) models. The model parametrizations guarantee a priori that for all parameter values during training, the allowed models are stable in the contraction sense or have their Lipschitz constant bounded by a user-defined value $\gamma$. Furthermore, since the parametrizations are direct, the models can be trained using unconstrained optimization. The fact that the trained models are of the LPV-SS class makes them useful for, e.g., further convex analysis or controller design. The effectiveness of the approach is demonstrated on an LPV identification problem.

identification, lpv-ss model, parametrization, (16 more...)

doi: 10.1109/CDC49753.2023.10384260

2304.01828

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Hungary (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceMay-16-2023

Ortho-ODE: Enhancing Robustness and of Neural ODEs against Adversarial Attacks

Purohit, Vishal

Neural Ordinary Differential Equations (NODEs) probed the usage of numerical solvers to solve the differential equation characterized by a Neural Network (NN), therefore initiating a new paradigm of deep learning models with infinite depth. NODEs were designed to tackle the irregular time series problem. However, NODEs have demonstrated robustness against various noises and adversarial attacks. This paper is about the natural robustness of NODEs and examines the cause behind such surprising behaviour. We show that by controlling the Lipschitz constant of the ODE dynamics the robustness can be significantly improved. We derive our approach from Grownwall's inequality. Further, we draw parallels between contractivity theory and Grownwall's inequality. Experimentally we corroborate the enhanced robustness on numerous datasets - MNIST, CIFAR-10, and CIFAR 100. We also present the impact of adaptive and non-adaptive solvers on the robustness of NODEs.

artificial intelligence, machine learning, robustness, (12 more...)

2305.09179

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.66)
Government > Military (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pauli, Patricia, Wang, Ruigang, Manchester, Ian R., Allgöwer, Frank

Lipschitz-bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian

arXiv.org Artificial IntelligenceMar-20-2023

We establish a layer-wise parameterization for 1D convolutional neural networks (CNNs) with built-in end-to-end robustness guarantees. Herein, we use the Lipschitz constant of the input-output mapping characterized by a CNN as a robustness measure. We base our parameterization on the Cayley transform that parameterizes orthogonal matrices and the controllability Gramian for the state space representation of the convolutional layers. The proposed parameterization by design fulfills linear matrix inequalities that are sufficient for Lipschitz continuity of the CNN, which further enables unconstrained training of Lipschitz-bounded 1D CNNs. Finally, we train Lipschitz-bounded 1D CNNs for the classification of heart arrythmia data and show their improved robustness.

artificial intelligence, machine learning, parameterization, (15 more...)

2303.11835

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceAug-12-2022

Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

Mucllari, Edison, Zadorozhnyy, Vasily, Pospisil, Cole, Nguyen, Duc, Ye, Qiang

In recent years, using orthogonal matrices has been shown to be a promising approach in improving Recurrent Neural Networks (RNNs) with training, stability, and convergence, particularly, to control gradients. While Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the usage of orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.

experiment, hochreiter and schmidhuber, nc-gru, (12 more...)

2208.06496

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Kentucky > Fayette County > Lexington (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Trockman, Asher, Kolter, J. Zico

Orthogonalizing Convolutional Layers with the Cayley Transform

arXiv.org Machine LearningApr-14-2021

Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heuristic in nature, often focusing on penalty methods or limited classes of convolutions. In this work, we propose and evaluate an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal. Specifically, we propose to apply the Cayley transform to a skew-symmetric convolution in the Fourier domain, so that the inverse convolution needed by the Cayley transform can be computed efficiently. We compare our method to previous Lipschitz-constrained and orthogonal convolutional layers and show that it indeed preserves orthogonality to a high degree even for large convolutions. Applied to the problem of certified adversarial robustness, we show that networks incorporating the layer outperform existing deterministic methods for certified defense against $\ell_2$-norm-bounded adversaries, while scaling to larger architectures than previously investigated. Code is available at https://github.com/locuslab/orthogonal-convolutions.

cayley transform, convolution, matrix, (13 more...)

2104.07167

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Machine LearningFeb-3-2020

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

Li, Jun, Fuxin, Li, Todorovic, Sinisa

Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.

cayley sgd, cayley transform, stiefel manifold, (16 more...)

2002.01113

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)