Goto

Collaborating Authors

 negentropy


Thermodynamic Bound on Energy and Negentropy Costs of Inference in Deep Neural Networks

arXiv.org Artificial Intelligence

The fundamental thermodynamic bound is derived for the energy cost of inference in Deep Neural Networks (DNNs). By applying Landauer's principle, we demonstrate that the linear operations in DNNs can, in principle, be performed reversibly, whereas the non-linear activation functions impose an unavoidable energy cost. The resulting theoretical lower bound on the inference energy is determined by the average number of neurons undergoing state transition for each inference. We also restate the thermodynamic bound in terms of negentropy, a metric which is more universal than energy for assessing thermodynamic cost of information processing. Concept of negentropy is further elaborated in the context of information processing in biological and engineered system as well as human intelligence. Our analysis provides insight into the physical limits of DNN efficiency and suggests potential directions for developing energy-efficient AI architectures that leverage reversible analog computing.


Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

arXiv.org Artificial Intelligence

Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework-Hopfield-Fenchel-Young networks-which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization-all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.


Unified Binary and Multiclass Margin-Based Classification

arXiv.org Machine Learning

The notion of margin loss has been central to the development and analysis of algorithms for binary classification. To date, however, there remains no consensus as to the analogue of the margin loss for multiclass classification. In this work, we show that a broad range of multiclass loss functions, including many popular ones, can be expressed in the relative margin form, a generalization of the margin form of binary losses. The relative margin form is broadly useful for understanding and analyzing multiclass losses as shown by our prior work (Wang and Scott, 2020, 2021). To further demonstrate the utility of this way of expressing multiclass losses, we use it to extend the seminal result of Bartlett et al. (2006) on classification-calibration of binary margin losses to multiclass. We then analyze the class of Fenchel-Young losses, and expand the set of these losses that are known to be classification-calibrated.


Projection pursuit based on Gaussian mixtures and evolutionary algorithms

arXiv.org Machine Learning

We propose a projection pursuit (PP) algorithm based on Gaussian mixture models (GMMs). The negentropy obtained from a multivariate density estimated by GMMs is adopted as the PP index to be maximised. For a fixed dimension of the projection subspace, the GMM-based density estimation is projected onto that subspace, where an approximation of the negentropy for Gaussian mixtures is computed. Then, Genetic Algorithms (GAs) are used to find the optimal, orthogonal projection basis by maximising the former approximation. We show that this semi-parametric approach to PP is flexible and allows highly informative structures to be detected, by projecting multivariate datasets onto a subspace, where the data can be feasibly visualised. The performance of the proposed approach is shown on both artificial and real datasets.


On the Estimation of Entropy in the FastICA Algorithm

arXiv.org Machine Learning

The fastICA algorithm is a popular dimension reduction technique used to reveal patterns in data. Here we show that the approximations used in fastICA can result in patterns not being successfully recognised. We demonstrate this problem using a two-dimensional example where a clear structure is immediately visible to the naked eye, but where the projection chosen by fastICA fails to reveal this structure. This implies that care is needed when applying fastICA. We discuss how the problem arises and how it is intrinsically connected to the approximations that form the basis of the computational efficiency of fastICA.


Iterative Gaussianization: from ICA to Random Rotations

arXiv.org Machine Learning

Most signal processing problems involve the challenging task of multidimensional probability density function (PDF) estimation. In this work, we propose a solution to this problem by using a family of Rotation-based Iterative Gaussianization (RBIG) transforms. The general framework consists of the sequential application of a univariate marginal Gaussianization transform followed by an orthonormal transform. The proposed procedure looks for differentiable transforms to a known PDF so that the unknown PDF can be estimated at any point of the original domain. In particular, we aim at a zero mean unit covariance Gaussian for convenience. RBIG is formally similar to classical iterative Projection Pursuit (PP) algorithms. However, we show that, unlike in PP methods, the particular class of rotations used has no special qualitative relevance in this context, since looking for interestingness is not a critical issue for PDF estimation. The key difference is that our approach focuses on the univariate part (marginal Gaussianization) of the problem rather than on the multivariate part (rotation). This difference implies that one may select the most convenient rotation suited to each practical application. The differentiability, invertibility and convergence of RBIG are theoretically and experimentally analyzed. Relation to other methods, such as Radial Gaussianization (RG), one-class support vector domain description (SVDD), and deep neural networks (DNN) is also pointed out. The practical performance of RBIG is successfully illustrated in a number of multidimensional problems such as image synthesis, classification, denoising, and multi-information estimation.


Document Clustering with K-tree

arXiv.org Artificial Intelligence

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.


Independent Components Analysis through Product Density Estimation

Neural Information Processing Systems

We present a simple direct approach for solving the ICA problem, using density estimation and maximum likelihood. Given a candidate orthogonalframe, we model each of the coordinates using a semi-parametric density estimate based on cubic splines. Since our estimates have two continuous derivatives, we can easily run a second ordersearch for the frame parameters. Our method performs very favorably when compared to state-of-the-art techniques. 1 Introduction Independent component analysis (ICA) is a popular enhancement over principal component analysis (PCA) and factor analysis. IRP which is assumed to arise from a linear mixing of a latent random source vector S E IRP, (1) X AS; the components Sj, j 1, ...,p of S are assumed to be independently distributed.