Goto

Collaborating Authors

 Perceptrons


On Herding and the Perceptron Cycling Theorem

Neural Information Processing Systems

The paper develops a connection between traditional perceptron algorithms and recently introduced herding algorithms. It is shown that both algorithms can be viewed as an application of the perceptron cycling theorem. This connection strengthens some herding results and suggests new (supervised) herding algorithms that, like CRFs or discriminative RBMs, make predictions by conditioning on the input attributes. We develop and investigate variants of conditional herding, and show that conditional herding leads to practical algorithms that perform better than or on par with related classifiers such as the voted perceptron and the discriminative RBM.


MatrixKAN: Parallelized Kolmogorov-Arnold Network

arXiv.org Artificial Intelligence

Kolmogorov-Arnold Networks (KAN) are a new class of neural network architecture representing a promising alternative to the Multilayer Perceptron (MLP), demonstrating improved expressiveness and interpretability. However, KANs suffer from slow training and inference speeds relative to MLPs due in part to the recursive nature of the underlying B-spline calculations. This issue is particularly apparent with respect to KANs utilizing high-degree B-splines, as the number of required non-parallelizable recursions is proportional to B-spline degree. We solve this issue by proposing MatrixKAN, a novel optimization that parallelizes B-spline calculations with matrix representation and operations, thus significantly improving effective computation time for models utilizing high-degree B-splines. In this paper, we demonstrate the superior scaling of MatrixKAN's computation time relative to B-spline degree. Further, our experiments demonstrate speedups of approximately 40x relative to KAN, with significant additional speedup potential for larger datasets or higher spline degrees.


Optimal Task Order for Continual Learning of Multiple Tasks

arXiv.org Machine Learning

Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Leveraging a linear teacher-student model with latent factors, we derive an analytical expression relating task similarity and ordering to learning performance. Our analysis reveals two principles that hold under a wide parameter range: (1) tasks should be arranged from the least representative to the most typical, and (2) adjacent tasks should be dissimilar. We validate these rules on both synthetic data and real-world image classification datasets (Fashion-MNIST, CIFAR-10, CIFAR-100), demonstrating consistent performance improvements in both multilayer perceptrons and convolutional neural networks. Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.


Quantum Machine Learning: A Hands-on Tutorial for Machine Learning Practitioners and Researchers

arXiv.org Artificial Intelligence

This tutorial intends to introduce readers with a background in AI to quantum machine learning (QML) -- a rapidly evolving field that seeks to leverage the power of quantum computers to reshape the landscape of machine learning. For self-consistency, this tutorial covers foundational principles, representative QML algorithms, their potential applications, and critical aspects such as trainability, generalization, and computational complexity. In addition, practical code demonstrations are provided in https://qml-tutorial.github.io/ to illustrate real-world implementations and facilitate hands-on learning. Together, these elements offer readers a comprehensive overview of the latest advancements in QML. By bridging the gap between classical machine learning and quantum computing, this tutorial serves as a valuable resource for those looking to engage with QML and explore the forefront of AI in the quantum era.


AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

arXiv.org Artificial Intelligence

Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.


Converting MLPs into Polynomials in Closed Form

arXiv.org Machine Learning

Recent work has shown that purely quadratic functions can replace MLPs in transformers with no significant loss in performance, while enabling new methods of interpretability based on linear algebra. In this work, we theoretically derive closed-form least-squares optimal approximations of feedforward networks (multilayer perceptrons and gated linear units) using polynomial functions of arbitrary degree. When the $R^2$ is high, this allows us to interpret MLPs and GLUs by visualizing the eigendecomposition of the coefficients of their linear and quadratic approximants. We also show that these approximants can be used to create SVD-based adversarial examples. By tracing the $R^2$ of linear and quadratic approximants across training time, we find new evidence that networks start out simple, and get progressively more complex. Even at the end of training, however, our quadratic approximants explain over 95% of the variance in network outputs.


Privacy Preserving Properties of Vision Classifiers

arXiv.org Artificial Intelligence

Vision classifiers are often trained on proprietary datasets containing sensitive information, yet the models themselves are frequently shared openly under the privacy-preserving assumption. Although these models are assumed to protect sensitive information in their training data, the extent to which this assumption holds for different architectures remains unexplored. This assumption is challenged by inversion attacks which attempt to reconstruct training data from model weights, exposing significant privacy vulnerabilities. In this study, we systematically evaluate the privacy-preserving properties of vision classifiers across diverse architectures, including Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Vision Transformers (ViTs). Using network inversion-based reconstruction techniques, we assess the extent to which these architectures memorize and reveal training data, quantifying the relative ease of reconstruction across models. Our analysis highlights how architectural differences, such as input representation, feature extraction mechanisms, and weight structures, influence privacy risks. By comparing these architectures, we identify which are more resilient to inversion attacks and examine the trade-offs between model performance and privacy preservation, contributing to the development of secure and privacy-respecting machine learning models for sensitive applications. Our findings provide actionable insights into the design of secure and privacy-aware machine learning systems, emphasizing the importance of evaluating architectural decisions in sensitive applications involving proprietary or personal data.


Contrastive Forward-Forward: A Training Algorithm of Vision Transformer

arXiv.org Artificial Intelligence

Although backpropagation is widely accepted as a training algorithm for artificial neural networks, researchers are always looking for inspiration from the brain to find ways with potentially better performance. Forward-Forward is a new training algorithm that is more similar to what occurs in the brain, although there is a significant performance gap compared to backpropagation. In the Forward-Forward algorithm, the loss functions are placed after each layer, and the updating of a layer is done using two local forward passes and one local backward pass. Forward-Forward is in its early stages and has been designed and evaluated on simple multi-layer perceptron networks to solve image classification tasks. In this work, we have extended the use of this algorithm to a more complex and modern network, namely the Vision Transformer. Inspired by insights from contrastive learning, we have attempted to revise this algorithm, leading to the introduction of Contrastive Forward-Forward. Experimental results show that our proposed algorithm performs significantly better than the baseline Forward-Forward leading to an increase of up to 10% in accuracy and boosting the convergence speed by 5 to 20 times on Vision Transformer. Furthermore, if we take Cross Entropy as the baseline loss function in backpropagation, it will be demonstrated that the proposed modifications to the baseline Forward-Forward reduce its performance gap compared to backpropagation on Vision Transformer, and even outperforms it in certain conditions, such as inaccurate supervision.


Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification

arXiv.org Artificial Intelligence

X-ray imaging is pivotal in medical diagnostics, offering non-invasive insights into a range of health conditions. Recently, vision-language models, such as the Contrastive Language-Image Pretraining (CLIP) model, have demonstrated potential in improving diagnostic accuracy by leveraging large-scale image-text datasets. However, since CLIP was not initially designed for medical images, several CLIP-like models trained specifically on medical images have been developed. Despite their enhanced performance, issues of fairness - particularly regarding demographic attributes - remain largely unaddressed. In this study, we perform a comprehensive fairness analysis of CLIP-like models applied to X-ray image classification. We assess their performance and fairness across diverse patient demographics and disease categories using zero-shot inference and various fine-tuning techniques, including Linear Probing, Multilayer Perceptron (MLP), Low-Rank Adaptation (LoRA), and full fine-tuning. Our results indicate that while fine-tuning improves model accuracy, fairness concerns persist, highlighting the need for further fairness interventions in these foundational models.


Tensor Completion for Surrogate Modeling of Material Property Prediction

arXiv.org Artificial Intelligence

When designing materials to optimize certain properties, there are often many possible configurations of designs that need to be explored. For example, the materials' composition of elements will affect properties such as strength or conductivity, which are necessary to know when developing new materials. Exploring all combinations of elements to find optimal materials becomes very time consuming, especially when there are more design variables. For this reason, there is growing interest in using machine learning (ML) to predict a material's properties. In this work, we model the optimization of certain material properties as a tensor completion problem, to leverage the structure of our datasets and navigate the vast number of combinations of material configurations. Across a variety of material property prediction tasks, our experiments show tensor completion methods achieving 10-20% decreased error compared with baseline ML models such as GradientBoosting and Multilayer Perceptron (MLP), while maintaining similar training speed.