Perceptrons
Perceptron Collaborative Filtering
While multivariate logistic regression classifiers are a great way of implementing collaborative filtering - a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many other users, we can also achieve similar results using neural networks. A recommender system is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular user. A perceptron or a neural network is a machine learning model designed for fitting complex datasets using backpropagation and gradient descent. When coupled with advanced optimization techniques, the model may prove to be a great substitute for classical logistic classifiers. The optimizations include feature scaling, mean normalization, regularization, hyperparameter tuning and using stochastic/mini-batch gradient descent instead of regular gradient descent. In this use case, we will use the perceptron in the recommender system to fit the parameters i.e., the data from a multitude of users and use it to predict the preference/interest of a particular user.
Jacobian-Enhanced Neural Networks
Jacobian-Enhanced Neural Networks (JENN) are densely connected multi-layer perceptrons, whose training process is modified to predict partial derivatives accurately. Their main benefit is better accuracy with fewer training points compared to standard neural networks. These attributes are particularly desirable in the field of computer-aided design, where there is often the need to replace computationally expensive, physics-based models with fast running approximations, known as surrogate models or meta-models. Since a surrogate emulates the original model accurately in near-real time, it yields a speed benefit that can be used to carry out orders of magnitude more function calls quickly. However, in the special case of gradient-enhanced methods, there is the additional value proposition that partial derivatives are accurate, which is a critical property for one important use-case: surrogate-based optimization. This work derives the complete theory and exemplifies its superiority over standard neural nets for surrogate-based optimization.
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark
Qian, Xiaowei, Guo, Zhimeng, Li, Jialiang, Mao, Haitao, Li, Bingheng, Wang, Suhang, Ma, Yao
Fair graph learning plays a pivotal role in numerous practical applications. Recently, many fair graph learning methods have been proposed; however, their evaluation often relies on poorly constructed semi-synthetic datasets or substandard real-world datasets. In such cases, even a basic Multilayer Perceptron (MLP) can outperform Graph Neural Networks (GNNs) in both utility and fairness. In this work, we illustrate that many datasets fail to provide meaningful information in the edges, which may challenge the necessity of using graph structures in these problems. To address these issues, we develop and introduce a collection of synthetic, semi-synthetic, and real-world datasets that fulfill a broad spectrum of requirements. These datasets are thoughtfully designed to include relevant graph structures and bias information crucial for the fair evaluation of models. The proposed synthetic and semi-synthetic datasets offer the flexibility to create data with controllable bias parameters, thereby enabling the generation of desired datasets with user-defined bias values with ease. Moreover, we conduct systematic evaluations of these proposed datasets and establish a unified evaluation approach for fair graph learning models. Our extensive experimental results with fair graph learning methods across our datasets demonstrate their effectiveness in benchmarking the performance of these methods. Our datasets and the code for reproducing our experiments are available at https://github.com/XweiQ/Benchmark-GraphFairness.
Graph Knowledge Distillation to Mixture of Experts
Rumiantsev, Pavel, Coates, Mark
In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially-designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each expert to specialize on a certain region on the hidden representation space, we demonstrate experimentally that it is possible to derive considerably more consistent performance across multiple datasets.
Grad-Instructor: Universal Backpropagation with Explainable Evaluation Neural Networks for Meta-learning and AutoML
This paper presents a novel method for autonomously enhancing deep neural network training. My approach employs an Evaluation Neural Network (ENN) trained via deep reinforcement learning to predict the performance of the target network. The ENN then works as an additional evaluation function during backpropagation. Computational experiments with Multi-Layer Perceptrons (MLPs) demonstrate the method's effectiveness. By processing input data at 0.15^2 times its original resolution, the ENNs facilitated efficient inference. Results indicate that MLPs trained with the proposed method achieved a mean test accuracy of 93.02%, which is 2.8% higher than those trained solely with conventional backpropagation or with L1 regularization. The proposed method's test accuracy is comparable to networks initialized with He initialization while reducing the difference between test and training errors. These improvements are achieved without increasing the number of epochs, thus avoiding the risk of overfitting. Additionally, the proposed method dynamically adjusts gradient magnitudes according to the training stage. The optimal ENN for enhancing MLPs can be predicted, reducing the time spent exploring optimal training methodologies. The explainability of ENNs is also analyzed using Grad-CAM, demonstrating their ability to visualize evaluation bases and supporting the Strong Lottery Ticket hypothesis.
Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks
Wang, Shuai, Zhang, David W., Huang, Jia-Hong, Rudinac, Stevan, Kackovic, Monika, Wijnberg, Nachoem, Worring, Marcel
Hypergraphs serve as an effective model for depicting complex connections in various real-world scenarios, from social to biological networks. The development of Hypergraph Neural Networks (HGNNs) has emerged as a valuable method to manage the intricate associations in data, though scalability is a notable challenge due to memory limitations. In this study, we introduce a new adaptive sampling strategy specifically designed for hypergraphs, which tackles their unique complexities in an efficient manner. We also present a Random Hyperedge Augmentation (RHA) technique and an additional Multilayer Perceptron (MLP) module to improve the robustness and generalization capabilities of our approach. Thorough experiments with real-world datasets have proven the effectiveness of our method, markedly reducing computational and memory demands while maintaining performance levels akin to conventional HGNNs and other baseline models. This research paves the way for improving both the scalability and efficacy of HGNNs in extensive applications. We will also make our codebase publicly accessible.
Deep Sketched Output Kernel Regression for Structured Prediction
Ahmad, Tamim El, Yang, Junjie, Laforgue, Pierre, d'Alchรฉ-Buc, Florence
By leveraging the kernel trick in the output space, kernel-induced losses provide a principled way to define structured output prediction tasks for a wide variety of output modalities. In particular, they have been successfully used in the context of surrogate non-parametric regression, where the kernel trick is typically exploited in the input space as well. However, when inputs are images or texts, more expressive models such as deep neural networks seem more suited than non-parametric methods. In this work, we tackle the question of how to train neural networks to solve structured output prediction tasks, while still benefiting from the versatility and relevance of kernel-induced losses. We design a novel family of deep neural architectures, whose last layer predicts in a data-dependent finite-dimensional subspace of the infinite-dimensional output feature space deriving from the kernel-induced loss. This subspace is chosen as the span of the eigenfunctions of a randomly-approximated version of the empirical kernel covariance operator. Interestingly, this approach unlocks the use of gradient descent algorithms (and consequently of any neural architecture) for structured prediction. Experiments on synthetic tasks as well as real-world supervised graph prediction problems show the relevance of our method.
Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing
Sivagnanam, Amutheezan, Pettet, Ava, Lee, Hunter, Mukhopadhyay, Ayan, Dubey, Abhishek, Laszka, Aron
An emergency responder management (ERM) system dispatches responders, such as ambulances, when it receives requests for medical aid. ERM systems can also proactively reposition responders between predesignated waiting locations to cover any gaps that arise due to the prior dispatch of responders or significant changes in the distribution of anticipated requests. Optimal repositioning is computationally challenging due to the exponential number of ways to allocate responders between locations and the uncertainty in future requests. The state-of-the-art approach in proactive repositioning is a hierarchical approach based on spatial decomposition and online Monte Carlo tree search, which may require minutes of computation for each decision in a domain where seconds can save lives. We address the issue of long decision times by introducing a novel reinforcement learning (RL) approach, based on the same hierarchical decomposition, but replacing online search with learning. To address the computational challenges posed by large, variable-dimensional, and discrete state and action spaces, we propose: (1) actor-critic based agents that incorporate transformers to handle variable-dimensional states and actions, (2) projections to fixed-dimensional observations to handle complex states, and (3) combinatorial techniques to map continuous actions to discrete allocations. We evaluate our approach using real-world data from two U.S. cities, Nashville, TN and Seattle, WA. Our experiments show that compared to the state of the art, our approach reduces computation time per decision by three orders of magnitude, while also slightly reducing average ambulance response time by 5 seconds.
Local vs. Global Interpretability: A Computational Complexity Perspective
Bassan, Shahaf, Amir, Guy, Katz, Guy
The local and global interpretability of various ML models has been studied extensively in recent years. However, despite significant progress in the field, many known results remain informal or lack sufficient mathematical rigor. We propose a framework for bridging this gap, by using computational complexity theory to assess local and global perspectives of interpreting ML models. We begin by proposing proofs for two novel insights that are essential for our analysis: (1) a duality between local and global forms of explanations; and (2) the inherent uniqueness of certain global explanation forms. We then use these insights to evaluate the complexity of computing explanations, across three model types representing the extremes of the interpretability spectrum: (1) linear models; (2) decision trees; and (3) neural networks. Our findings offer insights into both the local and global interpretability of these models. For instance, under standard complexity assumptions such as P != NP, we prove that selecting global sufficient subsets in linear models is computationally harder than selecting local subsets. Interestingly, with neural networks and decision trees, the opposite is true: it is harder to carry out this task locally than globally. We believe that our findings demonstrate how examining explainability through a computational complexity lens can help us develop a more rigorous grasp of the inherent interpretability of ML models.
TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification
Time series classification (TSC) on multivariate time series is a critical problem. We propose a novel multi-view approach integrating frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features. We leverage the Mamba state space model for efficient and scalable sequence modeling. We also introduce a novel tango scanning scheme to better model sequence relationships. Experiments on 10 standard benchmark datasets demonstrate our approach achieves an average 6.45% accuracy improvement over state-of-the-art TSC models.