 Perceptrons


Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs

arXiv.org Artificial Intelligence

Recent studies have attempted to use multilayer perceptrons (MLPs) for semi-supervised node classification on graphs by training a student MLP via knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on matching the output probability distributions of the teacher and student models during distillation, how to inject structural information in an explicit and interpretable manner has not been systematically studied. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the distillation process as making the student MLP learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher, doing so incurs a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher before distillation and can be interpreted as an approximation of the inverse propagation. We demonstrate that P&D can readily improve the performance of the student MLP.
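To make the idea of propagating the teacher's output before distillation concrete, here is a minimal sketch in plain Python. The abstract does not give the propagation rule, so the recursive smoothing used here (a personalized-PageRank-style update with a mixing weight gamma on a row-normalized adjacency matrix with self-loops) is an assumption, and the names row_normalize, matmul and propagate_teacher are hypothetical helpers, not the authors' code.

```python
# Illustrative sketch only: the update rule and all names are assumptions,
# not the method as specified in the paper.

def row_normalize(adj):
    """Add self-loops, then divide each row by its sum."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def propagate_teacher(adj, teacher_probs, gamma=0.9, steps=10):
    """Smooth teacher outputs over the graph.

    Repeats P <- gamma * A_hat @ P + (1 - gamma) * P0, where P0 is the
    original teacher output and A_hat the normalized adjacency matrix.
    """
    a_hat = row_normalize(adj)
    p = [row[:] for row in teacher_probs]
    for _ in range(steps):
        ap = matmul(a_hat, p)
        p = [[gamma * ap[i][j] + (1 - gamma) * teacher_probs[i][j]
              for j in range(len(p[0]))]
             for i in range(len(p))]
    return p


# Path graph 0 - 1 - 2 with two classes; the end nodes are confident,
# the middle node is undecided.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
teacher_probs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
soft_targets = propagate_teacher(adj, teacher_probs)
```

The student MLP would then be trained against soft_targets instead of the raw teacher output. Because every row of the normalized adjacency matrix sums to one, each propagated row remains a valid probability distribution.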


Binary perceptrons capacity via fully lifted random duality theory

arXiv.org Machine Learning

We study the statistical capacity of classical binary perceptrons with general thresholds $\kappa$. After recognizing the connection between the capacity and bilinearly indexed (bli) random processes, we use recent progress in studying such processes to characterize the capacity. In particular, we rely on the fully lifted random duality theory (fl RDT) established in \cite{Stojnicflrdt23} to create a general framework for studying perceptron capacities. Successful underlying numerical evaluations are required for the framework (and ultimately the entire fl RDT machinery) to become fully practically operational. We present results obtained in that direction and find that the capacity characterizations are achieved on the second (first non-trivial) level of stationarized full lifting. The obtained results exactly match the replica symmetry breaking predictions obtained through statistical physics replica methods in \cite{KraMez89}. Most notably, for the famous zero-threshold scenario, $\kappa=0$, we recover the well-known $\alpha\approx0.8330786$ scaled capacity.


Reduced-order modeling for parameterized PDEs via implicit neural representations

arXiv.org Artificial Intelligence

We present a new data-driven reduced-order modeling approach to efficiently solve parametrized partial differential equations (PDEs) for many-query problems. This work is inspired by the concept of implicit neural representation (INR), which models physics signals in a continuous manner, independent of spatial/temporal discretization. The proposed framework encodes the PDE and uses a parametrized neural ODE (PNODE) to learn latent dynamics characterized by multiple PDE parameters. The PNODE can be inferred by a hypernetwork to reduce the potential difficulties in learning the PNODE due to a complex multilayer perceptron (MLP). The framework uses an INR to decode the latent dynamics and reconstruct accurate PDE solutions. Further, a physics-informed loss is introduced to correct the prediction for unseen parameter instances. Incorporating the physics-informed loss also enables the model to be fine-tuned in an unsupervised manner on unseen PDE parameters. A numerical experiment is performed on a two-dimensional Burgers equation with a large variation of PDE parameters. We evaluate the proposed method at a large Reynolds number and obtain a speedup of up to O(10^3) and ~1% relative error with respect to the ground truth values.


Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers

arXiv.org Artificial Intelligence

Quantum computing has the potential to solve problems that are intractable for classical systems, yet the high error rates in contemporary quantum devices often exceed tolerable limits for useful algorithm execution. Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their states for errors. The syndromes are subsequently interpreted by a decoding algorithm to identify and correct errors in the data qubits. This task is complex due to the multiplicity of error sources affecting both data and syndrome qubits as well as syndrome extraction operations. Additionally, identical syndromes can emanate from different error sources, necessitating a decoding algorithm that evaluates syndromes collectively. Although machine learning (ML) decoders such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) have been proposed, they often focus on local syndrome regions and require retraining when adjusting for different code distances. We introduce a transformer-based QEC decoder which employs self-attention to achieve a global receptive field across all input syndromes. It incorporates a mixed loss training approach, combining both local physical error and global parity label losses. Moreover, the transformer architecture's inherent adaptability to variable-length inputs allows for efficient transfer learning, enabling the decoder to adapt to varying code distances without retraining. Evaluation on six code distances and ten different error configurations demonstrates that our model consistently outperforms non-ML decoders, such as Union Find (UF) and Minimum Weight Perfect Matching (MWPM), and other ML decoders, thereby achieving the best logical error rates. Moreover, transfer learning can reduce the training cost by more than 10x.
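The two properties the abstract attributes to self-attention (a global receptive field and reuse of the same weights for inputs of different lengths) can be illustrated with a small single-head attention function in plain Python. This is a generic sketch and not the Transformer-QEC architecture: the two-dimensional embedding [bit, 1.0], the identity weight matrices and the names embed and self_attention are all assumptions made for illustration.

```python
import math

# Illustrative sketch only: generic single-head self-attention, with
# hypothetical embeddings and weights (not the paper's decoder).


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Scaled dot-product attention; every output mixes all input tokens."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale
                           for kj in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def embed(syndrome_bits):
    """Hypothetical embedding: each syndrome bit becomes [bit, 1.0]."""
    return [[float(b), 1.0] for b in syndrome_bits]


identity = [[1.0, 0.0], [0.0, 1.0]]

# The same weights handle syndrome sequences of different lengths.
short_out = self_attention(embed([0, 1, 0, 0]), identity, identity, identity)
long_out = self_attention(embed([0, 1, 0, 0, 1, 0, 0, 0, 1]),
                          identity, identity, identity)

# Flipping only the last syndrome bit changes the output at position 0.
flipped_out = self_attention(embed([0, 1, 0, 1]), identity, identity, identity)
```

Nothing in the attention weights depends on the sequence length, which is what allows a decoder of this kind to be reused across code distances. The change at position 0 after flipping the last bit shows that each output attends to the whole syndrome.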


GGNNs: Generalizing GNNs using Residual Connections and Weighted Message Passing

arXiv.org Artificial Intelligence

Many real-world phenomena can be modeled as graphs, which makes graphs extremely valuable given their ubiquity. GNNs excel at capturing the relationships and patterns within these graphs, enabling effective learning and prediction tasks. GNNs are constructed using Multi-Layer Perceptrons (MLPs) and incorporate additional layers for message passing to facilitate the flow of features among nodes. It is commonly believed that the generalizing power of GNNs is attributed to the message-passing mechanism between layers, where nodes exchange information with their neighbors, enabling them to effectively capture and propagate information across the nodes of a graph. Our technique builds on these results, modifying the message-passing mechanism further in two ways: first by weighting the messages before accumulating them at each node, and second by adding residual connections. These two mechanisms show significant improvements in learning and faster convergence.
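The two modifications named in the abstract, weighting messages before accumulation and adding a residual connection, can be sketched as a single layer in plain Python. The abstract does not specify how the weights are obtained or where the nonlinearity sits, so the fixed per-edge weights, the ReLU on the aggregated message and the function name weighted_residual_layer are assumptions for illustration only.

```python
# Illustrative sketch only: edge weights are fixed here, whereas a trained
# model would presumably learn them; the layer layout is an assumption.


def weighted_residual_layer(features, edges, edge_weights):
    """One message-passing step with weighted messages and a skip connection.

    features:     list of per-node feature vectors
    edges:        list of directed (source, target) pairs
    edge_weights: dict mapping (source, target) to a scalar weight
    """
    n, dim = len(features), len(features[0])
    aggregated = [[0.0] * dim for _ in range(n)]
    for src, dst in edges:
        w = edge_weights[(src, dst)]
        for d in range(dim):
            # Weight each message before accumulating it at the target node.
            aggregated[dst][d] += w * features[src][d]
    # Residual connection: add the node's own features back after a ReLU.
    return [[features[i][d] + max(0.0, aggregated[i][d]) for d in range(dim)]
            for i in range(n)]


features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
edges = [(0, 1), (2, 1), (1, 0)]
edge_weights = {(0, 1): 0.5, (2, 1): 0.1, (1, 0): 1.0}
updated = weighted_residual_layer(features, edges, edge_weights)
```

Node 2 receives no messages in this example, so the residual connection passes its features through unchanged, which is the behavior that keeps information from being lost when layers are stacked.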


ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report]

arXiv.org Artificial Intelligence

We propose ProtoArgNet, a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning as found, e.g. in ProtoPNet. While earlier approaches associate every class with multiple prototypical-parts, ProtoArgNet uses super-prototypes that combine prototypical-parts into single prototypical class representations. Furthermore, while earlier approaches use interpretable classification layers, e.g. logistic regression in ProtoPNet, ProtoArgNet improves accuracy with multi-layer perceptrons while relying upon an interpretable reading thereof based on a form of argumentation. ProtoArgNet is customisable to user cognitive requirements by a process of sparsification of the multi-layer perceptron/argumentation component. Also, as opposed to other prototypical-part-learning approaches, ProtoArgNet can recognise spatial relations between different prototypical-parts that are from different regions in images, similar to how CNNs capture relations between patterns recognized in earlier layers.


JetLOV: Enhancing Jet Tree Tagging through Neural Network Learning of Optimal LundNet Variables

arXiv.org Artificial Intelligence

Machine learning has played a pivotal role in advancing physics, with deep learning notably contributing to solving complex classification problems such as jet tagging in the field of jet physics. In this experiment, we aim to harness the full potential of neural networks while acknowledging that, at times, we may lose sight of the underlying physics governing these models. Nevertheless, we demonstrate that we can achieve remarkable results obscuring physics knowledge and relying completely on the model's outcome. We introduce JetLOV, a composite comprising two models: a straightforward multilayer perceptron (MLP) and the well-established LundNet. Our study reveals that we can attain comparable jet tagging performance without relying on the pre-computed LundNet variables. Instead, we allow the network to autonomously learn an entirely new set of variables, devoid of a priori knowledge of the underlying physics. These findings hold promise, particularly in addressing the issue of model dependence, which can be mitigated through generalization and training on diverse data sets.


FRAD: Front-Running Attacks Detection on Ethereum using Ternary Classification Model

arXiv.org Artificial Intelligence

With the evolution of blockchain technology, the issue of transaction security, particularly on platforms like Ethereum, has become increasingly critical. Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions. In these attack scenarios, malicious actors monitor other users' transaction activities, then strategically submit their own transactions with higher fees. This ensures their transactions are executed before the monitored transactions are included in the block. The primary objective of this paper is to delve into a comprehensive classification of transactions associated with front-running attacks, which aims to equip developers with specific strategies to counter each type of attack. To achieve this, we introduce a novel detection method named FRAD (Front-Running Attacks Detection on Ethereum using Ternary Classification Model). This method is specifically tailored for transactions within decentralized applications (DApps) on Ethereum, enabling accurate classification of front-running attacks involving transaction displacement, insertion, and suppression. Our experimental validation reveals that the Multilayer Perceptron (MLP) classifier offers the best performance in detecting front-running attacks, achieving an impressive accuracy rate of 84.59% and F1-score of 84.60%.


Unveiling The Factors of Aesthetic Preferences with Explainable AI

arXiv.org Artificial Intelligence

The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing machine learning models that focus on aesthetic attributes known to influence preferences. Through a data mining approach, our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology involves employing various machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, to compare their performances in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, providing insights into the roles of attributes and their interactions. Ultimately, our study aims to shed light on the complex nature of aesthetic preferences in images through machine learning and provides a deeper understanding of the attributes that influence aesthetic judgements.
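SHAP attributes a model's prediction to its input attributes using Shapley values. As a rough illustration of the underlying quantity, the sketch below computes exact Shapley values by brute force for a toy scoring function. It does not use the SHAP library, and the function toy_score, its three attributes and the all-zero baseline are hypothetical stand-ins, not the models or attributes studied in the paper.

```python
from itertools import permutations

# Illustrative sketch only: exact Shapley values by enumerating orderings.
# The scoring function and attribute names are invented for this example.


def toy_score(sharpness, colorfulness, contrast):
    """Hypothetical aesthetic score with one interaction term."""
    return 2.0 * sharpness + 1.0 * colorfulness + 3.0 * sharpness * contrast


def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_value = model(*current)
        for feature in order:
            # Switch this feature from its baseline to its actual value.
            current[feature] = instance[feature]
            new_value = model(*current)
            totals[feature] += new_value - previous_value
            previous_value = new_value
    return [t / len(orderings) for t in totals]


instance = (1.0, 1.0, 1.0)
baseline = (0.0, 0.0, 0.0)
phi = shapley_values(toy_score, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two attributes involved. Enumerating all orderings is only feasible for a handful of features, which is why practical tools rely on approximations or model-specific algorithms.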


Towards Machine Learning-based Quantitative Hyperspectral Image Guidance for Brain Tumor Resection

arXiv.org Artificial Intelligence

Complete resection of malignant gliomas is hampered by the difficulty in distinguishing tumor cells at the infiltration zone. Fluorescence guidance with 5-ALA assists in reaching this goal. Using hyperspectral imaging, previous work characterized five fluorophores' emission spectra in most human brain tumors. In this paper, the effectiveness of these five spectra was explored for different tumor and tissue classification tasks in 184 patients (891 hyperspectral measurements) harboring low- (n=30) and high-grade gliomas (n=115), non-glial primary brain tumors (n=19), radiation necrosis (n=2), miscellaneous (n=10) and metastases (n=8). Four machine learning models were trained to classify tumor type, grade, glioma margins and IDH mutation. Using random forests and multi-layer perceptrons, the classifiers achieved average test accuracies of 84-87%, 96.1%, 86%, and 93% respectively. All five fluorophore abundances varied between tumor margin types and tumor grades (p < 0.01). For tissue type, at least four of the five fluorophore abundances were found to be significantly different (p < 0.01) between all classes. These results demonstrate the fluorophores' differing abundances in different tissue classes, as well as the value of the five fluorophores as potential optical biomarkers, opening new opportunities for intraoperative classification systems in fluorescence-guided neurosurgery.