 Perceptrons


Relational Attention: Generalizing Transformers for Graph-Structured Tasks

arXiv.org Artificial Intelligence

Transformers flexibly operate over sets of real-valued vectors representing task-specific entities and their attributes, where each vector might encode one wordpiece token and its position in a sequence, or some piece of information that carries no position at all. But as set processors, standard transformers are at a disadvantage in reasoning over more general graph-structured data where nodes represent entities and edges represent relations between entities. To address this shortcoming, we generalize transformer attention to consider and update edge vectors in each transformer layer. We evaluate this relational transformer on a diverse array of graph-structured tasks, including the large and challenging CLRS Algorithmic Reasoning Benchmark. There, it dramatically outperforms state-of-the-art graph neural networks expressly designed to reason over graph-structured data. Our analysis demonstrates that these gains are attributable to relational attention's inherent ability to leverage the greater expressivity of graphs over sets. Graph-structured problems turn up in many domains, including knowledge bases (Hu et al., 2021; Bordes et al., 2013), communication networks (Leskovec et al., 2010), citation networks (McCallum et al., 2000), and molecules (Debnath et al., 1991; Zhang et al., 2020b). One example is predicting the bioactive properties of a molecule, where the atoms of the molecule are the nodes of the graph and the bonds are the edges. Along with their ubiquity, graph-structured problems vary widely in difficulty. For example, certain graph problems can be solved with a simple multi-layer perceptron, while others are quite challenging and require explicit modeling of relational characteristics. Graph Neural Networks (GNNs) are designed to process graph-structured data, including the graph's (possibly directed) edge structure and (in some cases) features associated with the edges. [Figure 1: The relational transformer]
Standard transformers lack the relational inductive biases (Battaglia et al., 2018) that are explicitly built into the most commonly used GNNs. This allows entities carrying domain-specific attributes (like position) to be encoded as vectors for input to the same transformer architecture applied to different domains. Work was done during an internship at Microsoft Research. Figure 2: Categories of GNNs and Transformers, compared in terms of transformer machinery and edge vector incorporation.
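As a rough illustration of what it can mean for attention to "consider" edge vectors, the following is a minimal sketch, not the paper's formulation. It assumes, purely for illustration, that the key and value for a pair (i, j) are the sum of node j's vector and the edge vector from i to j, and that there are no learned projections; the function name `edge_aware_attention` is hypothetical. The edge-update step mentioned in the abstract is not shown.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edge_aware_attention(nodes, edges):
    """Single-head attention in which edge vectors shift keys and values.

    nodes: list of n vectors of dimension d.
    edges: n x n nested list; edges[i][j] is a d-dimensional edge vector.
    Assumption (illustrative only): key and value for pair (i, j) are
    nodes[j] + edges[i][j]; the query for node i is nodes[i] itself.
    """
    n, d = len(nodes), len(nodes[0])
    scale = math.sqrt(d)
    out = []
    for i in range(n):
        kv = [[a + b for a, b in zip(nodes[j], edges[i][j])] for j in range(n)]
        weights = softmax([dot(nodes[i], kv[j]) / scale for j in range(n)])
        out.append([sum(weights[j] * kv[j][c] for j in range(n)) for c in range(d)])
    return out

nodes = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
no_edges = [[zero, zero], [zero, zero]]
# A single non-zero edge from node 0 to node 1.
one_edge = [[zero, [5.0, 0.0]], [zero, zero]]

plain = edge_aware_attention(nodes, no_edges)   # reduces to set-style self-attention
biased = edge_aware_attention(nodes, one_edge)  # node 0 now attends mostly to node 1
```

With all edge vectors set to zero the function behaves like ordinary self-attention over a set, which is the sense in which a standard transformer ignores graph structure.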


Efficient Testable Learning of Halfspaces with Adversarial Label Noise

arXiv.org Artificial Intelligence

Learning halfspaces from random labeled examples is a classical task in machine learning, with a history going back to the Perceptron algorithm [Ros58]. In the realizable PAC model [Val84] (i.e., with consistent labels), the class of halfspaces is known to be efficiently learnable without distributional assumptions. On the other hand, in the agnostic (or adversarial label noise) model [Hau92, KSS94] even weak learning is computationally intractable in the distribution-free setting [Dan16, DKMR22, Tie22]. These intractability results have served as a motivation for the study of agnostic learning in the distribution-specific setting, i.e., when the marginal distribution on examples is assumed to be well-behaved. In this context, a number of algorithmic results are known.


RLPrompt: Optimizing discrete text prompts with reinforcement learning

AIHub

Figure 1: Overview of RLPrompt for discrete prompt optimization. All language models (LMs) are frozen. We build our policy network by training a task-specific multi-layer perceptron (MLP) network inserted into a frozen pre-trained LM. The figure above illustrates 1) generation of a prompt (left), 2) example usages in a masked LM for classification (top right) and a left-to-right LM for generation (bottom right), and 3) update of the MLP using RL reward signals (red arrows). TL;DR: Prompting enables large language models (LLMs) to perform various NLP tasks without changing the model.


Continuous Function Structured in Multilayer Perceptron for Global Optimization

arXiv.org Artificial Intelligence

The gradient information of a multilayer perceptron with a linear neuron is modified with a functional derivative for global-minimum search benchmarking problems. With this approach, we show that the landscape of the gradient derived from a given continuous function using the functional derivative can take an MLP-like form with ax+b neurons. To this extent, the suggested algorithm improves the availability of the optimization process to deal with all the parameters in the problem set simultaneously. The functionality of this method could also be improved through an intentionally designed convex function with the Kullback-Leibler divergence applied to the cost value.


Amplitude-Varying Perturbation for Balancing Privacy and Utility in Federated Learning

arXiv.org Artificial Intelligence

While preserving the privacy of federated learning (FL), differential privacy (DP) inevitably degrades the utility (i.e., accuracy) of FL due to model perturbations caused by DP noise added to model updates. Existing studies have considered exclusively noise with persistent root-mean-square amplitude and overlooked the opportunity to adjust the amplitudes to alleviate the adverse effects of the noise. This paper presents a new DP perturbation mechanism with a time-varying noise amplitude to protect the privacy of FL and retain the capability of adjusting the learning performance. Specifically, we propose a geometric series form for the noise amplitude and reveal analytically the dependence of the series on the number of global aggregations and the $(\epsilon,\delta)$-DP requirement. We derive an online refinement of the series to prevent FL from premature convergence resulting from excessive perturbation noise. Another important aspect is an upper bound developed for the loss function of a multi-layer perceptron (MLP) trained by FL running the new DP mechanism. Accordingly, the optimal number of global aggregations is obtained, balancing the learning and privacy. Extensive experiments are conducted using MLP, support vector machine, and convolutional neural network models on four public datasets. The contribution of the new DP mechanism to the convergence and accuracy of privacy-preserving FL is corroborated, compared to the state-of-the-art Gaussian noise mechanism with a persistent noise amplitude.
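A geometric series form for the noise amplitude can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the paper's mechanism: the names `geometric_noise_schedule` and `perturb_update`, the starting amplitude `sigma0`, and the common ratio `ratio` are all hypothetical, and the sketch does not calibrate the series to any $(\epsilon,\delta)$-DP requirement or include the online refinement.

```python
import random

def geometric_noise_schedule(sigma0, ratio, num_rounds):
    """Noise standard deviations sigma0 * ratio**t for rounds t = 0..num_rounds-1.

    sigma0 and ratio are illustrative parameters; choosing them to meet a
    privacy budget is outside the scope of this sketch.
    """
    if sigma0 <= 0 or ratio <= 0 or num_rounds <= 0:
        raise ValueError("sigma0, ratio and num_rounds must be positive")
    return [sigma0 * ratio ** t for t in range(num_rounds)]

def perturb_update(update, sigma, rng):
    """Add independent zero-mean Gaussian noise with std sigma to each weight."""
    return [w + rng.gauss(0.0, sigma) for w in update]

schedule = geometric_noise_schedule(sigma0=1.0, ratio=0.5, num_rounds=4)
rng = random.Random(0)  # fixed seed so the example is repeatable
model_update = [0.1, -0.2, 0.3]
noisy_updates = [perturb_update(model_update, sigma, rng) for sigma in schedule]
```

A persistent-amplitude Gaussian mechanism corresponds to `ratio=1.0` in this sketch, which makes the contrast with a time-varying amplitude explicit.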


Gaussian Universality of Perceptrons with Random Labels

arXiv.org Artificial Intelligence

While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality holds also for a broad range of real datasets.


LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry

arXiv.org Artificial Intelligence

We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish an appropriate sampling strategy in the near-boundary region. Overly dense sampling can impede training convergence if the local gradient behaviors are too complex to be adequately modelled by PINNs. On the other hand, if the samples are too sparse, existing PINNs tend to overfit the near-boundary region, leading to an incorrect solution. To prevent such issues, we propose a new Boundary Connectivity (BCXN) loss function which provides a linear local structure approximation (LSA) to the gradient behaviors at the boundary for PINNs. Our BCXN-loss implicitly imposes local structure during training, thus facilitating fast physics-informed learning across entire problem domains with an order of magnitude sparser training samples. This LSA-PINN method shows a few orders of magnitude smaller errors than existing methods in terms of the standard L2-norm metric, while using dramatically fewer training samples and iterations. Our proposed LSA-PINN does not pose any requirement on the differentiable property of the networks, and we demonstrate its benefits and ease of implementation on both multi-layer perceptron and convolutional neural network versions as commonly used in current PINN literature.


A Lifted Bregman Formulation for the Inversion of Deep Neural Networks

arXiv.org Artificial Intelligence

We propose a novel framework for the regularised inversion of deep neural networks. The framework is based on the authors' recent work on training feed-forward neural networks without the differentiation of activation functions. The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables, and penalises these variables with tailored Bregman distances. We propose a family of variational regularisations based on these Bregman distances, present theoretical results and support their practical application with numerical examples. In particular, we present the first convergence result (to the best of our knowledge) for the regularised inversion of a single-layer perceptron that only assumes that the solution of the inverse problem is in the range of the regularisation operator, and that shows that the regularised inverse provably converges to the true inverse if measurement errors converge to zero.


Asymmetric Learning for Graph Neural Network based Link Prediction

arXiv.org Artificial Intelligence

Link prediction is a fundamental problem in many graph-based applications, such as protein-protein interaction prediction. Graph neural networks (GNNs) have recently been widely used for link prediction. However, existing GNN-based link prediction (GNN-LP) methods suffer from a scalability problem when training on large-scale graphs, which has received little attention from researchers. In this paper, we first give a computational complexity analysis of existing GNN-LP methods, which reveals that the scalability problem stems from their symmetric learning strategy of adopting the same class of GNN models to learn representations for both head and tail nodes. Then we propose a novel method, called asymmetric learning (AML), for GNN-LP. The main idea of AML is to adopt a GNN model for learning head node representations while using a multi-layer perceptron (MLP) model for learning tail node representations. Furthermore, AML proposes a row-wise sampling strategy to generate mini-batches for training, which is a necessary component to make the asymmetric learning strategy work for training speedup. To the best of our knowledge, AML is the first GNN-LP method adopting an asymmetric learning strategy for node representation learning. Experiments on three real large-scale datasets show that AML is 1.7X~7.3X faster in training than baselines with a symmetric learning strategy, while having almost no accuracy loss.
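The asymmetric idea, a neighbourhood-aggregating encoder for the head node and a feature-only MLP for the tail node, can be sketched as follows. This is a minimal illustration under assumptions, not the AML implementation: the single mean-aggregation layer, the dot-product score, and all function and parameter names are hypothetical, and the row-wise sampling strategy is not shown.

```python
def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def gnn_head_embedding(node, features, neighbors, W):
    """One GNN-style layer: average the node's and its neighbours' features,
    then apply a linear map and ReLU. Requires a neighbourhood lookup."""
    group = [node] + neighbors[node]
    dim = len(features[node])
    mean = [sum(features[n][d] for n in group) / len(group) for d in range(dim)]
    return relu(matvec(W, mean))

def mlp_tail_embedding(node, features, W1, W2):
    """Two-layer MLP on the node's own features only; no graph access."""
    return matvec(W2, relu(matvec(W1, features[node])))

def link_score(head, tail, features, neighbors, W_head, W1_tail, W2_tail):
    """Dot product between the head (GNN) and tail (MLP) embeddings."""
    h = gnn_head_embedding(head, features, neighbors, W_head)
    t = mlp_tail_embedding(tail, features, W1_tail, W2_tail)
    return sum(a * b for a, b in zip(h, t))

# Toy graph with three nodes and 2-dimensional features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights, not trained

score = link_score(0, 1, features, neighbors, identity, identity, identity)
```

Only the head side touches `neighbors`, so in this sketch the tail embedding costs one MLP pass per node regardless of graph size, which is the kind of saving the asymmetric strategy targets.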


Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits

arXiv.org Artificial Intelligence

Printed Electronics (PE) exhibits on-demand, extremely low-cost hardware due to its additive manufacturing process, enabling machine learning (ML) applications for domains that feature ultra-low cost, conformity, and non-toxicity requirements that silicon-based systems cannot deliver. Nevertheless, large feature sizes in PE prohibit the realization of complex printed ML circuits. In this work, we present, for the first time, an automated printed-aware software/hardware co-design framework that exploits approximate computing principles to enable ultra-resource constrained printed multilayer perceptrons (MLPs). Our evaluation demonstrates that, compared to the state-of-the-art baseline, our circuits feature on average 6x (5.7x) lower area (power) and less than 1% accuracy loss.