 Perceptrons


What Makes for Hierarchical Vision Transformer?

arXiv.org Artificial Intelligence

Recent studies show that a hierarchical Vision Transformer with interleaved non-overlapped intra-window self-attention and shifted-window self-attention can achieve state-of-the-art performance in various visual recognition tasks, challenging the CNN's dense sliding-window paradigm. Most follow-up works try to replace the shifted-window operation with other kinds of cross-window communication while treating self-attention as the de facto standard for intra-window information aggregation. In this short preprint, we ask whether self-attention is the only choice for a hierarchical Vision Transformer to attain strong performance, and what actually makes for a hierarchical Vision Transformer. We replace the self-attention layers in Swin Transformer and Shuffle Transformer with a simple linear mapping and keep the other components unchanged. The resulting architecture, with 25.4M parameters and 4.2G FLOPs, achieves 80.5% Top-1 accuracy, compared to 81.3% for Swin Transformer with 28.3M parameters and 4.5G FLOPs. We also experiment with other alternatives to self-attention for context aggregation inside each non-overlapped window, all of which give similarly competitive results under the same architecture. Our study reveals that the macro architecture of the Swin model family (i.e., interleaved intra-window and cross-window communication), rather than specific aggregation layers or specific means of cross-window communication, may be more responsible for its strong performance and is the real challenger to the CNN's dense sliding-window paradigm.
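
The abstract does not spell out the exact form of the "simple linear mapping", so the following is only one possible reading, sketched in NumPy: the feature map is split into non-overlapped windows and the tokens inside each window are mixed by a single weight matrix. The function names `window_partition` and `window_linear_mix`, the window size `m`, and the choice to share one `(m*m, m*m)` matrix across all windows and channels are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def window_partition(x, m):
    """Split an (H, W, C) feature map into non-overlapped m x m windows.

    Returns an array of shape (num_windows, m*m, C).
    Assumes H and W are divisible by m.
    """
    H, W, C = x.shape
    x = x.reshape(H // m, m, W // m, m, C)
    # Bring the two window-grid axes together, then flatten each window.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, m * m, C)

def window_linear_mix(x, weight, m):
    """Mix the tokens inside each window with one linear map (hypothetical).

    weight has shape (m*m, m*m) and plays the role that the attention
    matrix would play in window self-attention, but it is input-independent.
    """
    windows = window_partition(x, m)              # (num_windows, m*m, C)
    return np.einsum("ts,nsc->ntc", weight, windows)

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 3))      # toy 8x8 map, 3 channels
m = 4
weight = rng.standard_normal((m * m, m * m))      # would be learned in practice
mixed = window_linear_mix(feature_map, weight, m)
print(mixed.shape)                                # (4, 16, 3)
```

Cross-window communication (shifting or shuffling) is deliberately left out; in the architecture described above it is a separate, interleaved step.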


A System for Traded Control Teleoperation of Manipulation Tasks using Intent Prediction from Hand Gestures

arXiv.org Artificial Intelligence

This paper presents a teleoperation system that includes robot perception and intent prediction from hand gestures. The perception module identifies the objects present in the robot workspace, and the intent prediction module predicts which object the user likely wants to grasp. This architecture allows the approach to rely on traded control instead of direct control: we use hand gestures to specify the goal objects for a sequential manipulation task, and the robot then autonomously generates a grasping or retrieving motion using trajectory optimization. The perception module relies on a model-based tracker to precisely track the 6D pose of the objects and uses a state-of-the-art learning-based object detection and segmentation method to initialize the tracker by automatically detecting objects in the scene. Goal objects are identified from user hand gestures using a trained multi-layer perceptron classifier. After presenting all the components of the system and their empirical evaluation, we present experimental results comparing our pipeline to a direct traded control approach (i.e., one that does not use prediction), which show that intent prediction reduces the overall task execution time.


Global Filter Networks for Image Classification

arXiv.org Artificial Intelligence

Recent advances in self-attention and pure multi-layer perceptrons (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases. These models are generally based on learning interaction among spatial locations from raw data. The complexity of self-attention and MLP grows quadratically as the image size increases, which makes these models hard to scale up when high-resolution features are required. In this paper, we present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture, that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform. We exhibit favorable accuracy/complexity trade-offs of our models on both ImageNet and downstream tasks. Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness. Code is available at https://github.com/raoyongming/GFNet
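
The three operations named in the abstract (2D DFT, element-wise multiplication with a global filter, inverse 2D DFT) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `global_filter`, the `(H, W, C)` layout, and the use of the real-input FFT (`rfft2`/`irfft2`) to halve the stored spectrum are assumptions made here. The FFT is what gives the O(N log N) cost in the number of spatial positions N.

```python
import numpy as np

def global_filter(x, filt):
    """Apply a frequency-domain filter to a real (H, W, C) feature map.

    filt is a complex array of shape (H, W // 2 + 1, C); in a trained
    network it would be a learnable parameter (assumed shape).
    """
    H, W, _ = x.shape
    freq = np.fft.rfft2(x, axes=(0, 1))        # 2D DFT over the spatial axes
    freq = freq * filt                         # element-wise multiplication
    return np.fft.irfft2(freq, s=(H, W), axes=(0, 1))  # inverse 2D DFT

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))             # toy feature map
all_pass = np.ones((8, 8 // 2 + 1, 4), dtype=complex)
y = global_filter(x, all_pass)                 # all-ones filter: y matches x
```

Because multiplication in the frequency domain corresponds to a circular convolution in the spatial domain, each output position can depend on every input position, which is how a single filter captures global spatial interactions.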


$\alpha$-Stable convergence of heavy-tailed infinitely-wide neural networks

arXiv.org Machine Learning

We consider infinitely-wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with i.i.d. samples from either a light-tailed (finite variance) or heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter as that layer. We then extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions.


Day 58 - What is Perceptron Algorithm? - IT Consultant - SAP, Artificial Intelligence and Machine Learning

#artificialintelligence

If you want to make a decision on a given task, for example, if a bank wants to approve a credit card for a person, it checks a certain number of criteria to approve or deny the application. This classification task can be done with the Perceptron Algorithm; however, today there are many hybrid algorithms working behind the scenes to approve a credit card or handle any other binary classification task. The Perceptron is a very simple supervised learning algorithm used for binary classification. A binary output can take only two values: 0 or 1, yes or no, approve or deny, etc. The weights are w1, w2, w3, ..., wd and the features are x1, x2, x3, ..., xd; the weighted sum w1x1 + w2x2 + ... + wdxd plus a bias term b determines whether the output is 0 or 1, approve or deny, yes or no, etc.


Feed Forward Neural Network

#artificialintelligence

A Feed Forward Neural Network is commonly seen in its simplest form as a single layer perceptron. In this model, a series of inputs enter the layer and are multiplied by the weights. The weighted values are then added together to get a sum of the weighted inputs. If the sum is above a specific threshold, usually set at zero, the output is typically 1, whereas if the sum falls below the threshold, the output is -1. The single layer perceptron is an important model of feed forward neural networks and is often used in classification tasks. Furthermore, single layer perceptrons can incorporate aspects of machine learning.
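
The computation described above (multiply inputs by weights, sum, compare against a threshold) fits in a few lines. The sketch below is illustrative only: the function name `perceptron_output`, the example weights, and the optional bias term `b` are assumptions, and the text does not say what happens when the sum equals the threshold exactly, so this version returns -1 in that case.

```python
import numpy as np

def perceptron_output(x, w, b=0.0, threshold=0.0):
    """Forward pass of a single layer perceptron with outputs in {1, -1}.

    x: input features, w: weights (same length as x).
    b: optional bias added to the weighted sum (assumption, default 0).
    """
    weighted_sum = float(np.dot(w, x)) + b
    # A sum exactly equal to the threshold is mapped to -1 (assumption).
    return 1 if weighted_sum > threshold else -1

w = np.array([0.5, -1.0, 2.0])                 # hypothetical weights
above = perceptron_output(np.array([1.0, 1.0, 1.0]), w)   # sum = 1.5
below = perceptron_output(np.array([1.0, 2.0, 0.0]), w)   # sum = -1.5
print(above, below)                            # 1 -1
```

Learning, which the text only alludes to, would consist of adjusting `w` (and `b`) from labelled examples; that step is not shown here.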


Distilling Self-Knowledge From Contrastive Links to Classify Graph Nodes Without Passing Messages

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) following the Message Passing paradigm have become the dominant way to learn on graph data. Models in this paradigm have to spend extra space to look up adjacent nodes with adjacency matrices and extra time to aggregate multiple messages from adjacent nodes. To address this issue, we develop a method called LinkDist that distils self-knowledge from connected node pairs into a Multi-Layer Perceptron (MLP) without the need to aggregate messages. Experiments on 8 real-world datasets show that the MLP derived from LinkDist can predict the label of a node without knowing its adjacencies, yet achieves accuracy comparable to GNNs in both semi-supervised and fully supervised node classification. Moreover, because LinkDist does not pass messages, we can also distil self-knowledge from arbitrarily sampled node pairs in a contrastive way to further boost its performance.


Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

arXiv.org Artificial Intelligence

Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning on each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties such as good performance on dense, numerical features and are effective where the ratio of the number of samples to the number of features is low. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). The FreeGBDT shows a consistent improvement over the MLP classification head.


Graph-MLP: Node Classification without Message Passing in Graph

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have demonstrated their effectiveness in dealing with non-Euclidean structural data. Both spatial-based and spectral-based GNNs rely on the adjacency matrix to guide message passing among neighbors during feature aggregation. Recent works have mainly focused on powerful message passing modules; however, in this paper, we show that none of the message passing modules is necessary. Instead, we propose a pure multilayer-perceptron-based framework, Graph-MLP, with a supervision signal that leverages graph structure, which is sufficient for learning discriminative node representations. At the model level, Graph-MLP only includes multi-layer perceptrons, activation functions, and layer normalization. At the loss level, we design a neighboring contrastive (NContrast) loss to bridge the gap between GNNs and MLPs by utilizing the adjacency information implicitly. This design allows our model to be lighter and more robust when facing large-scale graph data and corrupted adjacency information. Extensive experiments show that even without adjacency information in the testing phase, our framework can still reach comparable and even superior performance against the state-of-the-art models in the graph node classification task.


DMIDAS: Deep Mixed Data Sampling Regression for Long Multi-Horizon Time Series Forecasting

arXiv.org Machine Learning

Neural forecasting has shown significant improvements in the accuracy of large-scale systems, yet predicting extremely long horizons remains a challenging task. Two common problems are the volatility of the predictions and their computational complexity; we address them by incorporating smoothness regularization and mixed data sampling techniques into a well-performing multi-layer perceptron based architecture (NBEATS). We validate our proposed method, DMIDAS, on high-frequency healthcare and electricity price data with long forecasting horizons (~1000 timestamps), where we improve the prediction accuracy by 5% over state-of-the-art models while reducing the number of parameters of NBEATS by nearly 70%.