AITopics | Perceptrons

Perceptrons

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels

Moreno, Alexander, Mei, Jonathan, Walters, Luke

arXiv.org Artificial Intelligence, Jul-9-2023

Toeplitz Neural Networks (TNNs) (Qin et al. 2023) are a recent sequence model with impressive results. They require O(n log n) computational complexity and O(n) relative positional encoder (RPE) multi-layer perceptron (MLP) and decay bias calls. We aim to reduce both. We first note that the RPE is a non-SPD (symmetric positive definite) kernel and the Toeplitz matrices are pseudo-Gram matrices. Further, 1) the learned kernels display spiky behavior near the main diagonals with otherwise smooth behavior; 2) the RPE MLP is slow. For bidirectional models, this motivates a sparse plus low-rank Toeplitz matrix decomposition. For the sparse component's action, we do a small 1D convolution. For the low-rank component, we replace the RPE MLP with linear interpolation and use asymmetric Structured Kernel Interpolation (SKI) (Wilson et al. 2015) for O(n) complexity; we provide rigorous error analysis. For causal models, "fast" causal masking (Katharopoulos et al. 2020) negates SKI's benefits. Working in the frequency domain, we avoid an explicit decay bias. To enforce causality, we represent the kernel via the real part of its frequency response using the RPE and compute the imaginary part via a Hilbert transform. This maintains O(n log n) complexity but achieves an absolute speedup. Modeling the frequency response directly is also competitive for bidirectional training, using one fewer FFT. We set a speed state of the art on Long Range Arena (Tay et al. 2020) with minimal score degradation.
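The O(n log n) cost quoted in this abstract is the usual cost of applying a Toeplitz matrix with FFTs. The following is a minimal NumPy sketch of that generic technique (circulant embedding), not the authors' implementation; the function names `toeplitz_matvec_fft` and `toeplitz_dense` are illustrative assumptions, and the input is assumed to be real-valued.

```python
import numpy as np


def toeplitz_matvec_fft(first_col, first_row, x):
    """Compute T @ x in O(n log n), where T[i, j] = t[i - j].

    first_col holds t_0, t_1, ..., t_{n-1}; first_row holds t_0, t_{-1}, ..., t_{-(n-1)}.
    Assumes real-valued inputs (the imaginary part of the result is discarded).
    """
    n = len(x)
    # Embed T in a 2n x 2n circulant matrix whose first column is
    # [t_0, ..., t_{n-1}, 0, t_{-(n-1)}, ..., t_{-1}].
    c = np.concatenate([first_col, [0.0], first_row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n)])
    # A circulant matrix-vector product is a circular convolution,
    # which is an elementwise product in the frequency domain.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:n].real


def toeplitz_dense(first_col, first_row):
    """Build the dense Toeplitz matrix, used here only as an O(n^2) reference."""
    n = len(first_col)
    return np.array(
        [[first_col[i - j] if i >= j else first_row[j - i] for j in range(n)]
         for i in range(n)]
    )
```

Comparing the FFT path against `toeplitz_dense(col, row) @ x` on small random inputs is a quick way to check the embedding indices.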

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2305.09028

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Multi-Scale U-Shape MLP for Hyperspectral Image Classification

Lin, Moule, Jing, Weipeng, Di, Donglin, Chen, Guangsheng, Song, Houbing

arXiv.org Artificial Intelligence, Jul-5-2023

Hyperspectral images have significant applications in various domains, since they register a wealth of semantic and spatial information in the spectral band with spatial variability of spectral signatures. Two critical challenges in classifying the pixels of a hyperspectral image are representing the correlated local and global information and coping with the large number of model parameters. To tackle these challenges, we propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP), a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral band features to embed the deep-level representation adequately. UMLP uses an encoder-decoder structure with multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments demonstrate that our model can outperform state-of-the-art methods across the board on three widely adopted public datasets, namely Pavia University, Houston 2013 and Houston 2018.

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LGRS.2022.3141547

2307.10186

Country:

North America > United States > Florida > Volusia County > Daytona Beach (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.35)

Industry: Energy (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression

Huynh, Ngo Nghi Truyen, Garambois, Pierre-André, Colleoni, François, Renard, Benjamin, Roux, Hélène

arXiv.org Artificial Intelligence, Jul-4-2023

Regardless of the improvements made in hydrological forward models and available data, hydrological calibration remains a challenging ill-posed inverse problem faced with the equifinality (Beven, 2001) of feasible solutions. Most calibration approaches aim to estimate spatially uniform model parameters for a single gauged catchment, resulting in piecewise constant discontinuous parameters fields for adjacent catchments. Moreover, these calibrated parameter are not transferable to ungauged locations, which represents the majority of the global land surface (Fekete & Vörösmarty, 2007; Hannah et al., 2011).

[...] Regionalization (MPR) method, combining descriptors upscaling and pre-regionalization function in form of multilinear regressions, implemented within a spatially distributed multiscale hydrological model (mHm), has been proposed by Samaniego et al. (2010), and later applied to other gridded hydrological models in several applicative studies (e.g., Mizukami et al. (2017); Beck et al. (2020)). In all the above studies, state of the art optimization algorithms are used, especially Shuffle Complex Evolution algorithm (SCE) (Duan et al., 1992) in Mizukami et al. (2017) or Distributed Evolutionary Algorithms (DEAP) (Fortin et al., 2012) in Beck [...]

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.02497

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.87)

Exploring Randomly Wired Neural Networks for Climate Model Emulation

Yik, William, Silva, Sam J., Geiss, Andrew, Watson-Parris, Duncan

arXiv.org Artificial Intelligence, Jul-3-2023

Exploring the climate impacts of various anthropogenic emissions scenarios is key to making informed decisions for climate change mitigation and adaptation. State-of-the-art Earth system models can provide detailed insight into these impacts, but have a large associated computational cost on a per-scenario basis. This large computational burden has driven recent interest in developing cheap machine learning models for the task of climate model emulation. In this manuscript, we explore the efficacy of randomly wired neural networks for this task. We describe how they can be constructed and compare them to their standard feedforward counterparts using the ClimateBench dataset. Specifically, we replace the serially connected dense layers in multilayer perceptrons, convolutional neural networks, and convolutional long short-term memory networks with randomly wired dense layers and assess the impact on model performance for models with 1 million and 10 million parameters. We find that models with less complex architectures see the greatest performance improvement with the addition of random wiring (up to 30.4% for multilayer perceptrons). Furthermore, out of 24 different model architecture, parameter count, and prediction task combinations, only one saw a statistically significant performance deficit in randomly wired networks compared to their standard counterparts, with 14 cases showing statistically significant improvement. We also find no significant difference in prediction speed between networks with standard feedforward dense layers and those with randomly wired layers. These findings indicate that randomly wired neural networks may be suitable direct replacements for traditional dense layers in many standard models.

artificial intelligence, machine learning, randdense network, (17 more...)

arXiv.org Artificial Intelligence

2212.03369

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
South America (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (0.87)
Research Report > New Finding (0.66)

Industry:

Energy (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion?

Li, Juanhui, Shomer, Harry, Ding, Jiayuan, Wang, Yiqi, Ma, Yao, Shah, Neil, Tang, Jiliang, Yin, Dawei

arXiv.org Artificial Intelligence, Jul-3-2023

Knowledge graphs (KGs) facilitate a wide variety of applications. Despite great efforts in creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks for KG research. Recently, considerable literature in this space has centered around the use of Message Passing (Graph) Neural Networks (MPNNs) to learn powerful embeddings. The success of these methods is naturally attributed to the use of MPNNs over simpler multi-layer perceptron (MLP) models, given their additional message passing (MP) component. In this work, we find that, surprisingly, simple MLP models are able to achieve comparable performance to MPNNs, suggesting that MP may not be as crucial as previously believed. With further exploration, we show that careful scoring function and loss function design has a much stronger influence on KGC model performance. This suggests a conflation of scoring function design, loss function design, and MP in prior work, with promising insights regarding the scalability of state-of-the-art KGC methods today, as well as careful attention to more suitable MP designs for KGC tasks tomorrow. Our code is publicly available at: https://github.com/Juanhui28/Are_MPNNs_helpful.

artificial intelligence, machine learning, triplet, (18 more...)

arXiv.org Artificial Intelligence

2205.10652

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Nevada (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Tractability from overparametrization: The example of the negative perceptron

Montanari, Andrea, Zhong, Yiqiao, Zhou, Kangjie

arXiv.org Artificial Intelligence, Jul-3-2023

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves to find a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or -- equivalently -- on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
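To make the objective in this abstract concrete, here is a small pure-Python sketch that evaluates the margin $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$ for a unit-norm ${\boldsymbol \theta}$. The function name `margin` and the four toy data points are assumptions made for illustration; they do not come from the paper, and the sketch only evaluates candidate directions rather than solving the non-convex maximization.

```python
import math


def margin(theta, points, labels):
    """Return min_i y_i * <theta / ||theta||, x_i> for the given data set."""
    norm = math.sqrt(sum(t * t for t in theta))
    unit = [t / norm for t in theta]  # the problem constrains theta to unit norm
    return min(
        y * sum(u * x for u, x in zip(unit, point))
        for point, y in zip(points, labels)
    )


# Hypothetical toy data: on each axis, two points on the same ray carry
# opposite labels, so no direction classifies every point correctly.
points = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
labels = [+1, -1, +1, -1]

for candidate in [(1.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]:
    print(candidate, margin(candidate, points, labels))
```

On this toy set every candidate direction yields a negative value, which is the regime the abstract describes: the goal becomes finding the direction whose (negative) margin is as large as possible.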

artificial intelligence, exp, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2110.15824

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.70)

Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting

Tran, Nhat Thanh, Xin, Jack

arXiv.org Artificial Intelligence, Jul-2-2023

Recent progress in long sequence time-series forecasting (LSTF) has been led by either transformers with sparse attention ([16] and references therein) or attention in combination with signal preprocessing such as seasonal-trend decomposition [17] or adopting auto-correlation to account for periodicity in the data [13]. On the other hand, the Fourier transform has been proposed as an alternative mixing tool in lieu of standard attention [12] to speed up prediction in natural language processing (NLP) tasks (FNet, [2]). Though the Fourier transform is meant to mimic the mixing functions of the multilayer perceptron (MLP, [11]), it is not well understood why it works and when assistance from attention layers remains necessary to maintain performance. In computer vision (CV), the Fourier transform is also used as a filtering step in the early stages of a transformer (GFNet, [8]) to enhance a fully attention-based architecture. A recent advance in CV is to adopt window attention to reduce the quadratic complexity of full attention [12].

data quality, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.00493

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
(2 more...)

A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact

Huertas-García, Álvaro, Martí-González, Carlos, Maezo, Rubén García, Rey, Alejandro Echeverría

arXiv.org Artificial Intelligence, Jul-1-2023

In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.
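The Pareto-optimality idea mentioned in this abstract can be sketched in a few lines. The example below is an illustrative assumption, not the study's code: the function name `pareto_front`, the model names, and the score and energy numbers are all made up, with higher score and lower energy treated as better.

```python
def pareto_front(models):
    """Return names of models that no other model dominates.

    Each entry is (name, score, energy). A model is dominated when another
    model is at least as good on both criteria and strictly better on one.
    """
    front = []
    for name, score, energy in models:
        dominated = any(
            (s >= score and e <= energy) and (s > score or e < energy)
            for _, s, e in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (name, score, energy) triples, for illustration only.
candidates = [
    ("decision_tree", 0.90, 1.0),
    ("random_forest", 0.93, 3.0),
    ("mlp_small", 0.92, 5.0),
    ("mlp_large", 0.97, 20.0),
]

print(pareto_front(candidates))
```

Here `mlp_small` drops out because `random_forest` scores higher while using less energy; the remaining models each represent a different trade-off between performance and footprint.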

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2307.00361

Country:

North America > United States (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.69)

Industry:

Energy (1.00)
Law > Environmental Law (0.63)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Classification and Explanation of Distributed Denial-of-Service (DDoS) Attack Detection using Machine Learning and Shapley Additive Explanation (SHAP) Methods

Wei, Yuanyuan, Jang-Jaccard, Julian, Singh, Amardeep, Sabrina, Fariza, Camtepe, Seyit

arXiv.org Artificial Intelligence, Jun-27-2023

DDoS attacks involve overwhelming a target system with a large number of requests or traffic from multiple sources, disrupting the normal traffic of a targeted server, service, or network. Distinguishing between legitimate traffic and malicious traffic is a challenging task. Machine learning and deep learning techniques make it possible to classify legitimate and malicious traffic and to analyze network traffic. However, explaining how a model classifies a traffic flow as benign or malicious is an important investigation of the model's inner workings that increases its trustworthiness. Explainable Artificial Intelligence (XAI) can explain the decision-making of the machine learning models that classify and identify DDoS traffic. In this context, we propose a framework that can not only classify legitimate traffic and malicious traffic of DDoS attacks but also use SHAP to explain the decision-making of the classifier model. To this end, we first adopt feature selection techniques to select the top 20 important features based on feature importance techniques (e.g., XGB-based SHAP feature importance). Following that, the Multi-layer Perceptron Network (MLP) part of our proposed model uses the optimized features of the DDoS attack dataset as inputs to classify legitimate and malicious traffic. We perform extensive experiments with all features and selected features. The evaluation results show that the model with selected features achieves above 99% accuracy. Finally, to provide interpretability, XAI can be adopted to explain the relationship between the prediction results and the features based on global and local explanations by SHAP, which can better explain the results achieved by our proposed framework.

artificial intelligence, explanation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.1719

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Oceania > Australia > Queensland (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.86)

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Wu, Jingfeng, Zou, Difan, Chen, Zixiang, Braverman, Vladimir, Gu, Quanquan, Kakade, Sham M.

arXiv.org Artificial Intelligence, Jun-26-2023

This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Our upper and lower risk bounds provide a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.
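For readers unfamiliar with GLM-tron, the sketch below shows the Perceptron-type update as it is commonly stated for a known monotone link function, here the ReLU: the residual is applied to the inputs directly, without the link-derivative factor that gradient descent on the squared loss would include. The step size, iteration count, and the Gaussian noiseless toy data are assumptions chosen for illustration and are not the settings analyzed in the paper.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def glm_tron(X, y, step_size=0.5, n_iters=300):
    """Run full-batch GLM-tron-style updates for ReLU regression.

    Update: w <- w + step_size * mean_i (y_i - relu(<w, x_i>)) * x_i.
    Unlike gradient descent on the squared loss, there is no relu'(.) factor.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residual = y - relu(X @ w)
        w = w + step_size * (X.T @ residual) / n
    return w


# Hypothetical well-specified, noiseless toy problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
y = relu(X @ w_star)

w_hat = glm_tron(X, y)
print(np.linalg.norm(w_hat - w_star))
```

Starting from `w = 0` is harmless here because the update does not multiply by the ReLU derivative, which is zero at the origin and would stall a plain gradient step.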

finite-sample analysis, glm-tron, regression, (12 more...)

arXiv.org Artificial Intelligence

2303.02255

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.34)
