AITopics | Perceptrons

Perceptrons

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Neo, Clement, Cohen, Shay B., Barez, Fazl

arXiv.org Artificial Intelligence Feb-22-2024

In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the Multilayer Perceptron that predict specific tokens. By prompting an LLM like GPT-4 to explain these model internals, we can elucidate attention mechanisms that activate certain next-token neurons. Our analysis identifies attention heads that recognize contexts relevant to predicting a particular token, activating the associated neuron through the residual connection. We focus specifically on heads in earlier layers consistently activating the same next-token neuron across similar prompts. Exploring these differential activation patterns reveals that heads that specialize for distinct linguistic contexts are tied to generating certain tokens. Overall, our method combines neural explanations and probing isolated components to illuminate how attention enables context-dependent, specialized processing in LLMs.

attention head, explanation, neuron, (13 more...)

arXiv.org Artificial Intelligence

2402.15055

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > South Dakota (0.04)
North America > United States > Oklahoma (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Zhang, Ruiqi, Wu, Jingfeng, Bartlett, Peter L.

arXiv.org Machine Learning Feb-22-2024

We study the in-context learning (ICL) ability of a Linear Transformer Block (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component. For ICL of linear regression with a Gaussian prior and a nonzero mean, we show that LTB can achieve nearly Bayes optimal ICL risk. In contrast, using only linear attention must incur an irreducible additive approximation error. Furthermore, we establish a correspondence between LTB and one-step gradient descent estimators with learnable initialization (GD-β), in the sense that every GD-β estimator can be implemented by an LTB estimator and every optimal LTB estimator that minimizes the in-class ICL risk is effectively a GD-β estimator. Finally, we show that GD-β estimators can be efficiently optimized with gradient flow, despite a non-convex training objective. Our results reveal that LTB achieves ICL by implementing GD-β, and they highlight the role of MLP layers in reducing approximation error.
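As a rough illustration of the one-step gradient descent estimator with learnable initialization described in this abstract, the sketch below takes a single preconditioned gradient step on the in-context least-squares loss, starting from an initial vector. The function name `gd_beta_predict`, the matrix step size `gamma`, and the toy data are assumptions made for illustration; they are not the paper's notation or code.

```python
import numpy as np

def gd_beta_predict(X, y, x_query, beta, gamma):
    """One gradient step on (1/2n)||X w - y||^2 starting from w = beta, then predict."""
    n = X.shape[0]
    grad = X.T @ (X @ beta - y) / n   # gradient of the least-squares loss at beta
    w = beta - gamma @ grad           # single matrix-preconditioned gradient step
    return float(x_query @ w)

# Toy, noiseless in-context regression task (made-up data).
rng = np.random.default_rng(0)
n, d = 50, 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ w_true
x_query = rng.normal(size=d)

beta = np.ones(d)                     # hypothetical learned initialization
gamma = np.linalg.inv(X.T @ X / n)    # with this step size one step reaches the least-squares fit
pred = gd_beta_predict(X, y, x_query, beta, gamma)
```

With the step size set to the inverse empirical covariance, the single step lands on the least-squares solution regardless of the initialization; with a fixed, learned `gamma` the initialization `beta` would matter, which is the part a learnable initialization is meant to capture.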

icl risk, in-context learning, matrix, (12 more...)

arXiv.org Machine Learning

2402.14951

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Denmark (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

An Effective Networks Intrusion Detection Approach Based on Hybrid Harris Hawks and Multi-Layer Perceptron

Alazab, Moutaz, Khurma, Ruba Abu, Castillo, Pedro A., Abu-Salih, Bilal, Martin, Alejandro, Camacho, David

arXiv.org Artificial Intelligence Feb-21-2024

This paper proposes an Intrusion Detection System (IDS) employing the Harris Hawks Optimization algorithm (HHO) to optimize Multilayer Perceptron learning by optimizing bias and weight parameters. HHO-MLP aims to select optimal parameters in its learning process to minimize intrusion detection errors in networks. HHO-MLP has been implemented using EvoloPy NN framework, an open-source Python tool specialized for training MLPs using evolutionary algorithms. For purposes of comparing the HHO model against other evolutionary methodologies currently available, specificity and sensitivity measures, accuracy measures, and MSE and RMSE measures have been calculated using KDD datasets. Experiments have demonstrated the HHO-MLP method is effective at identifying malicious patterns. HHO-MLP has been tested against evolutionary algorithms like Butterfly Optimization Algorithm (BOA), Grasshopper Optimization Algorithms (GOA), and Black Widow Optimizations (BOW), with validation by Random Forest (RF) and XG-Boost. HHO-MLP showed superior performance by attaining top scores with accuracy rate of 93.17%, sensitivity level of 89.25%, and specificity percentage of 95.41%.
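For readers unfamiliar with the evaluation measures named in this abstract, the following minimal sketch shows how sensitivity, specificity, and accuracy are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are made up for illustration and have no connection to the paper's KDD results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate: attacks correctly flagged
    specificity = tn / (tn + fp)                 # true negative rate: normal traffic correctly passed
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all predictions that are correct
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}

# Hypothetical counts: 100 attack samples and 100 normal samples.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```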

algorithm, intrusion detection system, iter, (13 more...)

arXiv.org Artificial Intelligence

2402.14037

Country:

Asia > Middle East > Jordan > Amman Governorate > Amman (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Africa > Mali (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

BMLP: Behavior-aware MLP for Heterogeneous Sequential Recommendation

Li, Weixin, Wu, Yuhao, Liu, Yang, Pan, Weike, Ming, Zhong

arXiv.org Artificial Intelligence Feb-20-2024

In real recommendation scenarios, users often have different types of behaviors, such as clicking and buying. Existing research methods show that it is possible to capture the heterogeneous interests of users through different types of behaviors. However, most multi-behavior approaches have limitations in learning the relationship between different behaviors. In this paper, we propose a novel multilayer perceptron (MLP)-based heterogeneous sequential recommendation method, namely behavior-aware multilayer perceptron (BMLP). Specifically, it has two main modules, including a heterogeneous interest perception (HIP) module, which models behaviors at multiple granularities through behavior types and transition relationships, and a purchase intent perception (PIP) module, which adaptively fuses subsequences of auxiliary behaviors to capture users' purchase intent. Compared with mainstream sequence models, MLP is competitive in terms of accuracy and has unique advantages in simplicity and efficiency. Extensive experiments show that BMLP achieves significant improvement over state-of-the-art algorithms on four public datasets. In addition, its pure MLP architecture leads to a linear time complexity.

module, recommendation, sequence, (16 more...)

arXiv.org Artificial Intelligence

2402.12733

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Hong Kong > Kowloon (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.94)

The Capacity of a Bump

Neural Information Processing Systems Feb-17-2024, 08:05:37 GMT

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical. However, we show that for p ≫ d, which is the usual case in practice, the ratio of hyper-ridge to perceptron dichotomies approaches p/2(d + 1).

activation function, bump, perceptron

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models

Veeramani, Hariram, Thapa, Surendrabikram, Naseem, Usman

arXiv.org Artificial Intelligence Feb-16-2024

In the evolving landscape of Environmental, Social, and Corporate Governance (ESG) impact assessment, the ML-ESG-2 shared task proposes identifying ESG impact types. To address this challenge, we present a comprehensive system leveraging ensemble learning techniques, capitalizing on early and late fusion approaches. Our approach employs four distinct models: mBERT, FlauBERT-base, ALBERT-base-v2, and a Multi-Layer Perceptron (MLP) incorporating Latent Semantic Analysis (LSA) and Term Frequency-Inverse Document Frequency (TF-IDF) features. Through extensive experimentation, we find that our early fusion ensemble approach, featuring the integration of LSA, TF-IDF, mBERT, FlauBERT-base, and ALBERT-base-v2, delivers the best performance. Our system offers a comprehensive ESG impact type identification solution, contributing to the responsible and sustainable decision-making processes vital in today's financial and corporate governance landscape.

identification, proceedings, type identification, (15 more...)

arXiv.org Artificial Intelligence

2402.10772

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.05)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
(3 more...)

Genre: Research Report (0.64)

Industry:

Banking & Finance (1.00)
Law > Business Law (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.71)

Graph Inference Acceleration by Learning MLPs on Graphs without Supervision

Wang, Zehong, Zhang, Zheyuan, Zhang, Chuxu, Ye, Yanfang

arXiv.org Artificial Intelligence Feb-13-2024

Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constrains their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present SimMLP, a Simple yet effective framework for learning MLPs on graphs without supervision, to enhance generalization. SimMLP employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, we comprehensively analyze SimMLP to demonstrate its equivalence to GNNs in the optimal case and its generalization capability. Empirically, SimMLP outperforms state-of-the-art baselines, especially in settings with unseen nodes. In particular, it obtains significant performance gains (7∼26%) over MLPs and inference acceleration over GNNs (90∼126×) on large-scale graph datasets. Our codes are available at: https://github.com/Zehong-Wang/SimMLP.

gnn, im mlp, mlp, (12 more...)

arXiv.org Artificial Intelligence

2402.08918

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Fraud (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Approximation of relation functions and attention mechanisms

Altabaa, Awni, Lafferty, John

arXiv.org Machine Learning Feb-13-2024

Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
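To make the two kinds of relation function in this abstract concrete, the sketch below builds a small random MLP feature map and evaluates the relation r(x, y) as an inner product of features. Using one MLP on both sides gives a symmetric, positive semidefinite Gram matrix; using two different MLPs gives a matrix that is generally not symmetric. The two-layer tanh architecture, the helper names, and the random weights are assumptions for illustration only and say nothing about the paper's approximation bounds.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random parameters for a two-layer MLP (illustrative, untrained)."""
    return (rng.normal(size=(d_in, d_hidden)), rng.normal(size=d_hidden),
            rng.normal(size=(d_hidden, d_out)), rng.normal(size=d_out))

def mlp_features(x, W1, b1, W2, b2):
    """Feature map of a two-layer MLP with tanh hidden units."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
phi = init_mlp(rng, d_in=4, d_hidden=16, d_out=8)
psi = init_mlp(rng, d_in=4, d_hidden=16, d_out=8)

X = rng.normal(size=(6, 4))           # six toy inputs
Phi = mlp_features(X, *phi)
Psi = mlp_features(X, *psi)

R_sym = Phi @ Phi.T                   # r(x, y) = <phi(x), phi(y)>: symmetric, PSD
R_asym = Phi @ Psi.T                  # r(x, y) = <phi(x), psi(y)>: asymmetric in general
```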

inner product, neural network, relation, (13 more...)

arXiv.org Machine Learning

2402.08856

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Xiong, Zheng, Vuorio, Risto, Beck, Jacob, Zimmer, Matthieu, Shao, Kun, Whiteson, Shimon

arXiv.org Artificial Intelligence Feb-9-2024

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
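The idea of a hypernetwork that generates robot-wise MLP policies can be sketched in a few lines: a context vector describing the morphology is mapped to a flat parameter vector, which is then reshaped into the weights of a small MLP that acts on observations. Everything below is an assumption for illustration (the linear hypernetwork, the dimensions, the names `hypernetwork`, `unpack`, and `mlp_policy`); it does not reproduce HyperDistill's architecture or its distillation procedure.

```python
import numpy as np

OBS_DIM, HIDDEN, ACT_DIM, CTX_DIM = 5, 8, 2, 3
SHAPES = [(OBS_DIM, HIDDEN), (HIDDEN,), (HIDDEN, ACT_DIM), (ACT_DIM,)]
N_PARAMS = sum(int(np.prod(s)) for s in SHAPES)

def hypernetwork(context, H):
    """Map a morphology context vector to the flat parameters of an MLP policy."""
    return context @ H

def unpack(flat):
    """Split a flat parameter vector into (W1, b1, W2, b2) with the shapes in SHAPES."""
    params, i = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        params.append(flat[i:i + size].reshape(shape))
        i += size
    return params

def mlp_policy(obs, params):
    """Small MLP policy: observation in, action out."""
    W1, b1, W2, b2 = params
    return np.tanh(obs @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
H = rng.normal(scale=0.1, size=(CTX_DIM, N_PARAMS))   # hypernetwork weights (untrained)
robot_a = rng.normal(size=CTX_DIM)                    # hypothetical morphology descriptors
robot_b = rng.normal(size=CTX_DIM)

obs = rng.normal(size=OBS_DIM)
action_a = mlp_policy(obs, unpack(hypernetwork(robot_a, H)))
action_b = mlp_policy(obs, unpack(hypernetwork(robot_b, H)))
```

In this arrangement the hypernetwork runs once per robot, and only the small generated MLP runs at every control step, which is one way to read the inference-time efficiency argument in the abstract.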

architecture, hyperdistill, robot, (13 more...)

arXiv.org Artificial Intelligence

2402.0657

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control

Agarwal, Saurav, Muthukrishnan, Ramya, Gosrich, Walker, Kumar, Vijay, Ribeiro, Alejandro

arXiv.org Artificial Intelligence Feb-8-2024

Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models -- trained using imitation learning -- outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.

algorithm, architecture, robot, (15 more...)

arXiv.org Artificial Intelligence

2401.04855

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
