AITopics

Industry: Energy (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)

Neural Information Processing SystemsDec-23-2025, 21:38:13 GMT

BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons

This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs). We provide extensive analysis on the difficulty of binarizing vision MLPs and find that previous binarization methods perform poorly due to limited capacity of binary MLPs. In contrast with the traditional CNNs that utilizing convolutional operations with large kernel size, fully-connected (FC) layers in MLPs can be treated as convolutional layers with kernel size $1\times1$. Thus, the representation ability of the FC layers will be limited when being binarized, and places restrictions on the capability of spatial mixing and channel mixing on the intermediate features. To this end, we propose to improve the performance of binary MLP (BiMLP) model by enriching the representation ability of binary FC layers. We design a novel binary block that contains multiple branches to merge a series of outputs from the same stage, and also a universal shortcut connection that encourages the information flow from the previous stage. The downsampling layers are also carefully designed to reduce the computational complexity while maintaining the classification performance.

bimlp, compact binary architectures, vision multi-layer perceptron, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.65)

Neural Information Processing SystemsDec-23-2025, 20:43:03 GMT

HSurf-Net: Normal Estimation for 3D Point Clouds by Learning Hyper Surfaces

We propose a novel normal estimation method called HSurf-Net, which can accurately predict normals from point clouds with noise and density variations. Previous methods focus on learning point weights to fit neighborhoods into a geometric surface approximated by a polynomial function with a predefined order, based on which normals are estimated. However, fitting surfaces explicitly from raw point clouds suffers from overfitting or underfitting issues caused by inappropriate polynomial orders and outliers, which significantly limits the performance of existing methods. To address these issues, we introduce hyper surface fitting to implicitly learn hyper surfaces, which are represented by multi-layer perceptron (MLP) layers that take point features as input and output surface patterns in a high dimensional feature space. We introduce a novel space transformation module, which consists of a sequence of local aggregation layers and global shift layers, to learn an optimal feature space, and a relative position encoding module to effectively convert point clouds into the learned feature space. Our model learns hyper surfaces from the noise-less features and directly predicts normal vectors. We jointly optimize the MLP weights and module parameters in a data-driven manner to make the model adaptively find the most suitable surface pattern for various points. Experimental results show that our HSurf-Net achieves the state-of-the-art performance on the synthetic shape dataset, the real-world indoor and outdoor scene datasets. The code, data and pretrained models are publicly available.

hsurf-net, normal estimation, point cloud, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)

Neural Information Processing SystemsDec-23-2025, 17:37:42 GMT

Global Filter Networks for Image Classification

Recent advances in self-attention and pure multi-layer perceptrons (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases. These models are generally based on learning interaction among spatial locations from raw data. The complexity of self-attention and MLP grows quadratically as the image size increases, which makes these models hard to scale up when high-resolution features are required. In this paper, we present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture, that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform.

global filter network, image classification, name change, (6 more...)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.60)

Barbier, Jean, Camilli, Francesco, Nguyen, Minh-Toan, Pastore, Mauro, Skerk, Rudy

Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation

arXiv.org Machine LearningDec-15-2025

For four decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing rich feature learning effects, thus going beyond the narrow networks or kernel methods analysed until now. We positively answer through the study of the supervised learning of a multi-layer perceptron. Importantly, (i) its width scales as the input dimension, making it more prone to feature learning than ultra wide networks, and more expressive than narrow ones or ones with fixed embedding layers; and (ii) we focus on the challenging interpolation regime where the number of trainable parameters and data are comparable, which forces the model to adapt to the task. We consider the matched teacher-student setting. Therefore, we provide the fundamental limits of learning random deep neural network targets and identify the sufficient statistics describing what is learnt by an optimally trained network as the data budget increases. A rich phenomenology emerges with various learning transitions. With enough data, optimal performance is attained through the model's "specialisation" towards the target, but it can be hard to reach for training algorithms which get attracted by sub-optimal solutions predicted by the theory. Specialisation occurs inhomogeneously across layers, propagating from shallow towards deep ones, but also across neurons in each layer. Furthermore, deeper targets are harder to learn. Despite its simplicity, the Bayes-optimal setting provides insights on how the depth, non-linearity and finite (proportional) width influence neural networks in the feature learning regime that are potentially relevant in much more general settings.

matrix, neural network, readout, (16 more...)

arXiv.org Machine Learning

2510.24616

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
(5 more...)

Genre:

Research Report (0.81)
Workflow (0.67)
Instructional Material (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Powell, Brian P., Caraballo-Vega, Jordan A., Carroll, Mark L., Maxwell, Thomas, Ptak, Andrew, Olmschenk, Greg, Martinez-Palomera, Jorge

Extrapolation of Periodic Functions Using Binary Encoding of Continuous Numerical Values

arXiv.org Machine LearningDec-12-2025

We report the discovery that binary encoding allows neural networks to extrapolate periodic functions beyond their training bounds. We introduce Normalized Base-2 Encoding (NB2E) as a method for encoding continuous numerical values and demonstrate that, using this input encoding, vanilla multi-layer perceptrons (MLP) successfully extrapolate diverse periodic signals without prior knowledge of their functional form. Internal activation analysis reveals that NB2E induces bit-phase representations, enabling MLPs to learn and extrapolate signal structure independently of position.

activation, extrapolation, representation, (15 more...)

arXiv.org Machine Learning

2512.10817

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

N, Amogh Anshu, BP, Harish

Artificial Intelligence-Driven Network-on-Chip Design Space Exploration: Neural Network Architectures for Design

arXiv.org Artificial IntelligenceDec-11-2025

Network-on-Chip (NoC) design requires exploring a high-dimensional configuration space to satisfy stringent throughput requirements and latency constraints. Traditional design space exploration techniques are often slow and struggle to handle complex, non-linear parameter interactions. This work presents a machine learning-driven framework that automates NoC design space exploration using BookSim simulations and reverse neural network models. Specifically, we compare three architectures - a Multi-Layer Perceptron (MLP),a Conditional Diffusion Model, and a Conditional Variational Autoencoder (CVAE) to predict optimal NoC parameters given target performance metrics. Our pipeline generates over 150,000 simulation data points across varied mesh topologies. The Conditional Diffusion Model achieved the highest predictive accuracy, attaining a mean squared error (MSE) of 0.463 on unseen data. Furthermore, the proposed framework reduces design exploration time by several orders of magnitude, making it a practical solution for rapid and scalable NoC co-design.

artificial intelligence, machine learning, optimization, (10 more...)

2512.07877

Country:

Asia > India (0.15)
North America > United States (0.14)
Europe > France (0.14)
Asia > Taiwan (0.14)

Genre: Research Report (0.50)

Industry:

Semiconductors & Electronics (0.51)
Telecommunications (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Cheairi, Houssam El, Gamarnik, David, Mazumder, Rahul

Theoretical Compression Bounds for Wide Multilayer Perceptrons

arXiv.org Artificial IntelligenceDec-9-2025

Pruning and quantization techniques have been broadly successful in reducing the number of parameters needed for large neural networks, yet theoretical justification for their empirical success falls short. We consider a randomized greedy compression algorithm for pruning and quantization post-training and use it to rigorously show the existence of pruned/quantized subnetworks of multilayer perceptrons (MLPs) with competitive performance. We further extend our results to structured pruning of MLPs and convolutional neural networks (CNNs), thus providing a unified analysis of pruning in wide networks. Our results are free of data assumptions, and showcase a tradeoff between compressibility and network width. The algorithm we consider bears some similarities with Optimal Brain Damage (OBD) and can be viewed as a post-training randomized version of it. The theoretical results we derive bridge the gap between theory and application for pruning/quantization, and provide a justification for the empirical success of compression in wide multilayer perceptrons.

artificial intelligence, machine learning, pruning, (16 more...)

2512.06288

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Grampurohit, Shreyas Jayant, Mulleti, Satish, Rajwade, Ajit

Verifiable Deep Quantitative Group Testing

arXiv.org Artificial IntelligenceDec-9-2025

We present a neural network-based framework for solving the quantitative group testing (QGT) problem that achieves both high decoding accuracy and structural verifiability. In QGT, the objective is to identify a small subset of defective items among $N$ candidates using only $M \ll N$ pooled tests, each reporting the number of defectives in the tested subset. We train a multi-layer perceptron to map noisy measurement vectors to binary defect indicators, achieving accurate and robust recovery even under sparse, bounded perturbations. Beyond accuracy, we show that the trained network implicitly learns the underlying pooling structure that links items to tests, allowing this structure to be recovered directly from the network's Jacobian. This indicates that the model does not merely memorize training patterns but internalizes the true combinatorial relationships governing QGT. Our findings reveal that standard feedforward architectures can learn verifiable inverse mappings in structured combinatorial recovery problems.

artificial intelligence, group testing, machine learning, (17 more...)

2512.07279

Country: Asia > India (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Karn, Isha, Jensen, David

The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News

arXiv.org Artificial IntelligenceDec-9-2025

Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most commonly used benchmark data sets - GossipCop and PolitiFact - are poorly suited to evaluating the utility of models that use propagation structure. Specifically, these data sets exhibit shallow, ego-like graph topologies that provide little or no ability to differentiate among modeling methods. We systematically benchmark five GNN architectures against a structure-agnostic multilayer perceptron (MLP) that uses the same node features. We show that MLPs match or closely trail the performance of GNNs, with performance gaps often within 1-2% and overlapping confidence intervals. To isolate the contribution of structure in these datasets, we conduct controlled experiments where node features are shuffled or edge structures randomized. We find that performance collapses under feature shuffling but remains stable under edge randomization. This suggests that structure plays a negligible role in these benchmarks. Structural analysis further reveals that over 75% of nodes are only one hop from the root, exhibiting minimal structural diversity. In contrast, on synthetic datasets where node features are noisy and structure is informative, GNNs significantly outperform MLPs. These findings provide strong evidence that widely used benchmarks do not meaningfully test the utility of modeling structural features, and they motivate the development of datasets with richer, more diverse graph topologies.

artificial intelligence, machine learning, node feature, (18 more...)

2512.06638

Country: North America > United States > Massachusetts (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Media > News (0.76)
Health & Medicine (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)