AITopics | Perceptrons

Collaborating Authors

Perceptrons

News Overviews Instructional Materials AI-Alerts Classics

From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks

Brändle, Sven, Aczel, Till, Plesner, Andreas, Wattenhofer, Roger

arXiv.org Artificial IntelligenceOct-1-2025

Differentiable Logic Gate Networks (DLGNs) are a very fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logical gates, DLGNs enable fast inference by hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate its general expressiveness, its scalability, and evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes. Figure 1: DLGNs (blue) consistently outperform MLPs (red) across classification tasks with up to 2000 classes. The result illustrates the potential of logic-gate-based architectures to remain effective when applied to large-scale classification problems. Deep artificial neural networks have improved immensely in the last few years, exhibiting impressive performance across a wide range of tasks (Golroudbari & Sabour, 2023; Noor & Ige, 2024; Ekun-dayo & Ezugwu, 2025). However, these improvements come with rapidly growing computational costs (Thompson et al., 2020; Rosenfeld, 2021; Tripp et al., 2024). This constrains their deployment in many real-world environments, particularly on edge devices and mobile phones (Zhang et al., 2020; Zheng, 2025).

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.25933

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)

Add feedback

LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts

Zhuang, Yuan, Shen, Yi, Bian, Yuexin, Su, Qing, Ji, Shihao, Shi, Yuanyuan, Miao, Fei

arXiv.org Artificial IntelligenceOct-1-2025

Recent studies have shown that combining parameter-efficient fine-tuning (PEFT) with mixture-of-experts (MoE) is an effective strategy for adapting large language models (LLMs) to the downstream tasks. However, most existing approaches rely on conventional TopK routing, which requires careful hyperparameter tuning and assigns a fixed number of experts to each token. In this work, we propose LD-MoLE, a Learnable Dynamic routing mechanism for Mixture of LoRA Experts that enables adaptive, token-dependent, and layer-wise expert allocation. Our method replaces the non-differentiable TopK selection with a differentiable routing function and a closed-form solution. Moreover, our design allows the model to adaptively determine the number of experts to activate for each token at different layers. In addition, we introduce an analytical sparsity control objective to regularize the number of activated experts. Our method not only achieves superior performance, but also demonstrates the ability to learn token-dependent and layer-wise expert allocation. Large language models (LLMs) have demonstrated impressive capabilities across a wide range of natural language processing (NLP) tasks. However, their growing size requires significant computational resources for full-parameter fine-tuning. To address this, Parameter-Efficient Fine-tuning (PEFT) methods, such as Adapter-tuning (Houlsby et al., 2019) and LoRA (Hu et al., 2021), have emerged as crucial techniques for reducing training costs. Recently, the Mixture-of-Experts (MoE) design (Jacobs et al., 1991; Shazeer et al., 2017) has been successfully integrated into transformer feed-forward networks during LLMs pretraining (Dai et al., 2024; Y ang et al., 2025), demonstrating that MoE can reduce computational cost while maintaining strong performance.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.25684

Country: North America > United States (0.46)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)

Add feedback

Learning Stochastic Feedforward Neural Networks

Neural Information Processing SystemsSep-30-2025, 12:54:14 GMT

Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g. For tasks such as structured prediction problems, the conditional distribution should be multimodal, forming one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are very slow and do not work well for real-valued data.

conditional distribution, learning stochastic feedforward neural network

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.63)

Add feedback

T-MLP: Tailed Multi-Layer Perceptron for Level-of-Detail Signal Representation

Yang, Chuanxiang, Zhou, Yuanfeng, Wei, Guangshun, Ren, Siyu, Liu, Yuan, Hou, Junhui, Wang, Wenping

arXiv.org Artificial IntelligenceSep-30-2025

Level-of-detail (LoD) representation is critical for efficiently modeling and transmitting various types of signals, such as images and 3D shapes. In this work, we propose a novel network architecture that enables LoD signal representation. Our approach builds on a modified Multi-Layer Perceptron (MLP), which inherently operates at a single scale and thus lacks native LoD support. Specifically, we introduce the Tailed Multi-Layer Perceptron (T -MLP), which extends the MLP by attaching an output branch, also called tail, to each hidden layer. Each tail refines the residual between the current prediction and the ground-truth signal, so that the accumulated outputs across layers correspond to the target signals at different LoDs, enabling multi-scale modeling with supervision from only a single-resolution signal. Extensive experiments demonstrate that our T -MLP outperforms existing neural LoD baselines across diverse signal representation tasks. Representing signals with neural networks is an active research direction, known as implicit neural representation (INR) (Sun et al., 2022; Molaei et al., 2023; Essakine et al., 2024). Unlike traditional discrete signal representation that stores signal values on a fixed-size grid, INR represents a continuous mapping from coordinates to signal values using a neural network, offering a more compact representation than conventional discrete grid-based representations.

artificial intelligence, machine learning, representation, (11 more...)

arXiv.org Artificial Intelligence

2509.00066

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Add feedback

MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints

Gokhale, Shreyas

arXiv.org Artificial IntelligenceSep-30-2025

Learning high-quality, robust, efficient, and disentangled representations is a central challenge in artificial intelligence (AI). Deep metric learning frameworks tackle this challenge primarily using architectural and optimization constraints. Here, we introduce a third approach that instead relies on $\textit{functional}$ constraints. Specifically, we present MonoCon, a simple framework that uses a small monotonic multi-layer perceptron (MLP) head attached to any pre-trained encoder. Due to co-adaptation between encoder and head guided by contrastive loss and monotonicity constraints, MonoCon learns robust, disentangled, and highly compact embeddings at a practically negligible performance cost. On the CIFAR-100 image classification task, MonoCon yields representations that are nearly 9x more compact and 1.5x more robust than the fine-tuned encoder baseline, while retaining 99\% of the baseline's 5-NN classification accuracy. We also report a 3.4x more compact and 1.4x more robust representation on an SNLI sentence similarity task for a marginal reduction in the STSb score, establishing MonoCon as a general domain-agnostic framework. Crucially, these robust, ultra-compact representations learned via functional constraints offer a unified solution to critical challenges in disparate contexts ranging from edge computing to cloud-scale retrieval.

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Artificial Intelligence

2509.22931

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Marching Neurons: Accurate Surface Extraction for Neural Implicit Shapes

Stippel, Christian, Mujkanovic, Felix, Leimkühler, Thomas, Hermosilla, Pedro

arXiv.org Artificial IntelligenceSep-26-2025

Accurate surface geometry representation is crucial in 3D visual computing. Explicit representations, such as polygonal meshes, and implicit representations, like signed distance functions, each have distinct advantages, making efficient conversions between them increasingly important. Conventional surface extraction methods for implicit representations, such as the widely used Marching Cubes algorithm, rely on spatial decomposition and sampling, leading to inaccuracies due to fixed and limited resolution. We introduce a novel approach for analytically extracting surfaces from neural implicit functions. Our method operates natively in parallel and can navigate large neural architectures. By leveraging the fact that each neuron partitions the domain, we develop a depth-first traversal strategy to efficiently track the encoded surface. The resulting meshes faithfully capture the full geometric information from the network without ad-hoc spatial discretization, achieving unprecedented accuracy across diverse shapes and network architectures while maintaining competitive speed.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2509.21007

Country: Europe (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Materials > Metals & Mining (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.46)

Add feedback

PGCLODA: Prompt-Guided Graph Contrastive Learning for Oligopeptide-Infectious Disease Association Prediction

Tan, Dayu, Chen, Jing, Zhou, Xiaoping, Su, Yansen, Zheng, Chunhou

arXiv.org Artificial IntelligenceSep-25-2025

Infectious diseases continue to pose a serious threat to public health, underscoring the urgent need for effective computational approaches to screen novel anti-infective agents. Oligopeptides have emerged as promising candidates in antimicrobial research due to their structural simplicity, high bioavailability, and low susceptibility to resistance. Despite their potential, computational models specifically designed to predict associations between oligopeptides and infectious diseases remain scarce. This study introduces a prompt-guided graph-based contrastive learning framework (PGCLODA) to uncover potential associations. A tripartite graph is constructed with oligopeptides, microbes, and diseases as nodes, incorporating both structural and semantic information. To preserve critical regions during contrastive learning, a prompt-guided graph augmentation strategy is employed to generate meaningful paired views. A dual encoder architecture, integrating Graph Convolutional Network (GCN) and Transformer, is used to jointly capture local and global features. The fused embeddings are subsequently input into a multilayer perceptron (MLP) classifier for final prediction. Experimental results on a benchmark dataset indicate that PGCLODA consistently outperforms state-of-the-art models in AUROC, AUPRC, and accuracy. Ablation and hyperparameter studies confirm the contribution of each module. Case studies further validate the generalization ability of PGCLODA and its potential to uncover novel, biologically relevant associations. These findings offer valuable insights for mechanism-driven discovery and oligopeptide-based drug development. The source code of PGCLODA is available online at https://github.com/jjnlcode/PGCLODA.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2029

Country:

North America (0.28)
Asia > China > Anhui Province (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

Efficient Online Large-Margin Classification via Dual Certificates

Ho-Nguyen, Nam, Kılınç-Karzan, Fatma, Nguyen, Ellie, Shen, Lingqing

arXiv.org Artificial IntelligenceSep-25-2025

Online classification is a central problem in optimization, statistical learning and data science. Classical algorithms such as the perceptron offer efficient updates and finite mistake guarantees on linearly separable data, but they do not exploit the underlying geometric structure of the classification problem. We study the offline maximum margin problem through its dual formulation and use the resulting geometric insights to design a principled and efficient algorithm for the online setting. A key feature of our method is its translation invariance, inherited from the offline formulation, which plays a central role in its performance analysis. Our theoretical analysis yields improved mistake and margin bounds that depend only on translation-invariant quantities, offering stronger guarantees than existing algorithms under the same assumptions in favorable settings. In particular, we identify a parameter regime where our algorithm makes at most two mistakes per sequence, whereas the perceptron can be forced to make arbitrarily many mistakes. Our numerical study on real data further demonstrates that our method matches the computational efficiency of existing online algorithms, while significantly outperforming them in accuracy.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.1967

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.90)

Add feedback

142cdba4b8d1e03f9ee131ac86bb0afc-Paper-Conference.pdf

Neural Information Processing SystemsSep-24-2025, 19:17:03 GMT

arxiv preprint arxiv, dynamic model, neural network, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.93)

Industry: Energy (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Prediction of Coffee Ratings Based On Influential Attributes Using SelectKBest and Optimal Hyperparameters

Agyemang, Edmund, Agbota, Lawrence, Agbenyeavu, Vincent, Akabuah, Peggy, Bimpong, Bismark, Attafuah, Christopher

arXiv.org Artificial IntelligenceSep-24-2025

This study explores the application of supervised machine learning algorithms to predict coffee ratings based on a combination of influential textual and numerical attributes extracted from user reviews. Through careful data preprocessing including text cleaning, feature extraction using TF-IDF, and selection with SelectKBest, the study identifies key factors contributing to coffee quality assessments. Six models (Decision Tree, KNearest Neighbors, Multi-layer Perceptron, Random Forest, Extra Trees, and XGBoost) were trained and evaluated using optimized hyperparameters. Model performance was assessed primarily using F1-score, Gmean, and AUC metrics. Results demonstrate that ensemble methods (Extra Trees, Random Forest, and XGBoost), as well as Multi-layer Perceptron, consistently outperform simpler classifiers (Decision Trees and K-Nearest Neighbors) in terms of evaluation metrics such as F1 scores, G-mean and AUC. The findings highlight the essence of rigorous feature selection and hyperparameter tuning in building robust predictive systems for sensory product evaluation, offering a data driven approach to complement traditional coffee cupping by expertise of trained professionals.

artificial intelligence, machine learning, random forest, (14 more...)

arXiv.org Artificial Intelligence

2509.18124

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine (0.68)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback