AITopics | Perceptrons

Perceptrons

Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies

Green, Riku, Day, Huw, Abdallah, Zahraa S., Filho, Telmo M. Silva

arXiv.org Artificial Intelligence Nov-17-2025

Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias and high variance. We revisit this belief by decomposing the expected multi-step forecast error into three parts: irreducible noise, a structural approximation gap, and an estimation-variance term. For linear predictors we show that the structural gap is identically zero for any dataset. For nonlinear predictors, however, the repeated composition used in recursion can increase model expressivity, making the structural gap depend on both the model and the data. We further show that the estimation variance of the recursive strategy at any horizon can be written as the one-step variance multiplied by a Jacobian-based amplification factor that measures how sensitive the composed predictor is to parameter error. This perspective explains when recursive forecasting may simultaneously have lower bias and higher variance than direct forecasting. Experiments with multilayer perceptrons on the ETTm1 dataset confirm these findings. The results offer practical guidance for choosing between recursive and direct strategies based on model nonlinearity and noise characteristics, rather than relying on traditional bias-variance intuition.
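As a toy illustration of the two strategies contrasted in this abstract, the sketch below fits a scalar linear predictor by least squares and forecasts h steps ahead both ways. The helper names (`fit_slope`, `recursive_forecast`, `direct_forecast`) and the noise-free geometric series are assumptions made for illustration; this is not the authors' experimental setup, which uses multilayer perceptrons on ETTm1.

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin: argmin_a sum((y - a*x)^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def recursive_forecast(series, horizon):
    """Fit ONE one-step model, then apply it to its own output `horizon` times."""
    a = fit_slope(series[:-1], series[1:])
    pred = series[-1]
    for _ in range(horizon):
        pred = a * pred
    return pred


def direct_forecast(series, horizon):
    """Fit a separate model that maps x_t straight to x_{t+horizon}."""
    a_h = fit_slope(series[:-horizon], series[horizon:])
    return a_h * series[-1]


# Hypothetical noise-free series x_t = 0.9^t (an assumption for this sketch).
series = [0.9 ** t for t in range(30)]
rec = recursive_forecast(series, horizon=3)
direct = direct_forecast(series, horizon=3)
print(rec, direct)
```

On this noise-free linear series the two forecasts coincide, because composing the one-step slope h times gives the same map as the horizon-h slope. Differences between the strategies would only show up with noisy data or a nonlinear predictor.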

artificial intelligence, machine learning, predictor, (17 more...)

arXiv.org Artificial Intelligence

2511.11461

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.53)

Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation

He, Peng, Liu, Yao, Gan, Yanglei, Lin, Run, Dai, Tingting, Liu, Qiao, Li, Xuexin

arXiv.org Artificial Intelligence Nov-17-2025

Sequential recommendation (SR) aims to predict a user's next item preference by modeling historical interaction sequences. Recent advances often integrate frequency-domain modules to compensate for self-attention's low-pass nature by restoring the high-frequency signals critical for personalized recommendations. Nevertheless, existing frequency-aware solutions process each session in isolation and optimize exclusively with time-domain objectives. Consequently, they overlook cross-session spectral dependencies and fail to enforce alignment between predicted and actual spectral signatures, leaving valuable frequency information under-exploited. To this end, we propose FreqRec, a Frequency-Enhanced Dual-Path Network for sequential Recommendation that jointly captures inter-session and intra-session behaviors via a learnable Frequency-domain Multi-layer Perceptron. Moreover, FreqRec is optimized under a composite objective that combines cross entropy with a frequency-domain consistency loss, explicitly aligning predicted and true spectral signatures. Extensive experiments on three benchmarks show that FreqRec surpasses strong baselines and remains robust under data sparsity and noisy-log conditions.
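The abstract does not give the form of the composite objective, so the following is only a generic sketch of what "cross entropy plus a frequency-domain consistency term" could look like. Everything here is an assumption: the function names, the use of a plain discrete Fourier transform, the squared-difference penalty, the `weight` parameter, and the idea that `pred_seq` and `true_seq` are real-valued sequences (for example, one embedding dimension over time steps). It should not be read as FreqRec's actual loss.

```python
import cmath
import math


def dft(x):
    """Unnormalized discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[target])


def spectral_consistency(pred_seq, true_seq):
    """Mean squared magnitude of the difference between the two spectra."""
    pred_spec, true_spec = dft(pred_seq), dft(true_seq)
    return sum(abs(p - t) ** 2 for p, t in zip(pred_spec, true_spec)) / len(pred_spec)


def composite_loss(probs, target, pred_seq, true_seq, weight=0.1):
    """Time-domain cross entropy plus a weighted spectral penalty (hypothetical form)."""
    return cross_entropy(probs, target) + weight * spectral_consistency(pred_seq, true_seq)


# Identical sequences incur no spectral penalty, so only cross entropy remains.
loss = composite_loss([0.5, 0.5], 0, [1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
print(loss)
```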

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.06285

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

NTSFormer: A Self-Teaching Graph Transformer for Multimodal Isolated Cold-Start Node Classification

Hu, Jun, He, Yufei, Li, Yuan, Hooi, Bryan, He, Bingsheng

arXiv.org Artificial Intelligence Nov-17-2025

Isolated cold-start node classification on multimodal graphs is challenging because such nodes have no edges and often have missing modalities (e.g., absent text or image features). Existing methods address structural isolation by degrading graph learning models to multilayer perceptrons (MLPs) for isolated cold-start inference, using a teacher model (with graph access) to guide the MLP. However, this results in limited model capacity in the student, which is further challenged when modalities are missing. In this paper, we propose Neighbor-to-Self Graph Transformer (NTSFormer), a unified Graph Transformer framework that jointly tackles the isolation and missing-modality issues via a self-teaching paradigm. Specifically, NTSFormer uses a cold-start attention mask to simultaneously make two predictions for each node: a "student" prediction based only on self information (i.e., the node's own features), and a "teacher" prediction incorporating both self and neighbor information. This enables the model to supervise itself without degrading to an MLP, thereby fully leveraging the Transformer's capacity to handle missing modalities. To handle diverse graph information and missing modalities, NTSFormer performs a one-time multimodal graph pre-computation that converts structural and feature data into token sequences, which are then processed by Mixture-of-Experts (MoE) Input Projection and Transformer layers for effective fusion. Experiments on public datasets show that NTSFormer achieves superior performance for multimodal isolated cold-start node classification.

artificial intelligence, machine learning, transformer, (16 more...)

arXiv.org Artificial Intelligence

2507.0487

Country:

Europe (1.00)
North America > United States (0.94)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

LongComp: Long-Tail Compositional Zero-Shot Generalization for Robust Trajectory Prediction

Stoler, Benjamin, Francis, Jonathan, Oh, Jean

arXiv.org Artificial Intelligence Nov-14-2025

Next, we train autoencoders for ego and social vectors separately. We further split by object type and train independent models for each type, allowing distinct latent spaces to be learned for, e.g., pedestrian focal agents versus vehicle focal agents. Each autoencoder consists of a simple encoder and decoder multi-layer perceptron (MLP), with layer normalization and dropout on hidden layers; the encoder maps down to a low-dimensional latent space and the decoder maps back to the original feature space. That is, we compute z = Enc(v) and v̂ = Dec(z). We train the models primarily with a mean-square error (MSE) reconstruction loss between v and v̂, along with a deep embedding clustering (DEC) [43] loss for regularization on the latent z values. We then obtain discrete ego and social contexts by performing clustering within the latent spaces captured by these autoencoders, using k-means with k = 11. We use the Waymo Open Motion Dataset (WOMD) [15] as a representative source of AD scenarios, sampling approximately 20% of the total data. To quantitatively assess cluster and latent space coherence, we compute silhouette scores on held-out sets [44], observing values ranging from 0.31 to 0.50, which indicates a reasonably well-structured space. We also visualize UMAP [41] projections of the resulting spaces in Figure 2, showing clear separation and evidence of potential sub-clusters.
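The clustering step described in this excerpt can be illustrated with a minimal k-means (Lloyd's algorithm) sketch over latent vectors. The latent points below are hand-written stand-ins for encoder outputs, the toy uses k = 2 rather than the k = 11 reported in the excerpt, and the deterministic "first k points" initialization is an assumption chosen to keep the example reproducible. No autoencoder is trained here.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical 2-D latent codes z = Enc(v) forming two well-separated groups.
latents = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(latents, k=2)
print(labels)
```

The resulting integer labels play the role of the discrete contexts mentioned above; in practice a library implementation with multiple random restarts would usually be preferred over this fixed initialization.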

agent, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.10411

Genre: Research Report (0.64)

Industry: Transportation (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data

Sinodinos, Dimitrios, Wei, Jack Yi, Armanfard, Narges

arXiv.org Artificial Intelligence Nov-14-2025

Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there is an increasing need to exploit shared task information for improved multitask generalization. Multi-task learning (MTL) has emerged as a powerful way to improve generalization and efficiency, yet most existing work focuses narrowly on large-scale recommendation systems, leaving its potential in broader tabular domains largely underexplored. Also, existing MTL approaches for tabular data predominantly rely on multi-layer perceptron-based backbones, which struggle to capture complex feature interactions and often fail to scale when data is abundant, a limitation that transformer architectures have overcome in other domains. Motivated by this, we introduce MultiTab-Net, the first multitask transformer architecture specifically designed for large tabular data. MultiTab-Net employs a novel multitask masked-attention mechanism that dynamically models feature-feature dependencies while mitigating task competition. Through extensive experiments, we show that MultiTab-Net consistently achieves higher multitask gain than existing MTL architectures and single-task transformers across diverse domains including large-scale recommendation data, census-like socioeconomic data, and physics datasets, spanning a wide range of task counts, task types, and feature modalities. In addition, we contribute MultiTab-Bench, a generalized multitask synthetic dataset generator that enables systematic evaluation of multitask dynamics by tuning task count, task correlations, and relative task complexity.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.0997

Country: North America > United States (0.68)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Information Technology > Services (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.86)

FlashKAT: Understanding and Addressing Performance Bottlenecks in the Kolmogorov-Arnold Transformer

Raffel, Matthew, Chen, Lizhong

arXiv.org Artificial Intelligence Nov-14-2025

The Kolmogorov-Arnold Network (KAN) has been gaining popularity as an alternative to the multi-layer perceptron (MLP) with its increased expressiveness and interpretability. Even so, the KAN suffers from being orders of magnitude slower due to its increased computational cost and training instability, limiting its applicability to larger-scale tasks. Recently, the Kolmogorov-Arnold Transformer (KAT) has been proposed, which can achieve FLOPs similar to the traditional Transformer with MLPs by leveraging Group-Rational KAN (GR-KAN). Unfortunately, despite the comparable FLOPs, our testing reveals that the KAT is still 123x slower in training speeds, indicating that there are other performance bottlenecks beyond FLOPs. In this paper, we conduct a series of experiments to understand the root cause of the slowdown in KAT. We uncover that the slowdown can be isolated to memory stalls, linked more specifically to inefficient gradient accumulations in the backward pass of GR-KAN. To address this memory bottleneck, we propose FlashKAT, which minimizes accesses to slow memory and the usage of atomic adds through a restructured kernel. Evaluations demonstrate that FlashKAT can achieve a training speedup of 86.5x compared with the state-of-the-art KAT, while reducing rounding errors in the computation of the gradients.

artificial intelligence, backward pass, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.13813

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

ZeroSim: Zero-Shot Analog Circuit Evaluation with Unified Transformer Embeddings

Yang, Xiaomeng, Gao, Jian, Wang, Yanzhi, Zhang, Xuan

arXiv.org Artificial Intelligence Nov-12-2025

Although recent advancements in learning-based analog circuit design automation have tackled tasks such as topology generation, device sizing, and layout synthesis, efficient performance evaluation remains a major bottleneck. Traditional SPICE simulations are time-consuming, while existing machine learning methods often require topology-specific retraining or manual substructure segmentation for fine-tuning, hindering scalability and adaptability. In this work, we propose ZeroSim, a transformer-based performance modeling framework designed to achieve robust in-distribution generalization across trained topologies under novel parameter configurations and zero-shot generalization to unseen topologies without any fine-tuning. We apply three key enabling strategies: (1) a diverse training corpus of 3.6 million instances covering over 60 amplifier topologies, (2) unified topology embeddings leveraging global-aware tokens and hierarchical attention to robustly generalize to novel circuits, and (3) a topology-conditioned parameter mapping approach that maintains consistent structural representations independent of parameter variations. Our experimental results demonstrate that ZeroSim significantly outperforms baseline models such as multilayer perceptrons, graph neural networks and transformers, delivering accurate zero-shot predictions across different amplifier topologies. Additionally, when integrated into a reinforcement learning-based parameter optimization pipeline, ZeroSim achieves a remarkable speedup (13x) compared to conventional SPICE simulations, underscoring its practical value for a wide range of analog circuit design automation tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.07658

Genre: Research Report (0.70)

Industry: Semiconductors & Electronics (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Google-MedGemma Based Abnormality Detection in Musculoskeletal radiographs

Maity, Soumyajit, Kamboj, Pranjal, Maity, Sneha, Singh, Rajat, Chatterjee, Sankhadeep

arXiv.org Artificial Intelligence Nov-11-2025

This paper proposes a MedGemma-based framework for automatic abnormality detection in musculoskeletal radiographs. Departing from conventional autoencoder and neural network pipelines, the proposed method leverages the MedGemma foundation model, incorporating a SigLIP-derived vision encoder pretrained on diverse medical imaging modalities. Preprocessed X-ray images are encoded into high-dimensional embeddings using the MedGemma vision backbone, which are subsequently passed through a lightweight multilayer perceptron for binary classification. Experimental assessment reveals that the MedGemma-driven classifier exhibits strong performance, exceeding conventional convolutional and autoencoder-based metrics. Additionally, the model leverages MedGemma's transfer learning capabilities, enhancing generalization and optimizing feature engineering. The integration of a modern medical foundation model not only enhances representation learning but also facilitates modular training strategies such as selective encoder block unfreezing for efficient domain adaptation. The findings suggest that MedGemma-powered classification systems can advance clinical radiograph triage by providing scalable and accurate abnormality detection, with potential for broader applications in automated medical image analysis. Keywords: Google MedGemma, MURA, Medical Image, Classification.

abnormality detection, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.056

Country: North America > United States > Texas (0.15)

Genre: Research Report (0.70)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning

Abukhousa, Emad, Afroz, Syed Sohail Feroz Syed, Alsaeed, Fahad, Qwbaiban, Abdulaziz, Zonouz, Saman, Meliopoulos, A. P. Sakis

arXiv.org Artificial Intelligence Nov-11-2025

This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with digital substation emulation at 4.8 kHz. Twelve ML models, including ensemble algorithms and a multi-layer perceptron (MLP), were trained on labeled time-domain measurements and evaluated in a real-time streaming environment designed for sub-cycle responsiveness. The architecture incorporates a cycle-length smoothing filter and confidence threshold to stabilize decisions. Results show that while several models achieved near-perfect offline accuracies (up to 99.9%), only the MLP sustained robust coverage (98-99%) under streaming, whereas ensembles preserved perfect anomaly precision but abstained frequently (10-49% coverage). These findings demonstrate that offline accuracy alone is an unreliable indicator of field readiness and underscore the need for realistic testing and inference pipelines to ensure dependable classification in inverter-based resources (IBR)-rich networks.
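One way to picture the smoothing filter, confidence threshold, and coverage metric mentioned in this abstract is the sketch below. It averages per-sample class probabilities over a sliding window and abstains (returns `None`) whenever the smoothed confidence falls below a threshold. The function names, the moving-average form of the filter, and the toy probabilities are all assumptions; the abstract does not specify the authors' filter design. The window length is left as a parameter (at 4.8 kHz one cycle of a 60 Hz system would span 80 samples, but the nominal frequency is not stated in the abstract).

```python
from collections import deque


def stream_decisions(prob_stream, window, threshold):
    """Yield a class index per incoming sample, or None to abstain."""
    history = deque(maxlen=window)  # keeps only the most recent `window` samples
    for probs in prob_stream:
        history.append(probs)
        n_classes = len(probs)
        # Moving average of each class probability over the current window.
        avg = [sum(h[c] for h in history) / len(history) for c in range(n_classes)]
        best = max(range(n_classes), key=lambda c: avg[c])
        yield best if avg[best] >= threshold else None


def coverage(decisions):
    """Fraction of samples on which the classifier did not abstain."""
    return sum(d is not None for d in decisions) / len(decisions)


# Hypothetical two-class probability stream with one noisy sample in the middle.
stream = [(0.9, 0.1), (0.2, 0.8), (0.9, 0.1), (0.95, 0.05)]
decisions = list(stream_decisions(stream, window=2, threshold=0.75))
print(decisions, coverage(decisions))
```

In this toy run the single noisy sample causes two abstentions rather than a flipped label, which shows how a classifier can keep high precision while its coverage drops.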

artificial intelligence, classification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.06714

Country: North America > United States (0.47)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry (1.00)
Government > Military > Cyberwarfare (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables

Vergos, Theofanis, Vergos, Polykarpos, Tahoori, Mehdi B., Zervakis, Georgios

arXiv.org Artificial Intelligence Nov-11-2025

Flexible electronics offer unique advantages for conformable, lightweight, and disposable healthcare wearables. However, their limited gate count, large feature sizes, and high static power consumption make on-body machine learning classification highly challenging. While existing bendable RISC-V systems provide compact solutions, they lack the energy efficiency required. We present a mechanically flexible RISC-V that integrates a bespoke multiply-accumulate co-processor with fixed coefficients to maximize energy efficiency and minimize latency. Our approach formulates a constrained programming problem to jointly determine co-processor constants and optimally map Multi-Layer Perceptron (MLP) inference operations, enabling compact, model-specific hardware by leveraging the low fabrication and non-recurring engineering costs of flexible technologies. Post-layout results demonstrate near-real-time performance across several healthcare datasets, with our circuits operating within the power budget of existing flexible batteries and occupying only 2.42 mm^2, offering a promising path toward accessible, sustainable, and conformable healthcare wearables. Our microprocessors achieve an average 2.35x speedup and 2.15x lower energy consumption compared to the state of the art.

artificial intelligence, machine learning, multiplier, (19 more...)

arXiv.org Artificial Intelligence

2511.05985

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Health & Medicine > Consumer Health (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)
