Understanding and Improving Early Stopping for Learning with Noisy Labels
The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering the DNN as a whole. However, a DNN can be viewed as a composition of a series of layers, and we find that the later layers in a DNN are much more sensitive to label noise, while the earlier layers are quite robust. Therefore, selecting a single stopping point for the whole network may make different DNN layers antagonistically affect each other, degrading the final performance.
- North America > United States (0.28)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
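As a concrete illustration of the layer-wise observation in the abstract above, here is a toy PyTorch sketch: the noise-sensitive later layers are stopped early, then the robust earlier layers continue training with the later layers frozen. The model, dummy data, and epoch counts are all invented for the example; this is one way to act on the observation, not the authors' algorithm.

```python
# Toy sketch (not the paper's algorithm): stop the noise-sensitive later
# layers early, then keep refining the robust earlier layers.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy noisy-label data so the sketch runs end to end.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=64)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # earlier layers: robust
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                           # later layer: noise-sensitive
)

def run_epochs(params, epochs, lr=1e-3):
    """Train only `params` for `epochs`; all other weights stay fixed."""
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()

# Stage 1: brief joint training, stopped before the network memorizes noise.
run_epochs(model.parameters(), epochs=5)

# Stage 2: freeze the classifier head at its (earlier) stopping point and
# give the robust earlier layers additional epochs.
for p in model[-1].parameters():
    p.requires_grad_(False)
run_epochs([p for p in model.parameters() if p.requires_grad], epochs=15)
```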
On the Limitations and Capabilities of Position Embeddings for Length Generalization
Chen, Yang, Liang, Yitao, Lin, Zhouchen
Abstract--In Transformers, Position Embeddings (PEs) significantly influence Length Generalization (LG) performance, yet their fundamental role remains unclear. In this work, we investigate the limitations and capabilities of PEs in achieving LG. We theoretically analyze PEs in Position-Only Linear Attentions (POLAs), introducing Linear Representation Complexity (LRC) to characterize when PEs enable LG. Our analysis shows that PEs do not expand computational capabilities but structure learned computations across positions. Extending to practical Transformers, we propose Sequential Representation Complexity (SRC) and conjecture that LG is possible if and only if SRC remains invariant across scales. We support this hypothesis with empirical evidence in various reasoning tasks. To enhance LG, we introduce Scale Hint, allowing flexible instance scaling, and a Learning-Based Position Embedding framework that automatically learns positional relations. Our work provides theoretical insights and practical strategies for improving LG in Transformers.
Length Generalization (LG) refers to the ability of a model to extrapolate from small-scale instances to larger ones in reasoning [1]-[4]. In many tasks, the sample space grows exponentially with the problem scale, making exhaustive training infeasible. Thus, it is important to learn from limited training samples at small scales while generalizing to larger ones.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
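To make the POLA setting of the abstract above concrete, here is a minimal sketch as I read the setup: the mixing weights between positions are learned parameters of the (query, key) positions alone, so the output is linear in the token contents and the PE can only structure where information flows. The class and its dimensions are illustrative, not the paper's exact construction.

```python
# Minimal position-only linear attention: mixing weights depend on the
# (query, key) positions alone, never on token content, so the layer is
# linear in its input. Illustrative, not the paper's exact construction.
import torch
import torch.nn as nn

class POLA(nn.Module):
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.value = nn.Linear(d_model, d_model)
        # One learned score per (query position, key position) pair.
        self.pos_scores = nn.Parameter(torch.zeros(max_len, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        n = x.size(1)
        causal = torch.tril(torch.ones(n, n, device=x.device))
        weights = self.pos_scores[:n, :n] * causal   # positions only
        return weights @ self.value(x)               # linear in contents

layer = POLA(d_model=32, max_len=128)
out = layer(torch.randn(2, 16, 32))                  # shape (2, 16, 32)
```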
Predictive Entropy Search for Efficient Global Optimization of Black-box Functions
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
- North America > Canada > Alberta (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > British Columbia (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
- Transportation > Air (0.41)
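For readers who want the acquisition function above in code: the quantity PES targets is the mutual information between the next observation y at x and the location x* of the global maximum, written as an expected reduction in predictive entropy, alpha(x) = H[p(y|D,x)] - E_{x*}[H[p(y|D,x,x*)]]. The paper approximates the conditional term with expectation propagation; the sketch below instead uses a much cruder Monte Carlo shortcut (conditioning on a Thompson-sampled optimum as a pseudo-observation), purely to make the quantity concrete.

```python
# Crude Monte Carlo sketch of alpha(x) = H[p(y|D,x)] - E_{x*}[H[p(y|D,x,x*)]].
# NOT the paper's expectation-propagation approximation: conditioning on x*
# is faked by adding the Thompson-sampled optimum as a pseudo-observation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gaussian_entropy(std):
    # Differential entropy of a Gaussian with the given standard deviation.
    return 0.5 * np.log(2 * np.pi * np.e * np.maximum(std, 1e-9) ** 2)

def pes_acquisition(gp, X_obs, y_obs, X_cand, X_grid, n_star=20, seed=0):
    _, std = gp.predict(X_cand, return_std=True)
    h_marginal = gaussian_entropy(std)                # H[p(y | D, x)]

    # Thompson-sample plausible global maximizers x* on a dense grid.
    f = gp.sample_y(X_grid, n_samples=n_star, random_state=seed)
    h_cond = np.zeros(len(X_cand))
    for s in range(n_star):
        j = int(np.argmax(f[:, s]))
        # Condition on the sampled optimum as an extra (pseudo-)observation,
        # keeping the already-fitted kernel hyperparameters fixed.
        gp_s = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None)
        gp_s.fit(np.vstack([X_obs, X_grid[j:j + 1]]), np.append(y_obs, f[j, s]))
        _, std_s = gp_s.predict(X_cand, return_std=True)
        h_cond += gaussian_entropy(std_s) / n_star    # E_{x*}[H[p(y|D,x,x*)]]

    return h_marginal - h_cond   # expected information gain about x*

# Example on a toy 1-D problem: pick the next point to evaluate.
X_obs = np.array([[0.1], [0.5], [0.9]]); y_obs = np.array([0.2, 1.0, 0.4])
gp = GaussianProcessRegressor().fit(X_obs, y_obs)
X_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
x_next = X_grid[np.argmax(pes_acquisition(gp, X_obs, y_obs, X_grid, X_grid))]
```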
The Role of Logic and Automata in Understanding Transformers
Lin, Anthony W., Barcelo, Pablo
The advent of transformers has in recent years led to powerful and revolutionary Large Language Models (LLMs). Despite this, our understanding of the capabilities of transformers is still meager. In this invited contribution, we recount the rapid progress made in the last few years on the question of what transformers can do. In particular, we will see the integral role of logic and automata (with some help from circuit complexity) in answering this question. We also mention several open problems at the intersection of logic, automata, verification, and transformers.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- South America > Chile (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
Anisotropic Fourier Features for Positional Encoding in Medical Imaging
Jabareen, Nabil, Yuan, Dongsheng, Liu, Dingming, Ten, Foo-Wei, Lukassen, Sören
The adoption of Transformer-based architectures in the medical domain is growing rapidly. In medical imaging, the analysis of complex shapes - such as organs, tissues, or other anatomical structures - combined with the often anisotropic nature of high-dimensional images complicates these adaptations. In this study, we critically examine the role of Positional Encodings (PEs), arguing that commonly used approaches may be suboptimal for the specific challenges of medical imaging. Sinusoidal Positional Encodings (SPEs) have proven effective in vision tasks, but they struggle to preserve Euclidean distances in higher-dimensional spaces. Isotropic Fourier Feature Positional Encodings (IFPEs) have been proposed to better preserve Euclidean distances, but they lack the ability to account for anisotropy in images. To address these limitations, we propose Anisotropic Fourier Feature Positional Encoding (AFPE), a generalization of IFPE that incorporates anisotropic, class-specific, and domain-specific spatial dependencies. We systematically benchmark AFPE against commonly used PEs on multi-label classification in chest X-rays, organ classification in CT images, and ejection fraction regression in echocardiography. Our results demonstrate that choosing the correct PE can significantly improve model performance. We show that the optimal PE depends on the shape of the structure of interest and the anisotropy of the data. Finally, our proposed AFPE significantly outperforms state-of-the-art PEs in all tested anisotropic settings. We conclude that, in anisotropic medical images and videos, it is of paramount importance to choose an anisotropic PE that fits the data and the shape of interest.
- Europe > Germany > Berlin (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Africa > Ghana (0.04)
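To make the core construction above concrete: isotropic Fourier feature encodings project coordinates through random frequencies drawn from an isotropic Gaussian, and the anisotropic variant simply scales the frequency distribution per axis. The sketch below is a minimal NumPy rendering under that reading; the paper's exact parameterization (and the class- and domain-specific scales it learns) may differ, and the sigma values here are invented.

```python
# Minimal anisotropic Fourier-feature positional encoding: frequencies are
# drawn from a Gaussian whose scale differs per axis, so the encoding can
# be finer in-plane than across (e.g.) thick CT slices. Sigmas are made up.
import numpy as np

def anisotropic_fourier_features(coords, n_features, sigma_per_axis, seed=0):
    """coords: (N, D) positions in [0, 1]; sigma_per_axis: (D,) scales."""
    rng = np.random.default_rng(seed)
    # Larger sigma along an axis -> higher frequencies -> finer resolution
    # along that axis; smaller sigma -> smoother encoding along it.
    B = rng.normal(size=(n_features, coords.shape[1])) * np.asarray(sigma_per_axis)
    proj = 2.0 * np.pi * coords @ B.T                   # (N, n_features)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

# Example: a volume with fine in-plane resolution but coarse slice spacing.
xyz = np.random.rand(1024, 3)
pe = anisotropic_fourier_features(xyz, 64, sigma_per_axis=[10.0, 10.0, 2.0])
print(pe.shape)   # (1024, 128)
```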
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
Ogbogu, Chukwufumnanya, Narang, Gaurav, Joardar, Biresh Kumar, Doppa, Janardhan Rao, Chakrabarty, Krishnendu, Pande, Partha Pratim
Processing-In-Memory (PIM) architectures offer a promising approach to accelerating Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and SRAM exist, each offering unique trade-offs in terms of power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training, referred to as HePGA. We leverage the unique characteristics of GNN layers and their associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x in energy efficiency (TOPS/W) and 6.8x in compute efficiency (TOPS/mm2), without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerating inference for emerging transformer models.
- Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.04)
- North America > United States > Washington > Whitman County > Pullman (0.04)
- North America > United States > Virginia (0.04)
- (5 more...)
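The abstract does not spell out how kernels are assigned to device types, so the following is only a toy rendering of the general idea it describes: choose, for each GNN training kernel, the PIM device that minimizes a weighted energy/latency cost. Every device figure and kernel name below is invented for illustration.

```python
# Toy kernel-to-device mapper (illustrative only; not HePGA's procedure):
# greedily assign each GNN kernel to the device with the lowest weighted
# energy/latency cost. All numbers and names are invented for the example.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    energy_pj_per_op: float   # invented figures, not measured data
    latency_ns_per_op: float

DEVICES = [
    Device("ReRAM", 0.8, 5.0),
    Device("FeFET", 0.5, 8.0),
    Device("SRAM",  1.6, 1.5),
]

# (kernel, ops): rough op counts per training step, purely illustrative.
KERNELS = [("aggregate", 4e6), ("combine", 9e6), ("grad_update", 2e6)]

def map_kernels(kernels, devices, w_energy=0.5, w_latency=0.5):
    mapping = {}
    for kernel, ops in kernels:
        cost = lambda d: ops * (w_energy * d.energy_pj_per_op
                                + w_latency * d.latency_ns_per_op)
        mapping[kernel] = min(devices, key=cost).name
    return mapping

print(map_kernels(KERNELS, DEVICES))
```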