Understanding and Improving Early Stopping for Learning with Noisy Labels
The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering the DNN as a whole. However, a DNN can be viewed as a composition of a series of layers, and we find that the later layers in a DNN are much more sensitive to label noise, while the earlier layers are quite robust. Therefore, selecting a single stopping point for the whole network may make different DNN layers antagonistically affect each other, degrading the final performance.
- North America > United States (0.28)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
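As a concrete illustration of the layer-wise observation in the abstract above, here is a toy PyTorch sketch: the noise-sensitive later layers are stopped early, then the robust earlier layers continue training with the later layers frozen. The model, dummy data, and epoch counts are all invented for the example; this is one way to act on the observation, not the authors' algorithm.

```python
# Toy sketch (not the paper's algorithm): stop the noise-sensitive later
# layers early, then keep refining the robust earlier layers.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy noisy-label data so the sketch runs end to end.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=64)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # earlier layers: robust
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                           # later layer: noise-sensitive
)

def run_epochs(params, epochs, lr=1e-3):
    """Train only `params` for `epochs`; all other weights stay fixed."""
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()

# Stage 1: brief joint training, stopped before the network memorizes noise.
run_epochs(model.parameters(), epochs=5)

# Stage 2: freeze the classifier head at its (earlier) stopping point and
# give the robust earlier layers additional epochs.
for p in model[-1].parameters():
    p.requires_grad_(False)
run_epochs([p for p in model.parameters() if p.requires_grad], epochs=15)
```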
On the Limitations and Capabilities of Position Embeddings for Length Generalization
Chen, Yang, Liang, Yitao, Lin, Zhouchen
Abstract--In Transformers, Position Embeddings (PEs) significantly influence Length Generalization (LG) performance, yet their fundamental role remains unclear. In this work, we investigate the limitations and capabilities of PEs in achieving LG. We theoretically analyze PEs in Position-Only Linear Attentions (POLAs), introducing Linear Representation Complexity (LRC) to characterize when PEs enable LG. Our analysis shows that PEs do not expand computational capabilities but structure learned computations across positions. Extending to practical Transformers, we propose Sequential Representation Complexity (SRC) and conjecture that LG is possible if and only if SRC remains invariant across scales. We support this hypothesis with empirical evidence in various reasoning tasks. To enhance LG, we introduce Scale Hint, allowing flexible instance scaling, and a Learning-Based Position Embedding framework that automatically learns positional relations. Our work provides theoretical insights and practical strategies for improving LG in Transformers.
Length Generalization (LG) refers to the ability of a model to extrapolate from small-scale instances to larger ones in reasoning [1]-[4]. In many tasks, the sample space grows exponentially with the problem scale, making exhaustive training infeasible. Thus, it is important to learn from limited training samples at small scales while generalizing to larger ones.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
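To make the POLA setting of the abstract above concrete, here is a minimal sketch as I read the setup: the mixing weights between positions are learned parameters of the (query, key) positions alone, so the output is linear in the token contents and the PE can only structure where information flows. The class and its dimensions are illustrative, not the paper's exact construction.

```python
# Minimal position-only linear attention: mixing weights depend on the
# (query, key) positions alone, never on token content, so the layer is
# linear in its input. Illustrative, not the paper's exact construction.
import torch
import torch.nn as nn

class POLA(nn.Module):
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.value = nn.Linear(d_model, d_model)
        # One learned score per (query position, key position) pair.
        self.pos_scores = nn.Parameter(torch.zeros(max_len, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        n = x.size(1)
        causal = torch.tril(torch.ones(n, n, device=x.device))
        weights = self.pos_scores[:n, :n] * causal   # positions only
        return weights @ self.value(x)               # linear in contents

layer = POLA(d_model=32, max_len=128)
out = layer(torch.randn(2, 16, 32))                  # shape (2, 16, 32)
```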
Predictive Entropy Search for Efficient Global Optimization of Black-box Functions
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
- North America > Canada > Alberta (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > British Columbia (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
- Transportation > Air (0.41)
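For readers who want the acquisition function above in code: the quantity PES targets is the mutual information between the next observation y at x and the location x* of the global maximum, written as an expected reduction in predictive entropy, alpha(x) = H[p(y|D,x)] - E_{x*}[H[p(y|D,x,x*)]]. The paper approximates the conditional term with expectation propagation; the sketch below instead uses a much cruder Monte Carlo shortcut (conditioning on a Thompson-sampled optimum as a pseudo-observation), purely to make the quantity concrete.

```python
# Crude Monte Carlo sketch of alpha(x) = H[p(y|D,x)] - E_{x*}[H[p(y|D,x,x*)]].
# NOT the paper's expectation-propagation approximation: conditioning on x*
# is faked by adding the Thompson-sampled optimum as a pseudo-observation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gaussian_entropy(std):
    # Differential entropy of a Gaussian with the given standard deviation.
    return 0.5 * np.log(2 * np.pi * np.e * np.maximum(std, 1e-9) ** 2)

def pes_acquisition(gp, X_obs, y_obs, X_cand, X_grid, n_star=20, seed=0):
    _, std = gp.predict(X_cand, return_std=True)
    h_marginal = gaussian_entropy(std)                # H[p(y | D, x)]

    # Thompson-sample plausible global maximizers x* on a dense grid.
    f = gp.sample_y(X_grid, n_samples=n_star, random_state=seed)
    h_cond = np.zeros(len(X_cand))
    for s in range(n_star):
        j = int(np.argmax(f[:, s]))
        # Condition on the sampled optimum as an extra (pseudo-)observation,
        # keeping the already-fitted kernel hyperparameters fixed.
        gp_s = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None)
        gp_s.fit(np.vstack([X_obs, X_grid[j:j + 1]]), np.append(y_obs, f[j, s]))
        _, std_s = gp_s.predict(X_cand, return_std=True)
        h_cond += gaussian_entropy(std_s) / n_star    # E_{x*}[H[p(y|D,x,x*)]]

    return h_marginal - h_cond   # expected information gain about x*

# Example on a toy 1-D problem: pick the next point to evaluate.
X_obs = np.array([[0.1], [0.5], [0.9]]); y_obs = np.array([0.2, 1.0, 0.4])
gp = GaussianProcessRegressor().fit(X_obs, y_obs)
X_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
x_next = X_grid[np.argmax(pes_acquisition(gp, X_obs, y_obs, X_grid, X_grid))]
```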
The Role of Logic and Automata in Understanding Transformers
Lin, Anthony W., Barcelo, Pablo
The advent of transformers has in recent years led to powerful and revolutionary Large Language Models (LLMs). Despite this, our understanding of the capabilities of transformers is still meager. In this invited contribution, we recount the rapid progress made in the last few years on the question of what transformers can do. In particular, we will see the integral role of logic and automata (with some help from circuit complexity) in answering this question. We also mention several open problems at the intersection of logic, automata, verification, and transformers.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- South America > Chile (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
Anisotropic Fourier Features for Positional Encoding in Medical Imaging
Jabareen, Nabil, Yuan, Dongsheng, Liu, Dingming, Ten, Foo-Wei, Lukassen, Sören
The adoption of Transformer-based architectures in the medical domain is growing rapidly. In medical imaging, the analysis of complex shapes - such as organs, tissues, or other anatomical structures - combined with the often anisotropic nature of high-dimensional images complicates these adaptations. In this study, we critically examine the role of Positional Encodings (PEs), arguing that commonly used approaches may be suboptimal for the specific challenges of medical imaging. Sinusoidal Positional Encodings (SPEs) have proven effective in vision tasks, but they struggle to preserve Euclidean distances in higher-dimensional spaces. Isotropic Fourier Feature Positional Encodings (IFPEs) have been proposed to better preserve Euclidean distances, but they lack the ability to account for anisotropy in images. To address these limitations, we propose Anisotropic Fourier Feature Positional Encoding (AFPE), a generalization of IFPE that incorporates anisotropic, class-specific, and domain-specific spatial dependencies. We systematically benchmark AFPE against commonly used PEs on multi-label classification in chest X-rays, organ classification in CT images, and ejection fraction regression in echocardiography. Our results demonstrate that choosing the correct PE can significantly improve model performance. We show that the optimal PE depends on the shape of the structure of interest and the anisotropy of the data. Finally, our proposed AFPE significantly outperforms state-of-the-art PEs in all tested anisotropic settings. We conclude that, in anisotropic medical images and videos, it is of paramount importance to choose an anisotropic PE that fits the data and the shape of interest.
- Europe > Germany > Berlin (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Africa > Ghana (0.04)
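To make the core construction above concrete: isotropic Fourier feature encodings project coordinates through random frequencies drawn from an isotropic Gaussian, and the anisotropic variant simply scales the frequency distribution per axis. The sketch below is a minimal NumPy rendering under that reading; the paper's exact parameterization (and the class- and domain-specific scales it learns) may differ, and the sigma values here are invented.

```python
# Minimal anisotropic Fourier-feature positional encoding: frequencies are
# drawn from a Gaussian whose scale differs per axis, so the encoding can
# be finer in-plane than across (e.g.) thick CT slices. Sigmas are made up.
import numpy as np

def anisotropic_fourier_features(coords, n_features, sigma_per_axis, seed=0):
    """coords: (N, D) positions in [0, 1]; sigma_per_axis: (D,) scales."""
    rng = np.random.default_rng(seed)
    # Larger sigma along an axis -> higher frequencies -> finer resolution
    # along that axis; smaller sigma -> smoother encoding along it.
    B = rng.normal(size=(n_features, coords.shape[1])) * np.asarray(sigma_per_axis)
    proj = 2.0 * np.pi * coords @ B.T                   # (N, n_features)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

# Example: a volume with fine in-plane resolution but coarse slice spacing.
xyz = np.random.rand(1024, 3)
pe = anisotropic_fourier_features(xyz, 64, sigma_per_axis=[10.0, 10.0, 2.0])
print(pe.shape)   # (1024, 128)
```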
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
Ogbogu, Chukwufumnanya, Narang, Gaurav, Joardar, Biresh Kumar, Doppa, Janardhan Rao, Chakrabarty, Krishnendu, Pande, Partha Pratim
Processing-In-Memory (PIM) architectures offer a promising approach to accelerating Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and SRAM exist, each offering unique trade-offs in terms of power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training, referred to as HePGA. We leverage the unique characteristics of GNN layers and their associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x in energy efficiency (TOPS/W) and 6.8x in compute efficiency (TOPS/mm2), without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerating inference for emerging transformer models.
- Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.04)
- North America > United States > Washington > Whitman County > Pullman (0.04)
- North America > United States > Virginia (0.04)
- (5 more...)
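The abstract does not spell out how kernels are assigned to device types, so the following is only a toy rendering of the general idea it describes: choose, for each GNN training kernel, the PIM device that minimizes a weighted energy/latency cost. Every device figure and kernel name below is invented for illustration.

```python
# Toy kernel-to-device mapper (illustrative only; not HePGA's procedure):
# greedily assign each GNN kernel to the device with the lowest weighted
# energy/latency cost. All numbers and names are invented for the example.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    energy_pj_per_op: float   # invented figures, not measured data
    latency_ns_per_op: float

DEVICES = [
    Device("ReRAM", 0.8, 5.0),
    Device("FeFET", 0.5, 8.0),
    Device("SRAM",  1.6, 1.5),
]

# (kernel, ops): rough op counts per training step, purely illustrative.
KERNELS = [("aggregate", 4e6), ("combine", 9e6), ("grad_update", 2e6)]

def map_kernels(kernels, devices, w_energy=0.5, w_latency=0.5):
    mapping = {}
    for kernel, ops in kernels:
        cost = lambda d: ops * (w_energy * d.energy_pj_per_op
                                + w_latency * d.latency_ns_per_op)
        mapping[kernel] = min(devices, key=cost).name
    return mapping

print(map_kernels(KERNELS, DEVICES))
```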