transfer function
Tessellation Localized Transfer learning for nonparametric regression
Halconruy, Hélène, Bobbia, Benjamin, Lejamtel, Paul
Transfer learning aims to improve performance on a target task by leveraging information from related source tasks. We propose a nonparametric regression transfer learning framework that explicitly models heterogeneity in the source-target relationship. Our approach relies on a local transfer assumption: the covariate space is partitioned into finitely many cells such that, within each cell, the target regression function can be expressed as a low-complexity transformation of the source regression function. This localized structure enables effective transfer where similarity is present while limiting negative transfer elsewhere. We introduce estimators that jointly learn the local transfer functions and the target regression, together with fully data-driven procedures that adapt to unknown partition structure and transfer strength. We establish sharp minimax rates for target regression estimation, showing that local transfer can mitigate the curse of dimensionality by exploiting reduced functional complexity. Our theoretical guarantees take the form of oracle inequalities that decompose excess risk into estimation and approximation terms, ensuring robustness to model misspecification. Numerical experiments illustrate the benefits of the proposed approach.
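As a reading aid, a minimal sketch of the kind of local transfer model the abstract describes; the notation (f_S, f_T, cells A_j, transfer maps phi_j) is illustrative rather than the authors' own:

```latex
% Source and target regression models (illustrative notation):
\[
  Y^{S} = f_S(X^{S}) + \varepsilon^{S},
  \qquad
  Y^{T} = f_T(X^{T}) + \varepsilon^{T}.
\]
% Local transfer assumption: for a finite partition \{A_1,\dots,A_M\}
% of the covariate space and low-complexity maps \phi_1,\dots,\phi_M,
\[
  f_T(x) = \phi_j\bigl(f_S(x)\bigr) \quad \text{whenever } x \in A_j .
\]
% On each cell, estimating f_T reduces to estimating the one-dimensional
% map \phi_j on top of an estimate of f_S, which is how the approach can
% mitigate the curse of dimensionality.
```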
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
Contrast transfer functions help quantify neural network out-of-distribution generalization in HRTEM
DaCosta, Luis Rangel, Scott, Mary C.
Neural networks, while effective for tackling many challenging scientific tasks, are not known to perform well out-of-distribution (OOD), i.e., within domains which differ from their training data. Understanding neural network OOD generalization is paramount to their successful deployment in experimental workflows, especially when ground-truth knowledge about the experiment is hard to establish or experimental conditions significantly vary. With inherent access to ground-truth information and fine-grained control of underlying distributions, simulation-based data curation facilitates precise investigation of OOD generalization behavior. Here, we probe generalization with respect to imaging conditions of neural network segmentation models for high-resolution transmission electron microscopy (HRTEM) imaging of nanoparticles, training and measuring the OOD generalization of over 12,000 neural networks using synthetic data generated via random structure sampling and multislice simulation. Using the HRTEM contrast transfer function, we further develop a framework to compare information content of HRTEM datasets and quantify OOD domain shifts. We demonstrate that neural network segmentation models enjoy significant performance stability, but will smoothly and predictably worsen as imaging conditions shift from the training distribution. Lastly, we consider limitations of our approach in explaining other OOD shifts, such as of the atomic structures, and discuss complementary techniques for understanding generalization in such settings.
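For orientation, a minimal numerical sketch of a phase contrast transfer function of the kind such a framework builds on; sign conventions vary across the literature and envelope (damping) terms are omitted, so treat the formula as illustrative:

```python
import numpy as np

def ctf(k, wavelength, defocus, cs):
    """Phase CTF at spatial frequency k (1/Angstrom), all lengths in Angstrom.

    Underfocus is taken as positive defocus in this convention.
    """
    chi = (np.pi * wavelength * defocus * k**2
           - 0.5 * np.pi * cs * wavelength**3 * k**4)  # aberration phase
    return np.sin(chi)

# Example: ~300 kV electrons (lambda ~ 0.0197 A), 500 A underfocus,
# spherical aberration Cs = 1 mm = 1e7 A.
k = np.linspace(0.0, 1.0, 512)
print(ctf(k, wavelength=0.0197, defocus=500.0, cs=1e7)[:5])
```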
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > New York (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (4 more...)
- Energy (0.46)
- Government (0.46)
Addressing A Posteriori Performance Degradation in Neural Network Subgrid Stress Models
Neural network subgrid stress models often have a priori performance that is far better than their a posteriori performance, leading to models that look very promising a priori but fail completely in a posteriori Large Eddy Simulations (LES). This performance gap can be decreased by combining two different methods: training data augmentation and reducing the complexity of the inputs to the neural network. Augmenting the training data with two different filters before training incurs no a priori performance degradation compared to a neural network trained with a single filter. A posteriori, neural networks trained with two different filters are far more robust across two LES codes with different numerical schemes. In addition, ablating away the higher-order terms input into the neural network makes the gap between a priori and a posteriori performance less apparent. When combined, neural networks that use both training data augmentation and a less complex set of inputs have a posteriori performance far more reflective of their a priori evaluation.
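A hedged sketch of the two-filter augmentation idea: the same resolved field is filtered in two different ways before training, so the network cannot specialize to a single filter shape. The box/Gaussian pair and the widths below are illustrative assumptions, not the paper's choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def two_filter_views(field, box_width=4, sigma=2.0):
    """Return two LES-style filtered copies of the same resolved field."""
    return (uniform_filter(field, size=box_width),   # box (top-hat) filter
            gaussian_filter(field, sigma=sigma))     # Gaussian filter

rng = np.random.default_rng(0)
u = rng.standard_normal((32, 32, 32))  # stand-in for a DNS velocity component
u_box, u_gauss = two_filter_views(u)
# The training set would draw input/label pairs from BOTH filtered fields,
# so the learned model is not tied to a single filter shape.
```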
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Extracting Compact Recurrences From Convolutions
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads, naively requiring a full pass (or caching of activations) over the input sequence for each generated token, similarly to attention-based models. In this paper, we seek to enable O(1) compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our method consists of extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at the 1.3B-parameter scale.
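A minimal sketch of the payoff being claimed: once a long convolution filter is replaced by a small state-space model (A, B, C, D) whose impulse response approximates it, each generated token costs a fixed-size state update rather than a pass over the growing sequence. The distillation step itself (rational interpolation and model-order reduction) is not shown, and the matrices below are placeholders:

```python
import numpy as np

def ssm_step(A, B, C, D, state, x_t):
    """One recurrent step: state' = A @ state + B * x_t, y_t = C @ state' + D * x_t."""
    state = A @ state + B * x_t
    return state, C @ state + D * x_t

d = 8                                      # distilled state dimension
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.5, 0.95, d))     # stable (placeholder) state matrix
B, C, D = rng.standard_normal(d), rng.standard_normal(d), 0.1

# Auto-regressive generation: constant work and memory per token,
# instead of re-running the convolution over the full prefix.
state, ys = np.zeros(d), []
for x_t in rng.standard_normal(16):
    state, y_t = ssm_step(A, B, C, D, state, x_t)
    ys.append(y_t)
```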
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
On Robustness of Consensus over Pseudo-Undirected Path Graphs
Sinha, Abhinav, Mukherjee, Dwaipayan, Kumar, Shashi Ranjan
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retain bidirectional connectivity between node pairs but allow the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions and expands the solution space, enabling the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds on the negative weights for a pseudo-undirected path graph, and show an application to the simultaneous interception of a moving target.
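An illustrative simulation of the setting, assuming the standard Laplacian consensus protocol dx/dt = -L x; the edge weights below are placeholders, and any negative weights would have to respect the admissibility bounds the paper derives:

```python
import numpy as np

def path_laplacian(a, b):
    """Non-symmetric Laplacian of a path graph.

    a[i] is the weight node i places on neighbor i+1;
    b[i] is the weight node i+1 places on neighbor i.
    """
    n = len(a) + 1
    L = np.zeros((n, n))
    for i, (wa, wb) in enumerate(zip(a, b)):
        L[i, i] += wa          # node i listens to i+1 with weight wa
        L[i, i + 1] -= wa
        L[i + 1, i + 1] += wb  # node i+1 listens to i with weight wb
        L[i + 1, i] -= wb
    return L

# Bidirectional edges with unequal weights in the two directions.
L = path_laplacian(a=[1.0, 2.0, 0.5], b=[0.5, 1.0, 2.0])
x = np.array([1.0, 4.0, -2.0, 3.0])
for _ in range(20000):        # Euler integration of dx/dt = -L x
    x = x - 0.01 * (L @ x)
print(x)  # entries approach a common value that need not be the average
```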
- Asia > India > Maharashtra > Mumbai (0.04)
- North America > United States > Ohio > Hamilton County > Cincinnati (0.04)
Advancing rail safety: An onboard measurement system of rolling stock wheel flange wear based on dynamic machine learning algorithms
Nkundineza, Celestin, Njaji, James Ndodana, Abubeker, Samrawit, Gatera, Omar, Hanyurwimfura, Damien
Rail and wheel interaction is pivotal to railway system safety, requiring accurate measurement systems for reliable safety monitoring. This paper introduces an innovative onboard measurement system for monitoring wheel flange wear depth, utilizing displacement and temperature sensors. Laboratory experiments are conducted to emulate wheel flange wear depth and surrounding temperature fluctuations over different periods of time. Using the collected data, the training of regression-based machine learning algorithms is dynamically automated. Further experimental results, obtained using standard procedures, validate the system's efficacy. To enhance accuracy, an infinite impulse response (IIR) filter that mitigates vehicle dynamics and sensor noise is designed. Filter parameters were computed from specifications derived from a Fast Fourier Transform analysis of locomotive simulations and emulation experiment data. The results show that the dynamic machine learning algorithm effectively counters the sensors' nonlinear response to temperature effects, achieving an accuracy of 96.5% with minimal runtime. Real-time noise reduction via the IIR filter raises the accuracy to 98.2%. Integrated with railway communication embedded systems such as Internet of Things devices, this monitoring system offers real-time insight into wheel flange wear and the irregular track conditions that cause it, ensuring heightened safety and efficiency in railway operations.
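A minimal sketch of the IIR denoising step, using a causal low-pass Butterworth filter as a stand-in; the order, sampling rate, and cutoff below are assumed for illustration, whereas the paper computes its parameters from an FFT analysis of locomotive simulations:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 200.0      # sensor sampling rate in Hz (assumed)
cutoff = 5.0    # keep the slow wear trend, reject vibration noise (assumed)
b, a = butter(N=4, Wn=cutoff / (fs / 2), btype="low")  # 4th-order IIR low-pass

t = np.arange(0.0, 10.0, 1.0 / fs)
raw = 0.02 * t + 0.05 * np.sin(2 * np.pi * 30.0 * t)  # wear trend + vibration
filtered = lfilter(b, a, raw)  # causal filtering, usable in real time
```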
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)
- Europe > Switzerland (0.04)
- Africa > Middle East > Djibouti (0.04)
- (2 more...)