dilution
Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration
Federated Learning (FL) has emerged as a promising paradigm for collaborative machine learning while preserving user data privacy. Despite its potential, standard FL algorithms lack support for diverse heterogeneous device prototypes, which vary significantly in model and dataset sizes---from small IoT devices to large workstations. This limitation is only partially addressed by existing knowledge distillation (KD) techniques, which often fail to transfer knowledge effectively across a broad spectrum of device prototypes with varied capabilities. This failure primarily stems from two issues: the dilution of informative logits from more capable devices by those from less capable ones, and the use of a single set of integrated logits as the distillation target across all devices, which neglects their individual learning capacities and the unique contributions of each device. To address these challenges, we introduce TAKFL, a novel KD-based framework that treats the knowledge transfer from each device prototype's ensemble as a separate task, independently distilling each to preserve its unique contributions and avoid dilution. TAKFL also incorporates a KD-based self-regularization technique to mitigate issues arising from the noisy and unsupervised ensemble distillation process. To integrate the separately distilled knowledge, we introduce an adaptive task arithmetic knowledge integration process, allowing each student model to customize the knowledge integration for optimal performance.
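The abstract does not specify TAKFL's integration step beyond "task arithmetic"; the following is a minimal sketch under the assumption that each prototype's separately distilled knowledge is summarized as a parameter-space task vector (distilled weights minus base weights), which the student combines with its own adaptive weights. The function name and toy weights are illustrative, not the paper's API.

```python
import numpy as np

def integrate_task_vectors(base, distilled, weights):
    """Task-arithmetic integration sketch: each device prototype's
    separately distilled model yields a task vector (distilled - base);
    the student adds a weighted sum of these vectors to its base weights."""
    delta = sum(w * (d - base) for w, d in zip(weights, distilled))
    return base + delta

base = np.zeros(4)
distilled = [np.ones(4), 2 * np.ones(4)]   # toy "distilled" models per prototype
weights = [0.25, 0.5]                       # per-student integration weights
student = integrate_task_vectors(base, distilled, weights)
print(student)  # [1.25 1.25 1.25 1.25]
```

Because the weights are per-student, a small IoT-class student and a large workstation-class student can integrate the same distilled vectors differently.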
Active Learning and Explainable AI for Multi-Objective Optimization of Spin Coated Polymers
Young, Brendan, Alvey, Brendan, Werbrouck, Andreas, Murphy, Will, Keller, James, Young, Matthias J., Maschmann, Matthew
Spin coating polymer thin films to achieve specific mechanical properties is inherently a multi-objective optimization problem. We present a framework that integrates an active Pareto front learning algorithm (PyePAL) with visualization and explainable AI techniques to optimize processing parameters. PyePAL uses Gaussian process models to predict objective values (hardness and elasticity) from the design variables (spin speed, dilution, and polymer mixture), guiding the adaptive selection of samples toward promising regions of the design space. To enable interpretable insights into the high-dimensional design space, we utilize UMAP (Uniform Manifold Approximation and Projection) for two-dimensional visualization of the Pareto front exploration. Additionally, we incorporate fuzzy linguistic summaries, which translate the learned relationships between process parameters and performance objectives into linguistic statements, thus enhancing the explainability and understanding of the optimization results. Experimental results demonstrate that our method efficiently identifies promising polymer designs, while the visual and linguistic explanations facilitate expert-driven analysis and knowledge discovery.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
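The core of Pareto-front learning is identifying non-dominated candidates under the two objectives (hardness and elasticity). As a hedged illustration of that filtering step only, not of PyePAL's ε-PAL algorithm or its Gaussian process models, here is a minimal non-dominated mask over toy objective predictions:

```python
import numpy as np

def pareto_mask(objectives):
    """Return a boolean mask of non-dominated points (maximizing both
    objectives). A point is dominated if another point is at least as
    good in both objectives and strictly better in at least one."""
    n = len(objectives)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(objectives[j] >= objectives[i]) \
                      and np.any(objectives[j] > objectives[i]):
                mask[i] = False
                break
    return mask

# toy (hardness, elasticity) predictions for candidate spin-coating recipes
obj = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [1.5, 1.5]])
print(pareto_mask(obj))  # [ True  True  True False]
```

In an active-learning loop, a surrogate model's predictions (with uncertainties) would replace the fixed `obj` array, and new samples would be drawn from the region near the current front.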
LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation
Asher, Gabriel, Shah, Devesh, Caudy, Amy A., Ferro, Luke, Amar, Lea, Costa, Ana S. H., Patton, Thomas, O'Connor, Niall, Campbell, Jennifer M., Geremia, Jack
A vast majority of mass spectrometry data remains uncharacterized, leaving much of its biological and chemical information untapped. Recent advances in machine learning have begun to address this gap, particularly for tasks such as spectral identification in tandem mass spectrometry data. Here, we present the latest generation of LSM-MS2, a large-scale deep learning foundation model trained on millions of spectra to learn a semantic chemical space. LSM-MS2 achieves state-of-the-art performance in spectral identification, improving on existing methods by 30% in accuracy of identifying challenging isomeric compounds, yielding 42% more correct identifications in complex biological samples, and maintaining robustness under low-concentration conditions. Furthermore, LSM-MS2 produces rich spectral embeddings that enable direct biological interpretation from minimal downstream data, successfully differentiating disease states and predicting clinical outcomes across diverse translational applications.
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Materials > Chemicals > Commodity Chemicals (0.46)
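The mechanism behind embedding-based spectral identification can be illustrated generically: a query spectrum's embedding is matched against a library by cosine similarity. This is a hypothetical sketch of that retrieval pattern, not LSM-MS2's actual model or interface; the embeddings and compound names below are invented.

```python
import numpy as np

def identify(query_emb, library_embs, library_names):
    """Nearest-neighbor spectral identification sketch: match the query
    embedding to the library entry with the highest cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    sims = lib @ q
    best = int(np.argmax(sims))
    return library_names[best], float(sims[best])

lib = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # toy library embeddings
names = ["citrate", "isocitrate", "malate"]
print(identify(np.array([0.6, 0.8]), lib, names))
```

A semantic chemical space makes such matching robust for isomers precisely because similar compounds land near each other in the embedding, which raw spectral comparison does not guarantee.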
Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification
Safdar, Mutahar, Wood, Gentry, Zimmermann, Max, Lamouche, Guy, Wanjara, Priti, Zhao, Yaoyao Fiona
Rapid and reliable qualification of advanced materials remains a bottleneck in industrial manufacturing, particularly for heterogeneous structures produced via non-conventional additive manufacturing processes. This study introduces a novel framework that links microstructure informatics with a range of expert characterization knowledge using customized and hybrid vision-language representations (VLRs). By integrating deep semantic segmentation with pre-trained multi-modal models (CLIP and FLAVA), we encode both visual microstructural data and textual expert assessments into shared representations. To overcome limitations in general-purpose embeddings, we develop a customized similarity-based representation that incorporates both positive and negative references from expert-annotated images and their associated textual descriptions. This allows zero-shot classification of previously unseen microstructures through a net similarity scoring approach. Validation on an additively manufactured metal matrix composite dataset demonstrates the framework's ability to distinguish between acceptable and defective samples across a range of characterization criteria. Comparative analysis reveals that the FLAVA model offers higher visual sensitivity, while the CLIP model provides more consistent alignment with the textual criteria. Z-score normalization adjusts raw unimodal and cross-modal similarity scores based on their local dataset-driven distributions, enabling more effective alignment and classification in the hybrid vision-language framework. The proposed method enhances traceability and interpretability in qualification pipelines by enabling human-in-the-loop decision-making without task-specific model retraining. By advancing semantic interoperability between raw data and expert knowledge, this work contributes toward scalable and domain-adaptable qualification strategies in engineering informatics.
- North America > Canada > Quebec > Montreal (0.28)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Europe > Germany (0.04)
- Materials (0.46)
- Machinery > Industrial Machinery (0.34)
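The z-score-normalized net similarity scoring can be sketched generically: similarities to positive (expert-approved) and negative (defect) references are each standardized against their local distribution, and the difference gives a net score. The exact normalization and thresholds in the paper may differ; the reference scores below are invented.

```python
import numpy as np

def net_similarity(scores_pos, scores_neg):
    """Z-score-normalized net similarity sketch: raw similarity scores
    against positive and negative expert references are standardized
    against their local distributions; the net score is the difference
    of z-scores. A positive net score suggests 'acceptable'."""
    z = lambda s: (s - s.mean()) / s.std()
    return z(scores_pos) - z(scores_neg)

pos = np.array([0.82, 0.55, 0.60])  # similarity to expert-approved references
neg = np.array([0.30, 0.70, 0.65])  # similarity to defect references
net = net_similarity(pos, neg)
labels = np.where(net > 0, "acceptable", "defective")
print(labels)
```

Standardizing per-distribution compensates for the fact that raw unimodal and cross-modal similarity scores live on different scales, which is what makes the hybrid comparison meaningful.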
Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction
Nguyen, Hy, Thudumu, Srikanth, Du, Hung, Vasa, Rajesh, Mouzakis, Kon
Next-frame prediction in videos is crucial for applications such as autonomous driving, object tracking, and motion prediction. The primary challenge in next-frame prediction lies in effectively capturing and processing both spatial and temporal information from previous video sequences. The transformer architecture, known for its prowess in handling sequence data, has made remarkable progress in this domain. However, transformer-based next-frame prediction models face notable issues: (a) the multi-head self-attention (MHSA) mechanism requires the input embedding to be split into $N$ chunks, where $N$ is the number of heads; each chunk captures only a fraction of the original embedding's information, which distorts the representation of the embedding in the latent space, resulting in a semantic dilution problem; (b) these models predict the embeddings of the next frames rather than the frames themselves, but the loss function is based on the errors of the reconstructed frames, not the predicted embeddings -- this creates a discrepancy between the training objective and the model output. We propose a Semantic Concentration Multi-Head Self-Attention (SCMHSA) architecture, which effectively mitigates semantic dilution in transformer-based next-frame prediction. Additionally, we introduce a loss function that optimizes SCMHSA in the latent space, aligning the training objective more closely with the model output. Our method demonstrates superior performance compared to the original transformer-based predictors.
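The chunking that causes semantic dilution in standard MHSA can be shown in a few lines: a d-dimensional embedding is reshaped into N head slices, so each head attends over only d/N dimensions. This illustrates the standard mechanism the paper criticizes, not the proposed SCMHSA; the dimensions are arbitrary.

```python
import numpy as np

# Standard MHSA splits a d-dimensional embedding into N head chunks,
# so each head sees only d/N dimensions of the representation -- the
# "semantic dilution" the paper targets. Illustrative shapes only.
d_model, n_heads, seq_len = 64, 8, 10
x = np.random.randn(seq_len, d_model)
heads = x.reshape(seq_len, n_heads, d_model // n_heads)
print(heads.shape)  # (10, 8, 8): each head sees an 8-dim slice of a 64-dim embedding
```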
Enhancing Uncertainty Estimation in Semantic Segmentation via Monte-Carlo Frequency Dropout
Zeevi, Tal, Staib, Lawrence H., Onofrey, John A.
Estimating prediction uncertainties in deterministic deep learning models often involves the strategic introduction of controlled artificial noise into the data [1]. This can occur either before [2, 3] or during [4, 5, 6] neural network processing, with subsequent measurement of variations in model performance to assess robustness. Techniques such as DropConnect [7] and Dropout [8], which randomly omit network edges or nodes during processing, have been foundational in this respect, effectively injecting random patterns of noise into the network's operation, allowing the simulation of a predictive distribution approximating Bayesian inference [9]. In convolutional neural network (CNN) layers, commonly used in segmentation tasks, each convolution step corresponds to a node on the network graph, essentially turning Dropout into a random source of impulse noise within the CNN feature maps. This method, however, may not comprehensively capture the predictive distribution in medical imaging, where noise extends into the frequency domain - a range poorly addressed by impulse noise. Our recent findings [10] suggest that Frequency Dropout [11], which randomly removes frequency components from feature maps during Monte Carlo (MC) simulations, refines predictive uncertainty estimates in medical imaging classification over [...]
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
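The Frequency Dropout idea described above can be sketched in a few lines: transform a feature map to the frequency domain, zero a random subset of components, transform back, and repeat over Monte Carlo passes to estimate uncertainty. This is a minimal illustration of the concept, not the authors' implementation; the drop rate, map size, and pass count are arbitrary.

```python
import numpy as np

def frequency_dropout(fmap, drop_rate, rng):
    """Frequency Dropout sketch: move a feature map to the frequency
    domain, zero a random subset of frequency components, and transform
    back. The variance across repeated Monte Carlo passes serves as an
    uncertainty estimate."""
    spec = np.fft.fft2(fmap)
    mask = rng.random(spec.shape) >= drop_rate   # keep ~(1 - drop_rate) of components
    return np.real(np.fft.ifft2(spec * mask))

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8))
samples = np.stack([frequency_dropout(fmap, 0.3, rng) for _ in range(32)])
uncertainty = samples.std(axis=0)   # per-pixel MC uncertainty map
print(uncertainty.shape)  # (8, 8)
```

Unlike node Dropout, which injects impulse-like noise at single feature-map locations, zeroing a frequency component perturbs the whole map, which is the distinction motivating the method.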
The Eclipsing Binaries via Artificial Intelligence. II. Need for Speed in PHOEBE Forward Models
Submitted to ApJS

In modern astronomy, the quantity of data collected has vastly exceeded the capacity for manual analysis, necessitating the use of advanced artificial intelligence (AI) techniques to assist scientists with the most labor-intensive tasks. AI can optimize simulation codes where computational bottlenecks arise from the time required to generate forward models. One such example is PHOEBE, a modeling code for eclipsing binaries (EBs), where simulating individual systems is feasible, but analyzing observables for extensive parameter combinations is highly time-consuming. To address this, we present a fully connected feedforward artificial neural network (ANN) trained on a dataset of over one million synthetic light curves generated with PHOEBE. Optimization of the ANN architecture yielded a model with six hidden layers, each containing 512 nodes, which provides an optimized balance between accuracy and computational complexity. Extensive testing enabled us to establish the ANN's applicability limits and to quantify the systematic and statistical errors associated with using such networks for EB analysis. Our findings demonstrate the critical role of dilution effects in parameter estimation for EBs, and we outline methods to incorporate these effects in AI-based models. The proposed ANN framework enables a speedup of over four orders of magnitude compared to traditional methods, with systematic errors not exceeding 1%, and often as low as 0.01%, across the entire parameter space.

INTRODUCTION: Fundamental stellar properties are inferred predominantly from the study of eclipsing binary stars (EBs) (Torres et al. 2010). Their favorable orbital alignment with the line of sight, and consequent eclipses, make them ideal astrophysical laboratories: a simple geometry coupled with well-understood dynamical laws allows us to obtain fundamental parameters without a-priori assumptions (Prša 2018). Many of the phenomena being observed in hot Jupiters have their foundations in EB studies, e.g., the Rossiter-McLaughlin effect, tidal distortions of the host star, irradiation effects, Roche lobe overflow and wind outflows, gravity darkening, apsidal motion, third body dynamics, etc. (Barclay et al. 2012). A number of EBs are found in triple and multiple systems (Conroy et al. 2014; Orosz 2015), hosting circumbinary planets (Welsh et al. 2015), and featuring mass transfer and apsidal motion (Hambleton et al. 2013); these broaden the domains of study while retaining the same tractable modeling principles. In particular, we can probe stellar interiors by studying tidally induced oscillations and gravity-mode pulsations in detached binaries (Huber 2015); ubiquitous contact binaries are still [...] that, we need samplers such as Markov Chain Monte Carlo (MCMC, Foreman-Mackey et al. 2017) to provide heuristic parameter posteriors. This entails hundreds of thousands if not millions of forward-model runs, which puts a hard limit on the number of systems we can solve.
- North America > United States > California > San Mateo County > Redwood City (0.04)
- Europe > Switzerland (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
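The stated architecture (six hidden layers of 512 nodes mapping EB parameters to a light curve) is concrete enough to sketch as a plain forward pass. The weights below are random placeholders, not a trained PHOEBE emulator, and the input/output sizes are assumptions for illustration.

```python
import numpy as np

def surrogate_forward(params_in, weights, biases):
    """Forward pass of a fully connected surrogate like the one described:
    six ReLU hidden layers of 512 nodes, linear output giving light-curve
    flux samples. Weights are untrained placeholders."""
    h = params_in
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)    # ReLU hidden layers
    return h @ weights[-1] + biases[-1]   # linear output layer

rng = np.random.default_rng(1)
sizes = [5] + [512] * 6 + [200]           # assumed: 5 EB parameters -> 200 flux points
weights = [rng.standard_normal((a, b)) * 0.01 for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
lc = surrogate_forward(rng.standard_normal(5), weights, biases)
print(lc.shape)  # (200,)
```

The speedup claim follows from this structure: one forward pass is a handful of matrix multiplies, versus a full physical simulation per MCMC proposal.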
Parallel Learning by Multitasking Neural Networks
Agliari, Elena, Alessandrelli, Andrea, Barra, Adriano, Ricci-Tersenghi, Federico
A modern challenge of Artificial Intelligence is learning multiple patterns at once (i.e., parallel learning). While this cannot be accomplished by standard Hebbian associative neural networks, in this paper we show how the Multitasking Hebbian Network (a variation on the theme of the Hopfield model working on sparse data-sets) is naturally able to perform this complex task. We focus on systems processing in parallel a finite (up to logarithmic growth in the size of the network) number of patterns, mirroring the low-storage level of standard associative neural networks at work with pattern recognition. For mild dilution in the patterns, the network handles them hierarchically, distributing the amplitudes of their signals as power-laws w.r.t. their information content (hierarchical regime), while, for strong dilution, all the signals pertaining to all the patterns are raised with the same strength (parallel regime). Further, confined to the low-storage setting (i.e., far from the spin-glass limit), the presence of a teacher neither alters the multitasking performance nor changes the thresholds for learning: the latter are the same whether the training protocol is supervised or unsupervised. Results obtained through statistical mechanics, the signal-to-noise technique, and Monte Carlo simulations are overall in perfect agreement and carry interesting insights on multiple learning at once: for instance, whenever the cost function of the model is minimized in parallel on several patterns (in its description via statistical mechanics), the same happens to the standard sum-squared-error loss function typically used in Machine Learning.
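The basic objects here, diluted patterns, Hebbian couplings, and per-pattern overlaps, can be set up in a few lines. This is a toy construction of a multitasking Hebbian network's ingredients, with an arbitrary network size and dilution level, not the paper's statistical-mechanics analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, dilution = 100, 3, 0.7   # neurons, patterns, fraction of zeroed entries

# Diluted patterns: entries are +/-1 with probability (1 - dilution)/2 each,
# and 0 (blank) with probability `dilution`
xi = rng.choice([-1, 0, 1], size=(K, N),
                p=[(1 - dilution) / 2, dilution, (1 - dilution) / 2])

# Hebbian couplings summed over all patterns, stored simultaneously
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

# The overlap of a network state with each pattern measures parallel retrieval:
# blanks let a mixture state align with several patterns at once
state = np.sign(xi.sum(axis=0))
overlaps = xi @ state / N
print(overlaps.shape)  # (3,)
```

Because blank entries leave neurons free to encode other patterns, the same network can carry nonzero overlap with several patterns simultaneously, which is the mechanism behind the hierarchical and parallel regimes described above.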
On the Ramifications of Human Label Uncertainty
Zhou, Chen, Prabhushankar, Mohit, AlRegib, Ghassan
In this work, we study the ramifications of human label uncertainty (HLU). Our evaluation of existing uncertainty estimation algorithms in the presence of HLU reveals the limitations of existing uncertainty metrics, and of the algorithms themselves, in responding to HLU. Meanwhile, we observe undue effects on predictive uncertainty and generalizability. To mitigate these undue effects, we introduce a novel natural scene statistics (NSS) based label dilution training scheme that does not require massive human labels. Specifically, we first select a subset of samples with low perceptual quality, ranked by the statistical regularities of the images. We then assign separate labels to each sample in this subset to obtain a training set with diluted labels. Our experiments and analysis demonstrate that training with NSS-based label dilution alleviates the undue effects caused by HLU.
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
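The selection-and-relabeling step described above can be sketched generically: rank samples by a perceptual quality score, take the lowest-quality fraction, and reassign those samples separate labels. The scalar quality score stands in for the NSS-based ranking, and the relabeling policy here (uniform random labels) is an assumption for illustration.

```python
import numpy as np

def dilute_labels(labels, quality, n_classes, frac, rng):
    """Label-dilution sketch: rank samples by a perceptual quality score,
    take the lowest-quality fraction, and reassign each selected sample
    a separate label (here drawn uniformly at random)."""
    k = int(frac * len(labels))
    low_q = np.argsort(quality)[:k]          # indices of lowest-quality samples
    diluted = labels.copy()
    diluted[low_q] = rng.integers(0, n_classes, size=k)
    return diluted, low_q

rng = np.random.default_rng(3)
labels = np.zeros(10, dtype=int)             # toy single-class label set
quality = rng.random(10)                     # stand-in NSS quality scores
diluted, idx = dilute_labels(labels, quality, n_classes=5, frac=0.3, rng=rng)
print(len(idx))  # 3
```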