boxplot
- Asia > China > Hong Kong (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- (13 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.48)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.67)
Personalizing black-box models for nonparametric regression with minimax optimality
Recent advances in large-scale models, including deep neural networks and large language models, have substantially improved performance across a wide range of learning tasks. The widespread availability of such pre-trained models creates new opportunities for data-efficient statistical learning, provided they can be effectively integrated into downstream tasks. Motivated by this setting, we study few-shot personalization, where a pre-trained black-box model is adapted to a target domain using a limited number of samples. We develop a theoretical framework for few-shot personalization in nonparametric regression and propose algorithms that can incorporate a black-box pre-trained model into the regression procedure. We establish the minimax optimal rate for the personalization problem and show that the proposed method attains this rate. Our results clarify the statistical benefits of leveraging pre-trained models under sample scarcity and provide robustness guarantees when the pre-trained model is not informative. We illustrate the finite-sample performance of the methods through simulations and an application to the California housing dataset with several pre-trained models.
- North America > United States > California (0.25)
- Europe > United Kingdom (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Appendix Table of Contents
The naive aggregation of these public datasets results in a database with partial and incomplete labels, e.g., LiTS only had labels for the liver and its tumors, and KiTS only had labels for the kidneys and its tumors. Conversely, our AbdomenAtlas 1.0 is fully-annotated, offering detailed per-voxel labels Figure 3: Anatomical boundaries and structures can be indistinct due to disease, as seen in the JHH dataset. We display CT volumes with patients depicted under unhealthy conditions that are challenging for most AI algorithms to identify. The CT volumes are from patients in unhealthy conditions. The encoder performs down-sampling operations, and it is designed to capture high-level semantics and context information.
- Asia > China > Hong Kong (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- (13 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.48)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.67)
Multidimensional Distributional Neural Network Output Demonstrated in Super-Resolution of Surface Wind Speed
Goldwyn, Harrison J., Krock, Mitchell, Rudi, Johann, Getter, Daniel, Bessac, Julie
Accurate quantification of uncertainty in neural network predictions remains a central challenge for scientific applications involving high-dimensional, correlated data. While existing methods capture either aleatoric or epistemic uncertainty, few offer closed-form, multidimensional distributions that preserve spatial correlation while remaining computationally tractable. In this work, we present a framework for training neural networks with a multidimensional Gaussian loss, generating closed-form predictive distributions over outputs with non-identically distributed and heteroscedastic structure. Our approach captures aleatoric uncertainty by iteratively estimating the means and covariance matrices, and is demonstrated on a super-resolution example. We leverage a Fourier representation of the covariance matrix to stabilize network training and preserve spatial correlation. We introduce a novel regularization strategy -- referred to as information sharing -- that interpolates between image-specific and global covariance estimates, enabling convergence of the super-resolution downscaling network trained on image-specific distributional loss functions. This framework allows for efficient sampling, explicit correlation modeling, and extensions to more complex distribution families all without disrupting prediction performance. We demonstrate the method on a surface wind speed downscaling task and discuss its broader applicability to uncertainty-aware prediction in scientific models.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California (0.14)
- Asia > Turkmenistan > Ahal Region > Anau (0.04)
- (4 more...)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Health & Medicine (0.69)
XAI-Driven Spectral Analysis of Cough Sounds for Respiratory Disease Characterization
Amado-Caballero, Patricia, San-José-Revuelta, Luis Miguel, Aguilar-García, María Dolores, Garmendia-Leiza, José Ramón, Alberola-López, Carlos, Casaseca-de-la-Higuera, Pablo
This paper proposes an eXplainable Artificial Intelligence (XAI)-driven methodology to enhance the understanding of cough sound analysis for respiratory disease management. We employ occlusion maps to highlight relevant spectral regions in cough spectrograms processed by a Convolutional Neural Network (CNN). Subsequently, spectral analysis of spectrograms weighted by these occlusion maps reveals significant differences between disease groups, particularly in patients with COPD, where cough patterns appear more variable in the identified spectral regions of interest. This contrasts with the lack of significant differences observed when analyzing raw spectrograms. The proposed approach extracts and analyzes several spectral features, demonstrating the potential of XAI techniques to uncover disease-specific acoustic signatures and improve the diagnostic capabilities of cough sound analysis by providing more interpretable results.
- Europe > United Kingdom > Scotland (0.04)
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
An empirical comparison of some outlier detection methods with longitudinal data
This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance between observations or binary partitioning trees. This is achieved by applying the methods to panel survey data related to different types of statistical units. Traditional methods are quite simple, enabling the direct identification of potential outliers, but they require specific assumptions. In contrast, recent methods provide only a score whose magnitude is directly related to the likelihood of an outlier being present. All the methods require the user to set a number of tuning parameters. However, the most recent methods are more flexible and sometimes more effective than traditional methods. In addition, these methods can be applied to multidimensional data.
Missing value imputation with adversarial random forests -- MissARF
Golchian, Pegah, Kapar, Jan, Watson, David S., Wright, Marvin N.
Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the estimated conditional distribution generated by ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality and fast runtime with no additional costs for multiple imputation.
- North America > United States > Wyoming > Albany County > Laramie (0.14)
- Europe > Germany > Bremen > Bremen (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores
Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex model such as a random forest, such importances are not innate -- as they are, e.g., with linear regression. Efficient methods have been created to provide such capabilities, with one of the most popular among them being permutation feature importance due to its efficiency, model-agnostic nature, and perceived intuitiveness. However, permutation feature importance has been shown to be misleading in the presence of dependent features as a result of the creation of unrealistic observations when permuting the dependent features. In this work, we develop TRIP (Test for Reliable Interpretation via Permutation), a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores that are the result of model extrapolation. To build on this, we demonstrate how the test can be complemented in order to allow its use in high dimensional settings. Through testing on simulated data and applications, our results show that the test can be used to reliably detect when permutation feature importance scores are unreliable.
- North America > United States (0.14)
- Europe > Italy (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)