AITopics

doi: 10.1109/TII.2025.3534415

2502.1194

Country:

Europe > Italy (0.29)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Schade, Johannes, von Tycowicz, Christoph, Hanik, Martin

Bi-invariant Geodesic Regression with Data from the Osteoarthritis Initiative

arXiv.org Machine Learning, Feb-17-2025

Many phenomena are naturally characterized by measuring continuous transformations such as shape changes in medicine or articulated systems in robotics. Modeling the variability in such datasets requires performing statistics on Lie groups, that is, manifolds carrying an additional group structure. As the Lie group captures the symmetries in the data, it is essential from a theoretical and practical perspective to ask for statistical methods that respect these symmetries; this way they are insensitive to confounding effects, e.g., due to the choice of reference coordinate systems. In this work, we investigate geodesic regression -- a generalization of linear regression originally derived for Riemannian manifolds. While Lie groups can be endowed with Riemannian metrics, these are generally incompatible with the group structure. We develop a non-metric estimator using an affine connection setting. It captures geodesic relationships respecting the symmetries given by left and right translations. For its computation, we propose an efficient fixed point algorithm requiring simple differential expressions that can be calculated through automatic differentiation. We perform experiments on a synthetic example and evaluate our method on an open-access, clinical dataset studying knee joint configurations under the progression of osteoarthritis.

artificial intelligence, machine learning, regression, (18 more...)

2502.11826

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Rheumatology (0.61)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.61)
Health & Medicine > Therapeutic Area > Immunology (0.61)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Sanford, Luke C, Ayers, Megan, Gordon, Matthew, Stone, Eliana

Adversarial Debiasing for Unbiased Parameter Recovery

arXiv.org Machine Learning, Feb-17-2025

Advances in machine learning and the increasing availability of high-dimensional data have led to the proliferation of social science research that uses the predictions of machine learning models as proxies for measures of human activity or environmental outcomes. However, prediction errors from machine learning models can lead to bias in the estimates of regression coefficients. In this paper, we show how this bias can arise, propose a test for detecting bias, and demonstrate the use of an adversarial machine learning algorithm in order to de-bias predictions. These methods are applicable to any setting where machine-learned predictions are the dependent variable in a regression. We conduct simulations and empirical exercises using ground truth and satellite data on forest cover in Africa. Using the predictions from a naive machine learning model leads to biased parameter estimates, while the predictions from the adversarial model recover the true coefficients.
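The bias described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' test or adversarial algorithm: the data-generating process, the `leak` parameter, and the function names are all hypothetical, chosen only to show that a prediction error correlated with the regressor shifts the estimated coefficient while independent noise does not.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate(n=20000, beta=2.0, leak=0.5, seed=0):
    """Return OLS slopes using the true outcome and two hypothetical proxies."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y_true = beta * x + rng.normal(size=n)
    # Assumed "naive" proxy: its prediction error depends on x.
    y_naive = y_true - leak * x + rng.normal(scale=0.5, size=n)
    # Assumed "clean" proxy: its prediction error is independent of x.
    y_clean = y_true + rng.normal(scale=0.5, size=n)
    return ols_slope(x, y_true), ols_slope(x, y_naive), ols_slope(x, y_clean)

b_true, b_naive, b_clean = simulate()
print(f"true outcome: {b_true:.3f}, naive proxy: {b_naive:.3f}, clean proxy: {b_clean:.3f}")
```

Under these assumptions the naive proxy's slope settles near `beta - leak` rather than `beta`, whereas the clean proxy only adds variance.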

adversarial debiasing, artificial intelligence, machine learning, (17 more...)

2502.12323

Country:

Africa (0.34)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Lyu, Jingyang, Zhou, Kangjie, Zhong, Yiqiao

A statistical theory of overfitting for imbalanced classification

arXiv.org Machine Learning, Feb-16-2025

Classification with imbalanced data is a common challenge in data analysis, where certain classes (minority classes) account for a small fraction of the training data compared with other classes (majority classes). Classical statistical theory based on large-sample asymptotics and finite-sample corrections is often ineffective for high-dimensional data, leaving many overfitting phenomena in empirical machine learning unexplained. In this paper, we develop a statistical theory for high-dimensional imbalanced classification by investigating support vector machines and logistic regression. We find that dimensionality induces truncation or skewing effects on the logit distribution, which we characterize via a variational problem under high-dimensional asymptotics. In particular, for linearly separable data generated from a two-component Gaussian mixture model, the logits from each class follow a normal distribution $\mathsf{N}(0,1)$ on the testing set, but asymptotically follow a rectified normal distribution $\max\{\kappa, \mathsf{N}(0,1)\}$ on the training set -- which is a pervasive phenomenon we verified on tabular data, image data, and text data. This phenomenon explains why the minority class is more severely affected by overfitting. Further, we show that margin rebalancing, which incorporates class sizes into the loss function, is crucial for mitigating the accuracy drop for the minority class. Our theory also provides insights into the effects of overfitting on calibration and other uncertainty quantification measures.
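To make the rectified normal distribution $\max\{\kappa, \mathsf{N}(0,1)\}$ concrete, the sketch below samples from it and measures the point mass that piles up at $\kappa$. It only illustrates the distribution named in the abstract; the value of `kappa` and the function names are assumptions, and nothing here reproduces the paper's theory or experiments.

```python
import math
import numpy as np

def rectified_normal_sample(kappa, n, seed=0):
    """Draw n samples of max(kappa, Z) with Z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    return np.maximum(kappa, rng.standard_normal(n))

def mass_at_threshold(kappa):
    """P(Z <= kappa), i.e. the probability that a sample equals kappa exactly."""
    return 0.5 * (1.0 + math.erf(kappa / math.sqrt(2.0)))

kappa = 0.5  # hypothetical threshold, chosen only for illustration
samples = rectified_normal_sample(kappa, 100_000)
empirical = float(np.mean(samples == kappa))
print(f"empirical mass at kappa: {empirical:.4f}, theoretical: {mass_at_threshold(kappa):.4f}")
```

Every draw below `kappa` is moved up to `kappa`, so the left tail of the normal collapses into a single atom while the right tail is unchanged.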

artificial intelligence, bayesian inference, machine learning, (20 more...)

2502.11323

Country:

North America > Canada > Ontario (0.27)
North America > United States > Wisconsin (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
(2 more...)

Toward Equitable Access: Leveraging Crowdsourced Reviews to Investigate Public Perceptions of Health Resource Accessibility

Xue, Zhaoqian, Liu, Guanhong, Wei, Kai, Zhang, Chong, Zeng, Qingcheng, Hu, Songhua, Hua, Wenyue, Fan, Lizhou, Zhang, Yongfeng, Li, Lingyao

Access to health resources is a critical determinant of public well-being and societal resilience, particularly during public health crises when demand for medical services and preventive care surges. However, disparities in accessibility persist across demographic and geographic groups, raising concerns about equity. Traditional survey methods often fall short due to limitations in coverage, cost, and timeliness. This study leverages crowdsourced data from Google Maps reviews, applying advanced natural language processing techniques, specifically ModernBERT, to extract insights on public perceptions of health resource accessibility in the United States during the COVID-19 pandemic. Additionally, we employ Partial Least Squares regression to examine the relationship between accessibility perceptions and key socioeconomic and demographic factors including political affiliation, racial composition, and educational attainment. Our findings reveal that public perceptions of health resource accessibility varied significantly across the U.S., with disparities peaking during the pandemic and slightly easing post-crisis. Political affiliation, racial demographics, and education levels emerged as key factors shaping these perceptions. These findings underscore the need for targeted interventions and policy measures to address inequities, fostering a more inclusive healthcare infrastructure that can better withstand future public health challenges.

artificial intelligence, machine learning, natural language, (18 more...)

2502.10641

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Public Health (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Shokouhinejad, Hossein, Razavi-Far, Roozbeh, Mohammadian, Hesamodin, Rabbani, Mahdi, Ansong, Samuel, Higgins, Griffin, Ghorbani, Ali A

Recent Advances in Malware Detection: Graph Learning and Explainability

The rapid evolution of malware has necessitated the development of sophisticated detection methods that go beyond traditional signature-based approaches. Graph learning techniques have emerged as powerful tools for modeling and analyzing the complex relationships inherent in malware behavior, leveraging advancements in Graph Neural Networks (GNNs) and related methods. This survey provides a comprehensive exploration of recent advances in malware detection, focusing on the interplay between graph learning and explainability. It begins by reviewing malware analysis techniques and datasets, emphasizing their foundational role in understanding malware behavior and supporting detection strategies. The survey then discusses feature engineering, graph reduction, and graph embedding methods, highlighting their significance in transforming raw data into actionable insights, while ensuring scalability and efficiency. Furthermore, this survey focuses on explainability techniques and their applications in malware detection, ensuring transparency and trustworthiness. By integrating these components, this survey demonstrates how graph learning and explainability contribute to building robust, interpretable, and scalable malware detection systems. Future research directions are outlined to address existing challenges and unlock new opportunities in this critical area of cybersecurity.

artificial intelligence, expert system, machine learning, (18 more...)

2502.10556

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

High-Throughput Computational Screening and Interpretable Machine Learning of Metal-organic Frameworks for Iodine Capture

Tan, Haoyi, Teng, Yukun, Shan, Guangcun

The removal of leaked radioactive iodine isotopes in humid environments holds significant importance in nuclear waste management and nuclear accident mitigation. In this study, high-throughput computational screening and machine learning were combined to reveal the iodine capture performance of 1816 metal-organic framework (MOF) materials under humid air conditions. Firstly, the relationship between the structural characteristics of MOF materials (including density, surface area and pore features) and their adsorption properties was explored, with the aim of identifying the optimal structural parameters for iodine capture. Subsequently, two machine learning regression algorithms, Random Forest and CatBoost, were employed to predict the iodine adsorption capabilities of MOF materials. In addition to 6 structural features, 25 molecular features (encompassing the types of metal and ligand atoms as well as bonding modes) and 8 chemical features (including heat of adsorption and Henry's coefficient) were incorporated to enhance the prediction accuracy of the machine learning algorithms. Feature importance was assessed to determine the relative influence of various features on iodine adsorption performance, in which the Henry's coefficient and heat of adsorption to iodine were found to be the two most crucial chemical factors. Furthermore, four types of molecular fingerprints were introduced to provide comprehensive and detailed structural information on MOF materials. The top 20 most significant MACCS molecular fingerprints were picked out, revealing that the presence of six-membered ring structures and nitrogen atoms in the MOF framework were the key structural factors that enhanced iodine adsorption, followed by the existence of oxygen atoms. This work combined high-throughput computation, machine learning, and molecular fingerprints to comprehensively and systematically elucidate the multifaceted factors influencing the iodine adsorption performance of MOFs in humid environments, offering profound and insightful guidelines for screening and structural design of advanced MOF materials.

adsorption, descriptor, mof material, (12 more...)

2502.15764

Country:

North America > United States > Idaho (0.04)
Asia > China > Hong Kong (0.04)
Europe > Austria > Vienna (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Water & Waste Management (1.00)
Health & Medicine (1.00)
Energy > Renewable (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Carroll, Mark L., Wooten, Margaret R., Simpson, Claire E., Spradlin, Caleb S., Frost, Melanie J., Blanco-Rojas, Mariana, Williams, Zachary W., Caraballo-Vega, Jordan A., Neigh, Christopher S. R.

Mapping bathymetry of inland water bodies on the North Slope of Alaska with Landsat using Random Forest

The North Slope of Alaska is dominated by small waterbodies that provide critical ecosystem services for local populations and wildlife. Detailed information on the depth of the waterbodies is scarce due to the challenges with collecting such information. In this work we have trained a machine learning (Random Forest Regressor) model to predict depth from multispectral Landsat data in waterbodies across the North Slope of Alaska. The greatest challenge is the scarcity of in situ data, which is expensive and difficult to obtain, to train the model. We overcame this challenge by using modeled depth predictions from a prior study as synthetic training data to provide a more diverse training data pool for the Random Forest. The final Random Forest model was more robust than models trained directly on the in situ data and when applied to 208 Landsat 8 scenes from 2016 to 2018 yielded a map with an overall $r^{2}$ value of 0.76 on validation. The final map has been made available through the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL-DAAC). This map represents a first-of-its-kind regional assessment of waterbody depth with per pixel estimates of depth for the entire North Slope of Alaska.

artificial intelligence, machine learning, north slope, (18 more...)

2502.10214

Country: North America > United States > Alaska (1.00)

Genre: Research Report (1.00)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

arXiv.org Machine Learning, Feb-14-2025

Variational empirical Bayes variable selection in high-dimensional logistic regression

Tang, Yiqi, Martin, Ryan

Logistic regression involving high-dimensional covariates is a practically important problem. Often the goal is variable selection, i.e., determining which few of the many covariates are associated with the binary response. Unfortunately, the usual Bayesian computations can be quite challenging and expensive. Here we start with a recently proposed empirical Bayes solution, with strong theoretical convergence properties, and develop a novel and computationally efficient variational approximation thereof. One such novelty is that we develop this approximation directly for the marginal distribution on the model space, rather than on the regression coefficients themselves. We demonstrate the method's strong performance in simulations, and prove that our variational approximation inherits the strong selection consistency property satisfied by the posterior distribution that it is approximating.

approximation, artificial intelligence, machine learning, (18 more...)

2502.10532

Genre:

Research Report > New Finding (0.72)
Research Report > Experimental Study (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Ghadiri, Mehrdad, Fahrbach, Matthew, Kook, Yunbum, Jadbabaie, Ali

Fast Tensor Completion via Approximate Richardson Iteration

arXiv.org Artificial Intelligence, Feb-13-2025

We study tensor completion (TC) through the lens of low-rank tensor decomposition (TD). Many TD algorithms use fast alternating minimization methods, which solve highly structured linear regression problems at each step (e.g., for CP, Tucker, and tensor-train decompositions). However, such algebraic structure is lost in TC regression problems, making direct extensions unclear. To address this, we propose a lifting approach that approximately solves TC regression problems using structured TD regression algorithms as blackbox subroutines, enabling sublinear-time methods. We theoretically analyze the convergence rate of our approximate Richardson iteration based algorithm, and we demonstrate on real-world tensors that its running time can be 100x faster than direct methods for CP completion.
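For readers unfamiliar with the name, the classical (exact) Richardson iteration for a least-squares problem can be sketched as follows. This is the textbook dense-matrix version, shown only as background; it does not reproduce the paper's approximate variant, its lifting step, or its structured TD subroutines. The function name, step-size choice, and iteration count are assumptions.

```python
import numpy as np

def richardson_lstsq(A, b, num_iters=500):
    """Approximate argmin_x ||A x - b||_2 by Richardson iteration on the normal equations."""
    # Step size 1 / sigma_max(A)^2 lies inside the convergent range (0, 2 / sigma_max(A)^2).
    omega = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        # Move along the negative gradient of 0.5 * ||A x - b||^2.
        x = x + omega * (A.T @ (b - A @ x))
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
b = rng.normal(size=50)
x_iter = richardson_lstsq(A, b)
x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_iter - x_direct)))
```

Each step needs only products with `A` and `A.T`, which is why the iteration pairs naturally with a fast solver or operator used as a subroutine.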

approximate richardson iteration, artificial intelligence, machine learning, (3 more...)

2502.09534

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)