deviance
Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data
Castiglione, Cristian, Segers, Alexandre, Clement, Lieven, Risso, Davide
Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level, enabling the study of cellular heterogeneity and gene expression dynamics. Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples. This step, often performed using principal component analysis or closely related methods, is challenging because of the size and complexity of the data. In this work, we present a generalized matrix factorization model assuming a general exponential dispersion family distribution, and we show that many of the approaches proposed in the single-cell dimensionality reduction literature can be seen as special cases of this model. Furthermore, we propose a scalable adaptive stochastic gradient descent algorithm that allows us to estimate the model efficiently, enabling the analysis of millions of cells. We also introduce a novel warm-start initialization method, designed to accelerate algorithm convergence and increase the precision of the final estimates. Moreover, we discuss strategies for dealing with missing values and model selection. We benchmark the proposed algorithm through extensive numerical experiments against state-of-the-art methods and showcase its use in real-world biological applications. The proposed method systematically outperforms existing generalized and non-negative matrix factorization methods, demonstrating faster execution times while maintaining, or even enhancing, matrix reconstruction fidelity and accuracy in biological signal extraction. Finally, all the methods discussed here are implemented in an efficient open-source R package, sgdGMF, available at https://github.com/CristianCastiglione/sgdGMF.
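The core estimation idea can be sketched in a few lines. The toy example below (illustrative dimensions and learning rate, not the sgdGMF implementation itself) runs plain stochastic gradient descent on randomly sampled entries of a Poisson matrix factorization with a log link:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy counts: n "cells" x p "genes" drawn from a rank-2 Poisson model.
n, p, d = 200, 30, 2
U_true = rng.normal(0.0, 0.5, (n, d))
V_true = rng.normal(0.0, 0.5, (p, d))
Y = rng.poisson(np.exp(U_true @ V_true.T))

def nll(U, V):
    # Poisson negative log-likelihood (up to constants) with log link.
    eta = U @ V.T
    return float(np.sum(np.exp(eta) - Y * eta))

U = rng.normal(0.0, 0.1, (n, d))
V = rng.normal(0.0, 0.1, (p, d))
nll_start = nll(U, V)

lr, steps, batch = 0.02, 300, 256
for _ in range(steps):
    rows = rng.integers(0, n, batch)       # sample a minibatch of entries
    cols = rng.integers(0, p, batch)
    eta = np.sum(U[rows] * V[cols], axis=1)
    resid = np.exp(eta) - Y[rows, cols]    # d(nll)/d(eta) per entry
    gU = resid[:, None] * V[cols]
    gV = resid[:, None] * U[rows]
    np.add.at(U, rows, -lr * gU)           # accumulate per-row gradients
    np.add.at(V, cols, -lr * gV)

print(nll(U, V) < nll_start)               # loss should have decreased
```

Because each update touches only a minibatch of entries, the cost per step is independent of the matrix size, which is what makes this family of algorithms viable for millions of cells.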
Explainable Boosting Machines with Sparsity -- Maintaining Explainability in High-Dimensional Settings
Greenwell, Brandon M., Dahlmann, Annika, Dhoble, Saurabh
Compared to "black-box" models like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs readily become less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production due to increases in scoring time. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that can help introduce sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM with many (possibly hundreds or thousands of) terms using the LASSO can help reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
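The post-processing step can be sketched as follows. Here the term-contribution matrix is a simulated stand-in for a fitted EBM's per-term scores (not the interpret package API), and the sizes and penalty strength are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Stand-in for a fitted EBM: a matrix of per-term contributions f_j(x_j),
# where only the first 5 of 50 terms actually carry signal.
n, k, k_signal = 500, 50, 5
T = rng.normal(size=(n, k))
y = T[:, :k_signal].sum(axis=1) + rng.normal(scale=0.1, size=n)

# Post-process: refit the response on the term contributions with an L1
# penalty; terms whose weight is driven to zero can be dropped entirely,
# shrinking both the explanation and the scoring-time cost.
lasso = Lasso(alpha=0.05).fit(T, y)
kept = np.flatnonzero(lasso.coef_)
print(sorted(kept.tolist()))
```

On this synthetic setup the LASSO retains the informative terms and discards the noise terms, which is exactly the sparsity the abstract is after.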
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
Holvoet, Freek, Antonio, Katrien, Henckaerts, Roel
Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model (GLM) on binned input data, a gradient-boosted tree model (GBM), a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes and numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network, and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural networks' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
Extensions of Heterogeneity in Integration and Prediction (HIP) with R Shiny Application
Butts, J., Wendt, C., Bowler, R., Hersh, C. P., Long, Q., Eberly, L., Safo, S. E.
Multiple data views measured on the same set of participants are becoming more common and have the potential to deepen our understanding of many complex diseases when these different views are analyzed simultaneously. Equally important, many of these complex diseases show evidence of subgroup heterogeneity (e.g., by sex or race). HIP (Heterogeneity in Integration and Prediction) is among the first methods proposed to integrate multiple data views while also accounting for subgroup heterogeneity to identify common and subgroup-specific markers of a particular disease. However, HIP is applicable only to continuous outcomes and requires programming expertise by the user. Here we propose extensions to HIP that accommodate multi-class, Poisson, and zero-inflated Poisson outcomes while retaining the benefits of HIP. Additionally, we introduce an R Shiny application, accessible on shinyapps.io at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/, that provides an interface with the Python implementation of HIP to allow more researchers to use the method anywhere and on any device. We applied HIP to identify genes and proteins common and specific to males and females that are associated with exacerbation frequency. Although some of the identified genes and proteins show evidence of a relationship with chronic obstructive pulmonary disease (COPD) in existing literature, others may be candidates for future research investigating their relationship with COPD. We demonstrate the use of the Shiny application with a publicly available data set. An R package for HIP will be made available at https://github.com/lasandrall/HIP.
Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice
Fissler, Tobias, Lorentzen, Christian, Mayer, Michael
One of the main tasks of actuaries and data scientists is to build good predictive models for certain phenomena such as the claim size or the number of claims in insurance. These models ideally exploit given feature information to enhance the accuracy of prediction. This user guide revisits and clarifies statistical techniques to assess the calibration or adequacy of a model on the one hand, and to compare and rank different models on the other hand. In doing so, it emphasises the importance of specifying the prediction target functional at hand a priori (e.g. the mean or a quantile) and of choosing the scoring function in model comparison in line with this target functional. Guidance for the practical choice of the scoring function is provided. Striving to bridge the gap between science and daily practice in application, it focuses mainly on the pedagogical presentation of existing results and of best practice. The results are accompanied and illustrated by two real data case studies on workers' compensation and customer churn.
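The guide's central point, that the scoring function must match the target functional, can be illustrated with a small simulation (the lognormal setup and constant forecasts are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Skewed (lognormal) outcomes: the mean and the median differ.
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
mean_fc = np.full_like(y, np.exp(0.5))  # forecasts the true mean
median_fc = np.full_like(y, 1.0)        # forecasts the true median

def mse(f):
    return float(np.mean((f - y) ** 2))

def mae(f):
    return float(np.mean(np.abs(f - y)))

# Squared error is consistent for the mean: it ranks the mean forecast best.
print(mse(mean_fc) < mse(median_fc))    # True
# Absolute error is consistent for the median: it flips the ranking.
print(mae(median_fc) < mae(mean_fc))    # True
```

Both forecasts are "correct" for their own functional; ranking them with the wrong scoring function would penalize the forecaster who predicted exactly the quantity that was asked for.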
Exploring the Multi-modal Demand Dynamics During Transport System Disruptions
Benam, Ali Shateri, Furno, Angelo, Faouzi, Nour-Eddin El
Passengers respond heterogeneously to disruptive events in transport systems, depending on numerous factors. This study takes a data-driven approach to explore multi-modal demand dynamics under disruptions. We first develop a methodology to automatically detect anomalous instances from historical hourly travel demand data. Then we apply clustering to these anomalous hours to distinguish various forms of multi-modal demand dynamics occurring during disruptions. Our study provides a straightforward tool for categorising various passenger responses to disruptive events in terms of mode choice and paves the way for predictive analyses on estimating the scope of modal shift under distinct disruption scenarios.
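A minimal sketch of the detect-then-cluster pipeline on synthetic hourly demand for three hypothetical modes; all magnitudes, the z-score threshold, and the two injected disruption patterns are illustrative assumptions, not the paper's method:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Toy hourly demand for 3 modes (e.g., bus, metro, bike) over 1000 hours.
hours = 1000
demand = rng.normal(loc=[100.0, 200.0, 50.0],
                    scale=[10.0, 20.0, 5.0], size=(hours, 3))

# Inject two disruption patterns: a metro outage with modal shift to
# bus/bike, and a system-wide demand drop (e.g., severe weather).
demand[100:110] += [60.0, -150.0, 25.0]
demand[500:510] += [-70.0, -140.0, -35.0]

# Flag anomalous hours via per-mode z-scores against the historical profile.
z = (demand - demand.mean(axis=0)) / demand.std(axis=0)
anomalous = np.flatnonzero(np.abs(z).max(axis=1) > 4.0)

# Cluster the anomalous hours to separate the two response patterns.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z[anomalous])
```

The cluster assignments recover the two distinct forms of multi-modal response (modal shift versus system-wide drop) without any labels.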
Nonlinear Feature Aggregation: Two Algorithms driven by Theory
Bonetti, Paolo, Metelli, Alberto Maria, Restelli, Marcello
Many real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues, as well as the risk of overfitting. Ideally, only relevant and non-redundant features should be considered to preserve the complete information of the original data and limit the dimensionality. Dimensionality reduction and feature selection are common preprocessing techniques addressing the challenge of efficiently dealing with high-dimensional data. Dimensionality reduction methods control the number of features in the dataset while preserving its structure and minimizing information loss. Feature selection aims to identify the most relevant features for a task, discarding the less informative ones. Previous works have proposed approaches that aggregate features depending on their correlation without discarding any of them and preserving their interpretability through aggregation with the mean. A limitation of methods based on correlation is the assumption of linearity in the relationship between features and target. In this paper, we relax such an assumption in two ways. First, we propose a bias-variance analysis for general models with additive Gaussian noise, leading to a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function. Then, we extend the approach assuming that a generalized linear model regulates the relationship between features and target. A deviance analysis leads to a second dimensionality reduction algorithm (GenLinCFA), applicable to a larger class of regression problems and classification settings. Finally, we test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, showing competitive performances.
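The correlation-based aggregation-by-mean idea that this work builds on can be sketched as follows; the two-group linear setup is an illustrative assumption, not the NonLinCFA/GenLinCFA algorithms themselves:

```python
import numpy as np

rng = np.random.default_rng(6)

# Two latent signals, each observed through three noisy, highly
# correlated features.
n = 1000
z1, z2 = rng.normal(size=(2, n))
X = np.column_stack(
    [z1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [z2 + 0.1 * rng.normal(size=n) for _ in range(3)]
)
y = z1 - 2.0 * z2 + 0.1 * rng.normal(size=n)

# Aggregation with the mean: replace each correlated group by its mean,
# reducing six features to two interpretable aggregates, discarding none.
agg = np.column_stack([X[:, :3].mean(axis=1), X[:, 3:].mean(axis=1)])

# Linear regression on the aggregates retains almost all of the signal.
design = np.column_stack([agg, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
r2 = 1.0 - np.var(y - design @ coef) / np.var(y)
print(round(r2, 3))   # close to 1
```

Averaging within a correlated group also reduces measurement noise, which is why the aggregated model loses essentially no predictive power here; the paper's contribution is to relax the linearity assumption underlying this construction.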
Unsupervised spectral-band feature identification for optimal process discrimination
Tiwari, Akash, Bukkapatnam, Satish
Changes in real-world dynamic processes are often described in terms of differences in energies $\textbf{E}(\underline{\alpha})$ of a set of spectral-bands $\underline{\alpha}$. Given continuous spectra of two classes $A$ and $B$, or in general, two stochastic processes $S^{(A)}(f)$ and $S^{(B)}(f)$, $f \in \mathbb{R}^+$, we address the ubiquitous problem of identifying a subset of intervals of $f$, called spectral-bands $\underline{\alpha} \subset \mathbb{R}^+$, such that the energies $\textbf{E}(\underline{\alpha})$ of these bands can optimally discriminate between the two classes. We introduce EGO-MDA, an unsupervised method to identify optimal spectral-bands $\underline{\alpha}^*$ for given samples of spectra from the two classes. EGO-MDA employs a statistical approach that iteratively minimizes an adjusted multinomial log-likelihood (deviance) criterion $\mathcal{D}(\underline{\alpha},\mathcal{M})$. Here, Mixture Discriminant Analysis (MDA) derives the maximum likelihood estimates of the parameters of the two Gaussian mixture model (GMM) distributions, i.e., $\mathcal{M}^* = \underset{\mathcal{M}}{\rm argmin}~\mathcal{D}(\underline{\alpha}, \mathcal{M})$, and identifies a classifier that optimally discriminates between the two classes for a given spectral representation. Efficient Global Optimization (EGO) then finds the spectral-bands $\underline{\alpha}^* = \underset{\underline{\alpha}}{\rm argmin}~\mathcal{D}(\underline{\alpha},\mathcal{M})$ for given GMM parameters $\mathcal{M}$. For pathological cases of low separation between mixtures and model misspecification, we discuss the effect of the sample size and the number of iterations on the estimates of the parameters $\mathcal{M}$, and therefore on the classifier performance. A case study on a synthetic data set is provided. In an engineering application of optimal spectral-banding for anomaly tracking, EGO-MDA achieved at least a 70% improvement in the median deviance relative to the other methods tested.
Identifying and Overcoming Transformation Bias in Forecasting Models
Log and square root transformations of the target variable are routinely used in forecasting models to predict future sales. These transformations often lead to better performing models. However, they also introduce a systematic negative bias (under-forecasting). In this paper, we demonstrate the existence of this bias, dive deep into its root cause, and introduce two methods to correct for the bias. We conclude that the proposed bias correction methods improve model performance (by up to 50%) and make a case for incorporating bias correction in the modeling workflow. We also experiment with the 'Tweedie' family of cost functions, which circumvents the transformation bias issue by modeling directly on sales. We conclude that Tweedie regression gives the best performance so far when modeling on sales, making it a strong alternative to working with a transformed target variable.
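The bias is easy to reproduce. The sketch below assumes lognormal sales and a perfect log-scale model, and applies the standard lognormal correction $\exp(\sigma^2/2)$ as one simple fix, not necessarily the paper's specific methods:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sales generated so that log(sales) = x + Gaussian noise.
n = 100_000
x = rng.uniform(0.0, 2.0, n)
sales = np.exp(x + rng.normal(scale=0.5, size=n))

# Suppose a model fit on log(sales) recovers the signal x exactly.
# Naively exponentiating the log-scale prediction under-forecasts, since
# E[exp(x + eps)] = exp(x) * exp(sigma^2 / 2) > exp(x).
log_pred = x
naive = np.exp(log_pred)
sigma2 = np.var(np.log(sales) - log_pred)      # residual variance
corrected = np.exp(log_pred + sigma2 / 2.0)    # lognormal bias fix

print(naive.sum() / sales.sum())       # noticeably below 1 (under-forecast)
print(corrected.sum() / sales.sum())   # close to 1
```

With $\sigma = 0.5$ the naive back-transform under-forecasts total sales by roughly $1 - e^{-\sigma^2/2} \approx 12\%$, which the variance correction removes.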
Saturated Models, Deviance and the Derivation of Sum of Squares
Before we discuss deviance, we first need to understand what the Saturated, Proposed and Null Models are. A Saturated Model is one where the number of parameters/coefficients is equal to the number of data points. This is like a 'connect-the-dots' model where the line or curve passes through each point. It is considered the perfect model in the sense that it accounts for all the variance in the data and attains the maximum achievable likelihood. A Null Model is the opposite, with only one parameter: the intercept.
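A small Poisson example makes these definitions concrete; the data and the plugged-in "proposed" means are illustrative assumptions standing in for a fitted model:

```python
import numpy as np
from scipy.special import gammaln, xlogy
from scipy.stats import poisson

rng = np.random.default_rng(5)

# Toy Poisson data with one covariate.
n = 200
x = rng.uniform(0.0, 1.0, n)
y = rng.poisson(np.exp(0.5 + 1.5 * x))

def loglik(mu):
    # Poisson log-likelihood at fitted means mu (mu > 0).
    return float(poisson.logpmf(y, mu).sum())

# Saturated model: one fitted mean per data point (mu_i = y_i), the
# maximum achievable log-likelihood (xlogy handles y_i = 0 safely).
ll_sat = float(np.sum(xlogy(y, y) - y - gammaln(y + 1)))

# Null model: intercept only, so every observation gets mu_i = mean(y).
ll_null = loglik(np.full(n, y.mean()))

# Proposed model: here we simply plug in the true means as a stand-in
# for a fitted Poisson regression.
mu_hat = np.exp(0.5 + 1.5 * x)
ll_model = loglik(mu_hat)

# Deviance = 2 * (saturated - model log-likelihood); for the Poisson it
# agrees with the closed form 2 * sum(y*log(y/mu) - (y - mu)).
deviance = 2.0 * (ll_sat - ll_model)
null_deviance = 2.0 * (ll_sat - ll_null)
dev_formula = float(2.0 * np.sum(xlogy(y, y / mu_hat) - (y - mu_hat)))
print(abs(deviance - dev_formula) < 1e-8)   # True
```

The ordering ll_sat >= ll_model >= ll_null mirrors the text: the saturated model is the likelihood ceiling, the null model the floor, and a proposed model's deviance measures how far it falls short of the ceiling.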