AITopics

2603.28987

Country:

North America > United States > Virginia (0.04)
North America > Canada (0.04)
Europe > Germany (0.04)
(2 more...)

Genre: Research Report (0.69)

Industry: Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)

Seraghiti, Giovanni, Dubrulle, Kévin, Vandaele, Arnaud, Gillis, Nicolas

Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

arXiv.org Machine LearningApr-1-2026

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, too sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).

algorithm, artificial intelligence, machine learning, (18 more...)

2603.29715

Country:

Europe > United Kingdom (0.04)
Europe > Belgium (0.04)
North America > United States > Utah (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Quintero, Manuel, Shreekumar, Advik, Stephenson, William T., Broderick, Tamara

Do covariates explain why these groups differ? The choice of reference group can reverse conclusions in the Oaxaca-Blinder decomposition

arXiv.org Machine LearningApr-1-2026

Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca--Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and the numerical answer can vary with the reference. To the best of our knowledge, there has not been a systematic investigation into whether the choice of OBD reference can yield different substantive conclusions and how common this issue is. In the present paper, we give existence proofs in real and simulated data that the OBD references can yield substantively different conclusions and that these differences are not entirely driven by model misspecification or small data. We prove that substantively different conclusions occur in up to half of the parameter space, but find these discrepancies rare in the real-data analyses we study. We explain this empirical rarity by examining how realistic data-generating processes can be biased towards parameters that do not change conclusions under the OBD.

artificial intelligence, covariate, decomposition, (17 more...)

2603.29972

Country:

North America > Mexico > Oaxaca (0.26)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Michigan (0.04)
(7 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence (0.94)
Information Technology > Data Science (0.88)

The Atlantic - TechnologyMar-31-2026, 16:27:00 GMT

If You Need a Laptop, Buy It Now

Electronics are getting more expensive and worse. Recently, a Costco in Florida instituted a new store policy. An employee told me that he was asked to open up every desktop computer displayed in the electronics section and remove the memory chips. Otherwise, the RAM harvesters would get them. Elsewhere, criminal groups are misdirecting trucks carrying RAM in order to loot them.

artificial intelligence, manufacturer, popular latest newsletter, (10 more...)

The Atlantic - Technology

Country:

North America > United States > New York (0.05)
North America > United States > Missouri (0.05)
North America > United States > California (0.05)
(2 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Mobile (0.50)

Barman, Pronob Kumar, Reynolds, Tera L., Foulds, James

Enhancing Online Support Group Formation Using Topic Modeling Techniques

arXiv.org Machine LearningMar-31-2026

Online health communities (OHCs) are vital for fostering peer support and improving health outcomes. Support groups within these platforms can provide more personalized and cohesive peer support, yet traditional support group formation methods face challenges related to scalability, static categorization, and insufficient personalization. To overcome these limitations, we propose two novel machine learning models for automated support group formation: the Group specific Dirichlet Multinomial Regression (gDMR) and the Group specific Structured Topic Model (gSTM). These models integrate user generated textual content, demographic profiles, and interaction data represented through node embeddings derived from user networks to systematically automate personalized, semantically coherent support group formation. We evaluate the models on a large scale dataset from MedHelp, comprising over 2 million user posts. Both models substantially outperform baseline methods including LDA, DMR, and STM in predictive accuracy (held out log likelihood), semantic coherence (UMass metric), and internal group consistency. The gDMR model yields group covariates that facilitate practical implementation by leveraging relational patterns from network structures and demographic data. In contrast, gSTM emphasizes sparsity constraints to generate more distinct and thematically specific groups. Qualitative analysis further validates the alignment between model generated groups and manually coded themes, showing the practical relevance of the models in informing groups that address diverse health concerns such as chronic illness management, diagnostic uncertainty, and mental health. By reducing reliance on manual curation, these frameworks provide scalable solutions that enhance peer interactions within OHCs, with implications for patient engagement, community resilience, and health outcomes.

machine learning, manuscriptsubmittedtoacm, natural language, (17 more...)

2603.24765

Country:

Europe > Lithuania (0.05)
Oceania > Kiribati (0.04)
Oceania > Australia (0.04)
(47 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.94)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Machine LearningMar-31-2026

Persistence diagrams of random matrices via Morse theory: universality and a new spectral diagnostic

Loftus, Matthew

We prove that the persistence diagram of the sublevel set filtration of the quadratic form f(x) = x^T M x restricted to the unit sphere S^{n-1} is analytically determined by the eigenvalues of the symmetric matrix M. By Morse theory, the diagram has exactly n-1 finite bars, with the k-th bar living in homological dimension k-1 and having length equal to the k-th eigenvalue spacing s_k = λ_{k+1} - λ_k. This identification transfers random matrix theory (RMT) universality to persistence diagram universality: for matrices drawn from the Gaussian Orthogonal Ensemble (GOE), we derive the closed-form persistence entropy PE = log(8n/π) - 1, and verify numerically that the coefficient of variation of persistence statistics decays as n^{-0.6}. Different random matrix ensembles (GOE, GUE, Wishart) produce distinct universal persistence diagrams, providing topological fingerprints of RMT universality classes. As a practical consequence, we show that persistence entropy outperforms the standard level spacing ratio \langle r \rangle for discriminating GOE from GUE matrices (AUC 0.978 vs. 0.952 at n = 100, non-overlapping bootstrap 95% CIs), and detects global spectral perturbations in the Rosenzweig-Porter model to which \langle r \rangle is blind. These results establish persistence entropy as a new spectral diagnostic that captures complementary information to existing RMT tools.

artificial intelligence, machine learning, persistence diagram, (16 more...)

2603.27903

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Rumi, Alberto, Jacobsen, Andrew, Cesa-Bianchi, Nicolò, Vitale, Fabio

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_T$ in $\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t-1}\}$ of an arbitrary comparator sequence. In particular, we provide the first algorithm for linear bandits achieving the optimal regret guarantee of order $\mathcal{O}\big(\sqrt{d(1+S_T) T}\big)$ up to poly-logarithmic terms without prior knowledge of $S_T$, thus resolving a long-standing open problem.

artificial intelligence, data mining, machine learning, (16 more...)

2603.25916

Country:

North America > United States (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.55)

Hirari, Mehdi, Centofanti, Fabio, Hubert, Mia, Van Aelst, Stefan

Robust Tensor-on-Tensor Regression

Tensor-on-tensor (TOT) regression is an important tool for the analysis of tensor data, aiming to predict a set of response tensors from a corresponding set of predictor tensors. However, standard TOT regression is sensitive to outliers, which may be present in both the response and the predictor. It can be affected by casewise outliers, which are observations that deviate from the bulk of the data, as well as by cellwise outliers, which are individual anomalous cells within the tensors. The latter are particularly common due to the typically large number of cells in tensor data. This paper introduces a novel robust TOT regression method, named ROTOT, that can handle both types of outliers simultaneously, and can cope with missing values as well. This method uses a single loss function to reduce the influence of both casewise and cellwise outliers in the response. The outliers in the predictor are handled using a robust Multilinear Principal Component Analysis method. Graphical diagnostic tools are also proposed to identify the different types of outliers detected. The performance of ROTOT is evaluated through extensive simulations and further illustrated using the Labeled Faces in the Wild dataset, where ROTOT is applied to predict facial attributes.

artificial intelligence, data mining, machine learning, (20 more...)

2603.25911

Country:

Europe > Belgium (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Modeling & Simulation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Shi, Haoming, Chi, Eric C., Luo, Hengrui

Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression

We study rank selection for low-rank tensor regression under random covariates design. Under a Gaussian random-design model and some mild conditions, we derive population expressions for the expected training-testing discrepancy (optimism) for both CP and Tucker decomposition. We further demonstrate that the optimism is minimized at the true tensor rank for both CP and Tucker regression. This yields a prediction-oriented rank-selection rule that aligns with cross-validation and extends naturally to tensor-model averaging. We also discuss conditions under which under- or over-ranked models may appear preferable, thereby clarifying the scope of the method. Finally, we showcase its practical utility on a real-world image regression task and extend its application to tensor-based compression of neural network, highlighting its potential for model selection in deep learning.

artificial intelligence, machine learning, optr, (20 more...)

2603.26048

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Identifiable Deep Latent Variable Models for MNAR Data

Xie, Huiming, Xue, Fei, Wang, Xiao

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is independent of the missing values themselves. This assumption is frequently violated in real-world scenarios, prompted by recent advances in imputation methods using deep learning to address this challenge. However, these methods neglect the crucial issue of nonparametric identifiability in missing-not-at-random (MNAR) data, which can lead to biased and unreliable results. This paper seeks to bridge this gap by proposing a novel framework based on deep latent variable models for MNAR data. Building on the assumption of conditional no self-censoring given latent variables, we establish the identifiability of the data distribution. This crucial theoretical result guarantees the feasibility of our approach. To effectively estimate unknown parameters, we develop an efficient algorithm utilizing importance-weighted autoencoders. We demonstrate, both theoretically and empirically, that our estimation process accurately recovers the ground-truth joint distribution under specific regularity conditions. Extensive simulation studies and real-world data experiments showcase the advantages of our proposed method compared to various classical and state-of-the-art approaches to missing data imputation.

artificial intelligence, machine learning, missingness mechanism, (16 more...)

2603.24771

Country:

Africa > Botswana (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
South America > Paraguay > Asunción > Asunción (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)