AITopics

Effective subtyping (also known as grouping or clustering) of patients using their electronic health record (EHR) data can greatly inform precision medicine efforts. However, subtyping temporal EHR datasets is known to be challenging due to inherent EHR issues, including complexity and irregularity. In this study, we propose a self-supervised Mamba-based model that learns effective EHR representations and enables enhanced patient subtyping. We evaluate the proposed model on public and private real-world EHR datasets to classify the data based on the available labels and subtype patients based on the representations learned from the model. Through an extensive set of experiments, we demonstrate that our model's design choices lead to better performance compared to competitive baseline models for prediction. Moreover, we evaluate several clustering techniques to demonstrate that our findings offer valuable insights into subtyping patients based on temporal records from EHR models. Our implementations are available at https://github.com/healthylaife/triplet_mamba.

artificial intelligence, machine learning, natural language, (17 more...)

2606.28623

Country: North America > United States (0.86)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.99)
Health & Medicine > Health Care Technology > Medical Record (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Gradient boosting with vector-valued leafs

Cortes, David

Gradient boosting in the form of decision tree ensembles has successfully been applied to a variety of problems using simple objective functions based on log-likelihoods of a single variable. The concept extends naturally to objective functions operating on vectors - for example, multinomial logistic log-likelihood for multi-class classification, where observations have a score for each class - but popular frameworks approach these functions by either updating one value of the input vectors at a time, or by using a diagonal upper bound on the second derivative. This work extends the usual gradient boosting framework to functions of vector inputs and sketches a simple algorithm that can be used efficiently with histogram-based decision trees.
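As a rough illustration of the vector-valued objective mentioned in this abstract (and not the paper's algorithm), the sketch below computes the gradient and the full Hessian of the multinomial logistic negative log-likelihood with respect to one observation's class scores. The function names `softmax` and `multinomial_grad_hess` are hypothetical. The diagonal entries are the second derivatives seen when each score is treated on its own; the off-diagonal entries are the coupling between class scores.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multinomial_grad_hess(scores, label):
    """Gradient and Hessian of -log softmax(scores)[label] w.r.t. the scores.

    gradient: p - onehot(label)
    Hessian:  diag(p) - p p^T
    """
    p = softmax(scores)
    k = len(p)
    grad = [p[i] - (1.0 if i == label else 0.0) for i in range(k)]
    hess = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]
    return grad, hess

# Toy example: three classes with equal scores, true class 0.
grad, hess = multinomial_grad_hess([0.0, 0.0, 0.0], label=0)
```

Each row of the Hessian sums to zero, so the matrix is singular. This is one reason a full-Hessian update may need more care than a diagonal one.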

artificial intelligence, hessian, machine learning, (18 more...)

2606.29326

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

I-BBS: Coordinate-Free Inference of Latent Sub-Manifolds Using Random Distance Matrix Theory

Halperin, Igor

Bogomolny, Bohigas and Schmit (BBS) found that the spectrum of the pairwise distance matrix on N points sampled from a smooth d-dimensional manifold encodes a signature of the underlying geometry. We develop I-BBS (Inference-BBS), a coordinate-free method that identifies a low-dimensional latent sub-manifold embedded in a high-dimensional ambient distance matrix alone, without accessing an ambient high-dimensional vector space. It therefore applies even when that space is only partly observable or undefined. We model the ambient embedding by two classes of generative noise, model-based and model-free. The noise mixes the latent signal with off-manifold components, so the eigenvalues reorganise collectively and the latent geometry cannot be read off eigenvalue by eigenvalue. We recover it instead from two integer-stable signatures that survive the noise: the multiplicity of the top non-Perron multiplet, which fixes $d$, and a parameter-free law for how the multiplet positions shrink as the noise grows. On synthetic spheres $S^1$, $S^2$ and $S^3$ these integer signatures are far more stable under noise than the continuous spectral slope, and a blind test recovers both the manifold and the noise model from a single distance matrix. Applications to neural-network representations and to the dynamic training regime are developed in two companion papers.

artificial intelligence, machine learning, matrix, (19 more...)

2606.29675

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Bojovic, Matia, Salzo, Saverio, Pontil, Massimiliano

AdaGrad does not adapt to Hölder-smoothness for composite objectives

Adaptive gradient methods are among the standard tools for training machine learning models. Their appeal is that they reduce the need to tune a fixed learning rate by adjusting the effective stepsize using information observed along the optimization trajectory. AdaGrad, introduced by Duchi et al. [2011], is a prototypical example: it rescales the update by the square root of the cumulative sum of past squared subgradients, coordinate by coordinate. The method was originally proposed for nonsmooth Lipschitz-continuous composite convex optimization, achieving the optimal rate $O(1/\sqrt{n})$ in the objective gap. Later works considered the smooth setting and asked whether AdaGrad can adapt to the unknown smoothness level of the objective, while attaining the corresponding standard rate.
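The coordinate-wise rescaling described in this abstract can be sketched as follows. This is a minimal illustration on a toy smooth quadratic, not the composite (proximal) setting studied in the paper. The base stepsize `eta`, the stabilizer `eps`, and the test function are assumptions chosen for the example.

```python
import math

def adagrad(grad, x0, eta=0.5, steps=200, eps=1e-12):
    """Diagonal AdaGrad: each coordinate is divided by the square root
    of its own running sum of squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)  # per-coordinate sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            x[i] -= eta * gi / (math.sqrt(accum[i]) + eps)
    return x

def quad_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 100 * x[1]**2).
    return [x[0], 100.0 * x[1]]

x_final = adagrad(quad_grad, [1.0, 1.0])
```

In this toy run both coordinates follow almost the same trajectory even though their curvatures differ by a factor of 100, because the scale of each gradient coordinate cancels in the update.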

adagrad, artificial intelligence, machine learning, (18 more...)

2606.29893

Country: Europe > Italy (0.29)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Perspectives on Latent Factor Indeterminacy and its Implications for Data Representation

Peeters, Carel F. W.

The common factor analytic model is related to Helmholtz and Boltzmann machines, can be conceived as a linear autoencoder, or can be thought of as a single-hidden-layer generative neural network. We thus consider it a basal generative representation learner that can be used as a minimal model for studying the foundational characteristics of (deep) generative model architectures. We focus on the fundamental problem of indeterminacy in latent factor projections. This indeterminacy implies that, even when the intrinsic dimension of the latent vector is known, regularity conditions are met, and rotational indeterminacy is resolved, an inherent indefiniteness in the retrieval of causative latent sources remains: they will be uncertain, distributionally deviant, and non-unique. This can have major implications for data representation but remains an elusive issue, even to practitioners and theorists well-versed in the factor model. Moreover, this classic psychometric problem is intricately related to the modern issue of latent variable collapse in the variational autoencoder framework for deep generative modeling. Here, we assess this indeterminacy from various perspectives and show how these are mathematically and conceptually related and we discuss subsequent implications for the Psychometrics, Statistics, and Artificial Intelligence communities. We show that one has latent factor determinacy across all its facets when the feature-dimension grows to infinity. This feeds into an essentially distribution-free estimation approach in the sample case when the number of features grows very large. We conclude, as these are emergent properties at scale, that the factor model is suited for representation learning of very-high-dimensional data.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2606.28854

Country: North America > United States > California (0.67)

Genre: Research Report > New Finding (0.45)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Schwank, Richard, Drton, Mathias

Non-parametric recovery of causal diffusion mechanisms from steady-state observations

We consider sparse multivariate stochastic systems that evolve in continuous time according to a causal mechanism and present methodology to recover the system's time-infinitesimal transition mechanism from mere cross-sectional data. This observational paradigm is motivated by applications such as gene expression analysis, where destructive experimental techniques may only allow recording data once over a cell's lifetime. Precisely, we assume the system follows a time-homogeneous diffusion process that has reached an equilibrium distribution at observation time. Further, we assume the causal mechanism is fully described by the diffusion drift, is acyclic, and its causal structure graph is known. In this setting, we prove that the full causal mechanism, i.e., the drift function, can be non-parametrically identified under a weak non-explosion criterion. We derive a non-parametric kernel estimator for this challenging inverse problem and prove its consistency. Moreover, we propose a cross-validation scheme for hyperparameter tuning, illustrate the behavior of our estimator in simulations, and discuss connections with irreversible generative diffusion models and low-frequency sampled data.

artificial intelligence, equation, machine learning, (18 more...)

2606.30467

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Nguyen, Nhi, Ravfogel, Shauli, Ranganath, Rajesh

What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs

Large language models (LLMs) are increasingly deployed in high-stakes domains, where free-text explanations such as chain-of-thought and post-hoc rationales are used to justify model outputs. Yet it remains unclear whether these explanations are sufficient, i.e., if they contain enough information to explain the model's output-generating process. We generalize classical sufficiency from feature attributions to arbitrary explanations and prove that explanation sufficiency can change depending on the input distribution, which must be explicitly defined for LLM explanations. We propose using the LLM itself to generate alternative inputs conditioned on an explanation, capturing its beliefs about possible inputs. We formalize self-consistent sufficiency as a goal for free-text explanations and introduce an information-theoretic metric, SCSuff, that enables evaluation of free-text explanations without relying on predefined biases or shortcuts. Our experiments show that SCSuff agrees with targeted perturbation tests where applicable and demonstrate that explanation sufficiency can vary with the input distribution. We find LLM explanations are generally insufficient and weakly correlated with model size, accuracy, or output entropy. Analysis of final-token hidden states shows that top and bottom SCSuff scores can be predicted from internal representations, suggesting that SCSuff can guide detection and improvement of sufficient LLM explanations. The code for this paper is available at https://github.com/rajesh-lab/self-consistent-sufficiency .

explanation, large language model, machine learning, (17 more...)

2606.28615

Country: Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Ghosh, Soham, Deshpande, Sameer K.

Multivariate Varying-Coefficient BART with Graphical Horseshoe Priors

Modern multivariate regression problems involve several related outcomes whose regression effects are not only nonlinear, heterogeneous, and outcome-specific, but also where the residual dependence among outcomes is scientifically meaningful. Existing multivariate Bayesian tree-based methods typically address only part of this problem: some impose substantial sharing of tree architecture across outcomes, which is overly restrictive when responses depend on distinct predictors or effect modifiers, while others accommodate residual dependence but retain simpler mean structures. This paper develops multiVCBART, a multivariate varying-coefficient Bayesian additive regression tree framework that jointly models flexible outcome-specific coefficient surfaces and a sparse residual precision matrix. Each entry of the coefficient matrix $B(x)$ is represented by an independent BART ensemble, allowing predictor effects to vary nonlinearly with modifiers $x$ across outcomes, while a Graphical Horseshoe prior on the precision matrix $Ω$ captures parsimonious residual conditional dependence. To permit efficient computation, we introduce a sampler that reduces the multivariate Gaussian likelihood to a sequence of scalar pseudo-response updates, decoupling the tree backfitting from the Graphical Horseshoe step. Theoretically, we establish the first posterior contraction rates for a multivariate BART model with jointly estimated residual dependence, proving near-minimax adaptation to underlying smoothness and structural sparsity. Empirically, multiVCBART outperforms existing multivariate tree models and Bayesian SUR competitors on sparse, high-dimensional datasets. Finally, in a re-analysis of the Genomics of Drug Sensitivity in Cancer dataset, our method identifies distinct biomarker signals and recovers a coherent residual pharmacologic network.

artificial intelligence, machine learning, noise, (17 more...)

2606.29114

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Schmocker, Philipp, Teichmann, Josef

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. An FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.

artificial intelligence, deep learning, machine learning, (17 more...)

2606.0982

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.27)

Genre:

Instructional Material (0.45)
Research Report (0.40)

Industry: Banking & Finance (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps

OuYang, RuiKang, Yu, Hanlin, Ai, Xinyue, He, Yutong, Boffi, Nicholas M., Ravikumar, Pradeep, Hernandez-Lobato, Jose Miguel, Simchowitz, Max, Miller, Benjamin Kurt, Chehab, Omar

Recent progress in flow-based generative modeling has led to models that output high-quality samples while using only a small number of function evaluations. However, at present, there is a lack of similar advances in estimating the model likelihood. In particular, most existing methods either rely on restrictive architectures that enable exact calculations, or use stochastic approximations such as Hutchinson's trace estimator that introduce substantial variance. In this work, we introduce SCAlable LikeLihood distillation of flOw maPs (SCALLOP). SCALLOP builds on the recently proposed F2D2, a likelihood flow map model that can generate samples and their densities in a small number of function evaluations. While F2D2 uses Hutchinson's estimator during training, we introduce an alternative and more scalable likelihood distillation objective that is Hutchinson-free and admits a vectorized formulation. Empirically, we demonstrate the effectiveness of SCALLOP as a Boltzmann generator in molecular science, and further validate its benefit on image datasets. SCALLOP significantly reduces both training variance and training time while consistently improving performance compared to F2D2, and is competitive with the state-of-the-art while achieving up to 10× inference speedup over the fastest baseline.
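For readers unfamiliar with Hutchinson's trace estimator, which this abstract names as a source of variance, the sketch below shows the generic estimator on a small fixed matrix. It is not SCALLOP or F2D2. In flow models the matrix is typically the Jacobian of the velocity field and is accessed only through matrix-vector products; here the matrix `A`, the helper `matvec`, and the sample count are assumptions made for illustration.

```python
import random

def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Estimate tr(A) as the average of z^T A z over random sign vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples

# Toy symmetric matrix standing in for a Jacobian.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 4.0]]

def matvec(v):
    # Plain matrix-vector product A @ v.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=3)
exact = sum(A[i][i] for i in range(3))
```

The estimate is unbiased, but each sample's error comes from the off-diagonal entries of the matrix, so the variance grows with them. A single-probe estimate, as often used during training, can therefore be noisy.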

artificial intelligence, log pt, machine learning, (18 more...)

2606.2911

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)