penalty
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Montserrat (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Italy (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression
Xie, Dongyue, Zhu, Wanrong, Stephens, Matthew
We introduce a flexible empirical Bayes approach for fitting Bayesian generalized linear models. Specifically, we adopt a novel mean-field variational inference (VI) method in which the prior is estimated within the VI algorithm, making the method tuning-free. Unlike traditional VI methods that optimize the posterior density function, our approach directly optimizes the posterior mean and prior parameters. This formulation reduces the number of parameters to optimize and enables the use of scalable algorithms such as L-BFGS and stochastic gradient descent. Furthermore, our method automatically determines the optimal posterior based on the prior and likelihood, distinguishing it from existing VI methods that often assume a Gaussian variational distribution. Our approach represents a unified framework applicable to a wide range of exponential family distributions, removing the need to develop unique VI methods for each combination of likelihood and prior distributions. We apply the framework to sparse logistic regression and demonstrate the superior predictive performance of our method in extensive numerical studies, comparing it to prevalent sparse logistic regression approaches.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models
In modern biomedical and econometric studies, longitudinal processes are often characterized by complex time-varying associations and abrupt regime shifts that are shared across correlated outcomes. Standard functional data analysis (FDA) methods, which prioritize smoothness, often fail to capture these dynamic structural features, particularly in high-dimensional settings. This article introduces Adaptive Joint Learning (AJL), a hierarchical regularization framework designed to integrate functional variable selection with structural changepoint detection in multivariate time-varying coefficient models. Unlike standard simultaneous estimation approaches, we propose a theoretically grounded two-stage screening-and-refinement procedure. This framework first combines adaptive group-wise penalization with sure screening principles to robustly identify active predictors, followed by a refined fused regularization step that borrows strength across multiple outcomes to detect local regime shifts. We provide a rigorous theoretical analysis of the estimator in the ultra-high-dimensional regime ($p \gg n$). Crucially, we establish the sure screening consistency of the first stage, which serves as the foundation for proving that the refined estimator achieves the oracle property, performing as well as if the true active set and changepoint locations were known a priori. A key theoretical contribution is the explicit handling of approximation bias via undersmoothing conditions to ensure valid asymptotic inference. The proposed method is validated through comprehensive simulations and an application to Sleep-EDF data, revealing novel dynamic patterns in physiological states.
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.45)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
- Health & Medicine > Therapeutic Area > Sleep (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
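As a rough illustration of how a fused penalty relates to changepoints in the Adaptive Joint Learning abstract above, the sketch below evaluates a generic one-dimensional fused (total-variation) penalty on a piecewise-constant coefficient path. The abstract does not give AJL's actual penalty, so the function names `fused_penalty` and `changepoints`, the tolerance, and the example path are all assumptions made for illustration only.

```python
def fused_penalty(beta):
    """Sum of absolute successive differences of a coefficient path.

    Generic fused-lasso style penalty (an assumption, not AJL's exact form):
    it is zero on flat stretches and grows only where the path jumps.
    """
    return sum(abs(b - a) for a, b in zip(beta, beta[1:]))


def changepoints(beta, tol=1e-8):
    """Indices t where beta[t] differs from beta[t - 1] by more than tol."""
    return [t for t in range(1, len(beta)) if abs(beta[t] - beta[t - 1]) > tol]


# Hypothetical time-varying coefficient with two regime shifts.
path = [1.0] * 4 + [3.0] * 3 + [0.5] * 3

penalty = fused_penalty(path)  # |3.0 - 1.0| + |0.5 - 3.0|
cps = changepoints(path)       # positions where a new regime starts

print(penalty, cps)
```

Because only the jumps contribute, penalizing this quantity favors paths with few regime shifts, which is the usual motivation for fused regularization in changepoint problems.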
Multi-environment Invariance Learning with Missing Data
Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental heterogeneity, making it crucial to address missing outcomes to improve both causal insights and robust prediction. In this work, we derive an estimator from the invariance objective under missing outcomes. We establish non-asymptotic guarantees on the variable selection property and on $\ell_2$ error convergence rates, both of which are influenced by the proportion of missing data and the quality of imputation models across environments. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset to predict the count of bike rentals. The results show that despite relying on a biased imputation model, the estimator is efficient and achieves lower prediction error, provided the bias is within a reasonable range.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Jordan (0.04)
Deriving Decoder-Free Sparse Autoencoders from First Principles
Gradient descent on log-sum-exp (LSE) objectives performs implicit expectation-maximization (EM): the gradient with respect to each component output equals its responsibility. The same theory predicts collapse without volume control analogous to the log-determinant in Gaussian mixture models. We instantiate the theory in a single-layer encoder with an LSE objective and InfoMax regularization for volume control. Experiments confirm the theory's predictions. The gradient-responsibility identity holds exactly; LSE alone collapses; variance prevents dead components; decorrelation prevents redundancy. The model exhibits EM-like optimization dynamics in which lower loss does not correspond to better features and adaptive optimizers offer no advantage. The resulting decoder-free model learns interpretable mixture components, confirming that implicit EM theory can prescribe architectures.
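The gradient-responsibility identity stated in this abstract can be checked numerically on a toy vector. The sketch below assumes the plain form $\mathrm{LSE}(z) = \log \sum_k e^{z_k}$, for which $\partial\,\mathrm{LSE} / \partial z_k$ is the softmax weight of component $k$; the paper's own objective may differ in sign or scaling, and the helper names and the example vector `z` are hypothetical.

```python
import math


def log_sum_exp(z):
    """Numerically stable log(sum_k exp(z_k))."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))


def responsibilities(z):
    """Softmax weights exp(z_k) / sum_j exp(z_j), i.e. EM-style responsibilities."""
    m = max(z)
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    return [v / total for v in w]


def numerical_gradient(f, z, eps=1e-6):
    """Central finite-difference gradient of f at z."""
    grad = []
    for k in range(len(z)):
        up = list(z)
        down = list(z)
        up[k] += eps
        down[k] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Hypothetical component outputs for a four-component model.
z = [0.5, -1.2, 2.0, 0.1]
r = responsibilities(z)
g = numerical_gradient(log_sum_exp, z)

# Largest disagreement between the gradient and the responsibilities.
max_gap = max(abs(a - b) for a, b in zip(r, g))
print(max_gap)
```

The finite-difference gradient is used here only so that the check does not depend on an autodiff library; the two vectors agree up to numerical error.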
What Functions Does XGBoost Learn?
Ki, Dohyeong, Guntuboyina, Adityanand
This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy-Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}: V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey > Bergen County > Hackensack (0.04)
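To make the convergence rate quoted in the XGBoost abstract above concrete, the following sketch evaluates $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$ with all constants ignored. It assumes, as the notation suggests but the abstract does not spell out, that `d` is the input dimension and `s` is the bound tied to tree depth; the function name `xgb_class_rate` and the sample values are hypothetical.

```python
import math


def xgb_class_rate(n, s, d):
    """Rate n^(-2/3) * (log n)^(4 * (min(s, d) - 1) / 3), constants omitted."""
    k = min(s, d) - 1
    return n ** (-2.0 / 3.0) * math.log(n) ** (4.0 * k / 3.0)


n = 10**6

# With s fixed, raising d beyond s leaves the rate unchanged,
# because d enters only through min(s, d).
r_low = xgb_class_rate(n, s=2, d=2)
r_high = xgb_class_rate(n, s=2, d=100)

# With min(s, d) = 1 the logarithmic factor disappears entirely.
r_plain = xgb_class_rate(n, s=1, d=5)

print(r_low, r_high, r_plain)
```

The power of $n$ stays at $-2/3$ regardless of `d`, and `d` affects only the logarithmic factor through $\min(s, d)$, which is the sense in which the stated rate avoids the curse of dimensionality.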
Sparse Convex Biclustering
Jiang, Jiakun, Xiang, Dewei, Gu, Chenliang, Liu, Wei, Wang, Binhuan
Biclustering is an essential unsupervised machine learning technique for simultaneously clustering rows and columns of a data matrix, with widespread applications in genomics, transcriptomics, and other high-dimensional omics data. Despite its importance, existing biclustering methods struggle to meet the demands of modern large-scale datasets. The challenges stem from the accumulation of noise in high-dimensional features, the limitations of non-convex optimization formulations, and the computational complexity of identifying meaningful biclusters. These issues often result in reduced accuracy and stability as the size of the dataset increases. To overcome these challenges, we propose Sparse Convex Biclustering (SpaCoBi), a novel method that penalizes noise during the biclustering process to improve both accuracy and robustness. By adopting a convex optimization framework and introducing a stability-based tuning criterion, SpaCoBi achieves an optimal balance between cluster fidelity and sparsity. Comprehensive numerical studies, including simulations and an application to mouse olfactory bulb data, demonstrate that SpaCoBi significantly outperforms state-of-the-art methods in accuracy. These results highlight SpaCoBi as a robust and efficient solution for biclustering in high-dimensional and large-scale datasets.
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > China > Guangdong Province > Zhuhai (0.04)