Regression
Ancestral Inference and Learning for Branching Processes in Random Environments
Jiang, Xiaoran, Vidyashankar, Anand N.
Ancestral inference for branching processes in random environments involves estimating the parameters of the ancestor distribution from the population sizes of descendant generations. In this paper, we introduce a new methodology for ancestral inference based on the generalized method of moments. We demonstrate that the estimator's behavior is critically influenced by the coefficient of variation of the environment sequence. Furthermore, although the evolution of the process depends heavily on the offspring means of the various generations, we show that, under appropriate centering and scaling, the estimators of the ancestor mean and the offspring mean decouple: their joint limiting distribution is that of independent Gaussian random variables when the ratio of the number of generations to the logarithm of the number of replicates converges to zero. Additionally, we provide estimators for the limiting variance and illustrate our findings through numerical experiments and data from polymerase chain reaction (PCR) experiments and COVID-19.
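As a hedged illustration of ancestral inference (not the generalized-method-of-moments estimator of the paper), the Python sketch below simulates a toy branching process in a random environment and recovers the mean offspring and the mean ancestor size from descendant generations only. It relies on the moment identity E[Z_n] = E[Z_0] (E[m])^n, which holds when environments are i.i.d. and independent of Z_0. The Binomial(2, p_t) offspring law, the uniform environment, and all function names and parameter values are assumptions made for illustration.

```python
import random
import statistics

def simulate_bpre(z0, n_gen, p_low, p_high, rng):
    """One replicate of a toy BPRE: each generation draws an environment p_t ~ Uniform(p_low, p_high),
    then every individual independently has Binomial(2, p_t) offspring (offspring mean 2 * p_t)."""
    sizes = [z0]
    z = z0
    for _ in range(n_gen):
        p = rng.uniform(p_low, p_high)                        # environment shared by the whole generation
        z = sum(1 for _ in range(2 * z) if rng.random() < p)  # total offspring ~ Binomial(2z, p)
        sizes.append(z)
    return sizes

def offspring_mean_estimate(paths):
    """Pooled ratio sum Z_{t+1} / sum Z_t over generations t >= 1 (Z_0 is treated as unobserved)."""
    num = sum(sum(path[2:]) for path in paths)
    den = sum(sum(path[1:-1]) for path in paths)
    return num / den

def ancestor_mean_estimate(paths, m_hat):
    """Moment estimate of E[Z_0] from the last generation, using E[Z_n] = E[Z_0] * E[m]^n."""
    n = len(paths[0]) - 1
    return statistics.fmean(path[-1] for path in paths) / m_hat ** n

rng = random.Random(2024)
# Hypothetical setup: Z_0 ~ Uniform{5, ..., 15} (mean 10), 4 generations, E[m] = 2 * 0.6 = 1.2.
paths = [simulate_bpre(rng.randint(5, 15), 4, 0.5, 0.7, rng) for _ in range(2000)]
m_hat = offspring_mean_estimate(paths)
mu0_hat = ancestor_mean_estimate(paths, m_hat)
print(f"offspring mean estimate: {m_hat:.3f}, ancestor mean estimate: {mu0_hat:.2f}")
```

Widening the environment range (p_low, p_high) raises the coefficient of variation of the environment and tends to increase the spread of both estimates across seeds; the sketch does not, however, reproduce the paper's asymptotic results.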
Multivariate Feature Selection and Autoencoder Embeddings of Ovarian Cancer Clinical and Genetic Data
Bote-Curiel, Luis, Ruiz-Llorente, Sergio, Muñoz-Romero, Sergio, Yagüe-Fernández, Mónica, Barquín, Arantzazu, García-Donas, Jesús, Rojo-Álvarez, José Luis
This study explores a data-driven approach to discovering novel clinical and genetic markers in ovarian cancer (OC). Two main analyses were performed: (1) a nonlinear examination of an OC dataset using autoencoders, which compress data into a 3-dimensional latent space to detect potential intrinsic separability between platinum-sensitive and platinum-resistant groups; and (2) an adaptation of the informative variable identifier (IVI) to determine which features (clinical or genetic) are most relevant to disease progression. In the autoencoder analysis, a clearer pattern emerged for the clinical features alone and for the combined clinical and genetic data, indicating that disease progression groups can be distinguished more effectively after supervised fine-tuning. For genetic data alone, this separability was less apparent but became more pronounced with a supervised approach. Using the IVI-based feature selection, key clinical variables (such as type of surgery and neoadjuvant chemotherapy) and certain gene mutations showed strong relevance, along with low-risk genetic factors. These findings highlight the strength of combining machine learning tools (autoencoders) with feature selection methods (IVI) to gain insights into ovarian cancer progression. They also underscore the potential for identifying new biomarkers that integrate clinical and genomic indicators, ultimately contributing to improved patient stratification and personalized treatment strategies.
Reviews: Attribution-Based Confidence Metric For Deep Neural Networks
Overall Comments: This paper is reasonably well motivated and provides justifications for the key use of integrated gradients in computing the confidence score. The paper also presents several empirical demonstrations of the algorithm. The key motivation is that one might want to compute calibration scores without the retraining that is typical of isotonic regression and Platt scaling. Originality: I am not aware of prior work using integrated gradients to compute calibration scores. However, the literature on interpretability and uncertainty representation is vast.
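For readers unfamiliar with integrated gradients, the Python sketch below computes them for a toy logistic model, approximating IG_i(x) = (x_i - x'_i) * integral over a in [0, 1] of dF/dx_i(x' + a (x - x')) with a midpoint Riemann sum. It only illustrates the attribution step; it is not the paper's confidence metric, and the model, weights, inputs, and zero baseline are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier: F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient: dF/dx_i = F(x) * (1 - F(x)) * w_i."""
    f = model(x, w, b)
    return [f * (1.0 - f) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of the path integral from baseline to x."""
    totals = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps                                   # midpoint of the k-th sub-interval
        point = [bi + a * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            totals[i] += g / steps                              # running average of the gradient
    return [(xi - bi) * t for xi, bi, t in zip(x, baseline, totals)]

w, b = [1.5, -2.0, 0.5], 0.1
x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w, b)
# Completeness: the attributions should sum (approximately) to F(x) - F(baseline).
print(attr, sum(attr), model(x, w, b) - model(baseline, w, b))
```

The completeness check is a convenient sanity test: because the attributions telescope along the straight-line path, any large gap between the two printed sums signals too few integration steps.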
Reviews: The Impact of Regularization on High-dimensional Logistic Regression
Originality: This paper develops an asymptotic theory for high-dimensional regularized logistic regression (LR). The main result of the paper (Theorem 1) is proved for any locally Lipschitz function \Psi, which in special cases yields asymptotics for common descriptive statistics such as correlation, variance, and mean-squared error. Special-case results for L1- and L2-regularized LR are also derived, including the quantities highlighted above. The paper also demonstrates that numerical simulation results align with the theoretical predictions. Quality: The paper contains high-quality results and proofs; the notation and setup are well defined in Section 2 before the main results.
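A small finite-sample simulation can make the reviewed quantities concrete. The Python sketch below (an illustration under assumed settings, not the paper's asymptotic formulas) fits L2-regularized logistic regression by full-batch gradient descent with p/n = 0.1 and reports two of the descriptive statistics mentioned above: the correlation (cosine similarity) between the estimate and the true coefficients, and the per-coordinate mean-squared error. The dimensions, penalty lam, step size, and Gaussian design are arbitrary choices.

```python
import math
import random

def fit_ridge_logistic(X, y, lam, lr=1.0, iters=200):
    """Minimize (1/n) * sum log-loss + (lam / 2) * ||beta||^2 by gradient descent (labels in {0, 1})."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [lam * bj for bj in beta]                     # gradient of the ridge penalty
        for xi, yi in zip(X, y):
            z = sum(bj * xij for bj, xij in zip(beta, xi))
            r = (1.0 / (1.0 + math.exp(-z)) - yi) / n        # residual of the averaged log-loss
            for j in range(p):
                grad[j] += r * xi[j]
        beta = [bj - lr * gj for bj, gj in zip(beta, grad)]
    return beta

rng = random.Random(0)
n, p, lam = 300, 30, 0.1
beta_true = [rng.gauss(0.0, 2.0 / math.sqrt(p)) for _ in range(p)]   # signal strength ||beta|| near 2
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta_true, xi)))) else 0
     for xi in X]
beta_hat = fit_ridge_logistic(X, y, lam)

dot = sum(a * b for a, b in zip(beta_hat, beta_true))
cosine = dot / math.sqrt(sum(a * a for a in beta_hat) * sum(b * b for b in beta_true))
mse = sum((a - b) ** 2 for a, b in zip(beta_hat, beta_true)) / p
print(f"cosine(beta_hat, beta) = {cosine:.3f}, per-coordinate MSE = {mse:.4f}")
```

Repeating the run over a grid of lam values is a simple way to see the trade-off the paper characterizes asymptotically: the cosine is insensitive to pure shrinkage, while the MSE reflects both the shrinkage and the estimation noise.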
Reviews: The Impact of Regularization on High-dimensional Logistic Regression
The authors study the limiting distribution of certain functionals of the penalized maximum likelihood estimator in logistic regression. The paper contains nontrivial new extensions of the work of Sur and Candès in the unpenalized case, and is well-written and interesting. The reviews were mostly positive and the paper is in good shape.
Reviews: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models
This paper gives a simple and elegant algorithm for solving the long-studied problem of graphical model estimation (at least, in the case of pairwise MRFs, which includes the classic Ising model). The method uses a form of constrained logistic regression, which in retrospect, feels like the "right" way to solve this problem. The algorithm simply runs this constrained logistic regression method to learn the outgoing edges attached to each node. The proof is elegant and modular: first, based on standard generalization bounds, a sufficient number of samples allows minimization of the logistic loss function. Second, this loss is related to another loss function (the sigmoid of the inner product of the parameter vector with a sample from the distribution).
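The per-node regression described above can be sketched in Python for a toy zero-field Ising model. For spins in {-1, +1}, P(x_i = +1 | x_{-i}) = sigmoid(2 * sum_j theta_ij x_j), so regressing x_i on the other spins with logistic loss recovers weights near 2 * theta_ij. The sketch enforces the constraint by projected gradient descent onto an l1 ball; the 4-node chain, the coupling 0.6, the radius 3.0, and the sample sizes are hypothetical choices, not the paper's settings or its exact algorithm.

```python
import math
import random

EDGES = {(0, 1): 0.6, (1, 2): 0.6, (2, 3): 0.6}   # hypothetical 4-node chain, zero external field
P = 4

def coupling(i, j):
    return EDGES.get((min(i, j), max(i, j)), 0.0)

def gibbs_samples(n_samples, burn_in, rng):
    """Gibbs sampler for P(x) proportional to exp(sum_{i<j} theta_ij x_i x_j), x in {-1, +1}^P."""
    x = [rng.choice((-1, 1)) for _ in range(P)]
    out = []
    for sweep in range(burn_in + n_samples):
        for i in range(P):
            field = sum(coupling(i, j) * x[j] for j in range(P) if j != i)
            # P(x_i = +1 | rest) = sigmoid(2 * field)
            x[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * field)) else -1
        if sweep >= burn_in:
            out.append(list(x))
    return out

def project_l1_ball(v, radius):
    """Euclidean projection onto {w : ||w||_1 <= radius} (sort-and-threshold algorithm)."""
    if sum(abs(a) for a in v) <= radius:
        return list(v)
    u = sorted((abs(a) for a in v), reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - radius) / j > 0:
            theta = (css - radius) / j
    return [math.copysign(max(abs(a) - theta, 0.0), a) for a in v]

def learn_neighborhood(samples, node, radius=3.0, lr=1.0, iters=150):
    """l1-constrained logistic regression of x_node on the other spins via projected gradient descent."""
    others = [j for j in range(P) if j != node]
    feats = [[s[j] for j in others] for s in samples]
    labels = [s[node] for s in samples]
    n = len(samples)
    w = [0.0] * len(others)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for f, yv in zip(feats, labels):
            margin = yv * sum(wk * fk for wk, fk in zip(w, f))
            coef = -yv / (1.0 + math.exp(margin)) / n       # derivative of log(1 + exp(-margin))
            for k in range(len(w)):
                grad[k] += coef * f[k]
        w = project_l1_ball([wk - lr * gk for wk, gk in zip(w, grad)], radius)
    return dict(zip(others, w))

rng = random.Random(1)
samples = gibbs_samples(4000, 500, rng)
weights = learn_neighborhood(samples, node=1)
print(weights)   # population targets: 2 * theta_1j = 1.2 for j in {0, 2}, and 0 for j = 3
```

Thresholding the learned weights (for example at half of 2 * theta_min) then yields the estimated neighborhood of each node; repeating this for every node recovers the graph.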
Review for NeurIPS paper: Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition
Additional Feedback: This paper analyzes the "double descent" phenomenon, in which the generalization error of a model peaks at the interpolation threshold (as a function of either model complexity or sample size). The authors develop a fine-grained bias-variance decomposition that splits the risk into the bias and several distinct variance terms. They apply this decomposition to the random features regression model and show which of these terms diverge. This paper addresses an important issue that has lately been the focus of much research. The proposed "fine-grained" bias-variance decomposition helps clarify several subtle effects.
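To make the idea of splitting the variance concrete, here is a hedged sketch in our own notation (the paper's decomposition is finer and has more terms than this two-source version). Take a predictor \hat f trained on random data X with random feature weights \Theta, drawn independently of X, and a test label y = f^*(x) + \varepsilon with noise variance \sigma^2 that is independent of training. The squared error at a test point x splits into bias, variance, and noise, and the law of total variance then attributes the variance to its sources:

```latex
\mathbb{E}\big[(\hat f(x) - y)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f^*(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat f(x)\big)}_{\text{variance}}
  + \sigma^2,
\qquad
\operatorname{Var}\big(\hat f(x)\big)
  = \operatorname{Var}_{X}\Big(\mathbb{E}_{\Theta}\big[\hat f(x) \mid X\big]\Big)
  + \mathbb{E}_{X}\Big[\operatorname{Var}_{\Theta}\big(\hat f(x) \mid X\big)\Big].
```

The first variance term measures sensitivity to the training sample after averaging over the random features; the second measures sensitivity to the random features for a fixed sample. Which of these terms blows up at the interpolation threshold is exactly the kind of question a finer decomposition can answer.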
One Model to Forecast Them All and in Entity Distributions Bind Them
Bölat, Kutay, Tindemans, Simon
Probabilistic forecasting in power systems often involves multi-entity datasets like households, feeders, and wind turbines, where generating reliable entity-specific forecasts presents significant challenges. Traditional approaches require training individual models for each entity, making them inefficient and hard to scale. This study addresses this problem using GUIDE-VAE, a conditional variational autoencoder that allows entity-specific probabilistic forecasting using a single model. GUIDE-VAE provides flexible outputs, ranging from interpretable point estimates to full probability distributions, thanks to its advanced covariance composition structure. These distributions capture uncertainty and temporal dependencies, offering richer insights than traditional methods. To evaluate our GUIDE-VAE-based forecaster, we use household electricity consumption data as a case study due to its multi-entity and highly stochastic nature. Experimental results demonstrate that GUIDE-VAE outperforms conventional quantile regression techniques across key metrics while ensuring scalability and versatility. These features make GUIDE-VAE a powerful and generalizable tool for probabilistic forecasting tasks, with potential applications beyond household electricity consumption.
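The abstract does not spell out GUIDE-VAE's covariance composition, so the Python sketch below only illustrates the general idea of a forecast that is a full multivariate Gaussian over time steps. The mean vector serves as the point estimate, and a Cholesky factor of the covariance turns independent normals into temporally correlated trajectories from which quantiles can be read off. The AR(1)-style covariance, the numbers, and the function names are assumptions, not the model's actual output.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def sample_trajectory(mean, L, rng):
    """One correlated draw x = mean + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

# Hypothetical forecast for 4 consecutive time steps (kWh) with AR(1)-style covariance.
mean = [0.4, 0.6, 1.1, 0.8]
sd, rho = 0.2, 0.7
cov = [[sd * sd * rho ** abs(i - j) for j in range(4)] for i in range(4)]

L = cholesky(cov)
rng = random.Random(7)
draws = [sample_trajectory(mean, L, rng) for _ in range(5000)]
point_forecast = mean                                                       # interpretable point estimate
q90 = [sorted(d[t] for d in draws)[int(0.9 * len(draws))] for t in range(4)]  # empirical 90% quantile per step
print(point_forecast, [round(q, 3) for q in q90])
```

Compared with per-step quantile regression, sampling whole trajectories keeps the correlation between neighboring time steps, which matters when downstream users aggregate over a window (for example, a peak-hour total).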
Review for NeurIPS paper: Truncated Linear Regression in High Dimensions
Weaknesses: - My major concern is why the problem is difficult. Assumption 1 literally enforces that the adversary cannot pick an arbitrary S, but only one for which a constant alpha-fraction of the observations survive (i.e., are not hidden/removed). Thus, suppose that before removal we have a total of m samples (a, y). After removal this reduces to alpha * m pairs (a, y), which still suffices for accurate recovery provided that m ≥ O(k log n). - It is not convincing to me that the sample complexity in Theorem 3.1 is near-optimal. I know that O(k log n) is near-optimal, but does your result really imply such a bound?