Regression
Review for NeurIPS paper: On the Optimal Weighted \ell_2 Regularization in Overparameterized Linear Regression
Weaknesses: My main concern with the paper is the novelty of the results. The authors state that previous work on linear regression is not as general as the current work; in particular, earlier results allow only isotropic features or only an isotropic signal. The following paper, which was posted on arXiv about a month before the NeurIPS deadline, seems to handle both: [1] Emami, Melikasadat, et al. "Generalization error of generalized linear models in high dimensions." Its results characterize the exact generalization error in the same asymptotic limit for Gaussian data with general covariance and any regularization. This includes the \ell_2-type regularizations considered here, as well as more general regularizations such as general \ell_p norms. Here is my understanding of the differences between the results of the two papers:
- In [1] the authors allow Gaussian features with any covariance matrix, whereas your paper allows non-Gaussian features as long as they have a bounded 12th centered moment.
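To make the estimator family under discussion concrete, here is a minimal numpy sketch of weighted \ell_2-regularized least squares, beta_hat = argmin ||y - X beta||^2 + beta^T W beta = (X^T X + W)^{-1} X^T y. It also computes the excess risk (beta_hat - beta)^T Sigma (beta_hat - beta) under Gaussian features with a general (here diagonal) covariance Sigma. The spectrum, the weight matrices, and the function names are illustrative assumptions. In particular, the "weighted" choice of W is hypothetical and is not the optimal weighting derived in the paper.

```python
import numpy as np


def weighted_ridge(X, y, W):
    """Minimize ||y - X b||^2 + b^T W b for a symmetric positive-definite W.

    Setting the gradient to zero gives (X^T X + W) b = X^T y, which is well
    posed even when d > n (the overparameterized regime)."""
    return np.linalg.solve(X.T @ X + W, X.T @ y)


def excess_risk(beta_hat, beta, Sigma):
    """Excess prediction risk (b_hat - b)^T Sigma (b_hat - b) for a fresh x ~ N(0, Sigma)."""
    diff = beta_hat - beta
    return float(diff @ Sigma @ diff)


rng = np.random.default_rng(0)
n, d = 50, 200                                   # overparameterized: d > n
eigs = 1.0 / np.arange(1, d + 1)                 # assumed anisotropic spectrum
Sigma = np.diag(eigs)
X = rng.standard_normal((n, d)) * np.sqrt(eigs)  # rows x_i ~ N(0, Sigma)
beta = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta + 0.1 * rng.standard_normal(n)

lam = 0.1
W_iso = lam * np.eye(d)                          # ordinary (isotropic) ridge
W_aniso = lam * np.diag(1.0 / eigs) / d          # hypothetical weighting, illustration only
for name, W in [("isotropic", W_iso), ("weighted", W_aniso)]:
    b = weighted_ridge(X, y, W)
    print(name, excess_risk(b, beta, Sigma))
```

With W = lam * I this reduces to ordinary ridge regression, which can also be computed in the n-dimensional dual form X^T (X X^T + lam I)^{-1} y.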
Reviews: Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression
The paper tackles the problem of predicting the outcome of an action chosen from a set of possible actions. The outcome is a function of the action, with a linear component, a non-linear component, and additive noise. The first step is to find a linear function that minimizes the worst-case deviation from the outcomes over all distributions that are "close" to the empirical distribution in Wasserstein distance; this idea has been analyzed before. The new idea in the paper is to use the resulting linear-regression coefficients to build a metric on samples from the same group, and then to predict the average of the outcomes of the K nearest neighbors. This way, the prediction can leverage not only the private history of the specific instance but also the outcomes of "close" instances.
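The two-stage prediction described in this review can be sketched as follows, under loudly stated assumptions. First, the distributionally robust regression stage is replaced by a plain ridge fit. Wasserstein-DRO regression reduces to a norm-regularized regression only for particular loss and transport-cost choices, which this sketch does not reproduce. Second, the "metric built from the coefficients" is taken to be a Euclidean distance that scales feature j by |beta_j|. All function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Stand-in for the DRO regression stage (assumption): ridge coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def coef_weighted_distance(A, b, beta):
    """Distance from each row of A to point b, scaling feature j by |beta_j|."""
    return np.sqrt((((A - b) * np.abs(beta)) ** 2).sum(axis=1))


def knn_predict(X_train, y_train, x_query, beta, k=3):
    """Average the outcomes of the k training samples closest under the learned metric."""
    dist = coef_weighted_distance(X_train, x_query, beta)
    nearest = np.argsort(dist, kind="stable")[:k]
    return float(y_train[nearest].mean())


rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))
# Linear part, non-linear part, additive noise; feature 2 is irrelevant.
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.standard_normal(200)
beta = fit_ridge(X, y, lam=1.0)
print(knn_predict(X, y, np.array([0.2, -0.3, 0.0]), beta, k=5))
```

Features with small coefficients contribute little to the distance, so neighbors are chosen mainly by the features the regression stage found relevant.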
Reviews: Bayesian Batch Active Learning as Sparse Subset Approximation
This manuscript proposes a novel method for Bayesian batch active learning through sparse subset approximation, together with a convenient set of reductions that yield a tractable algorithm. The method is validated and explored through a series of special cases (linear regression and classification), illustrations, and experiments. Overall, the method appears to be competitive with the state of the art. The manuscript is well written, insightful, and enjoyable to read. The proposed approach outlined on page 3 is elegant, appears to work well in practice, and may be useful in other settings.
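The core idea of sparse subset approximation can be illustrated with a Frank-Wolfe sketch in the style of Hilbert coresets. A sum over the whole pool is approximated by a sparse, non-negatively weighted subset, and the support of that subset becomes the batch. Several assumptions apply. Each pool point n is represented by a generic vector L[n]; in the paper these would come from expected log-likelihoods, which this sketch does not compute. The Euclidean inner product stands in for the paper's choice of norm, and `frank_wolfe_subset` is a hypothetical name.

```python
import numpy as np


def frank_wolfe_subset(L, batch_size):
    """Sparse non-negative weights w with sum_n w_n L[n] close to sum_n L[n].

    L has shape (N, D) with non-zero rows; at most `batch_size` weights are non-zero.
    Weights live on the polytope {w >= 0, sum_n ||L[n]|| w_n = sum_n ||L[n]||}."""
    sigmas = np.linalg.norm(L, axis=1)               # per-point norms
    sigma = sigmas.sum()
    target = L.sum(axis=0)                           # full-pool vector to approximate
    w = np.zeros(L.shape[0])
    # Initialise at the polytope vertex best aligned with the target.
    f = int(np.argmax(L @ target / sigmas))
    w[f] = sigma / sigmas[f]
    errors = [float(np.linalg.norm(target - w @ L))]
    for _ in range(batch_size - 1):
        current = w @ L
        residual = target - current
        # Linear minimisation oracle: vertex most aligned with the residual.
        f = int(np.argmax(L @ residual / sigmas))
        vertex = sigma / sigmas[f] * L[f]
        direction = vertex - current
        denom = float(direction @ direction)
        if denom == 0.0:
            break                                    # already at the best vertex
        # Exact line search on ||target - ((1 - g) current + g vertex)||^2, clipped to [0, 1].
        gamma = float(np.clip(direction @ residual / denom, 0.0, 1.0))
        w *= 1.0 - gamma
        w[f] += gamma * sigma / sigmas[f]
        errors.append(float(np.linalg.norm(target - w @ L)))
    return w, errors


rng = np.random.default_rng(0)
L = rng.standard_normal((100, 20))
w, errors = frank_wolfe_subset(L, batch_size=10)
print(np.count_nonzero(w), errors[0], errors[-1])
```

Each iteration adds at most one new point to the support, and the clipped exact line search keeps the approximation error from increasing.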
Reviews: List-decodable Linear Regression
Post-rebuttal response: I read the authors' response and have no further comments. This paper considers the model of "robust statistics" in which an alpha fraction of the training data comes from the ground-truth distribution while the rest is corrupted arbitrarily. Traditionally, research has focused on the setting where alpha is large, so that the parameters of the true distribution are information-theoretically identifiable. Recent work, however, has focused on the small-alpha setting, in which the parameters of the true distribution cannot be uniquely identified even with infinitely many samples.
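The non-identifiability point can be made concrete with a small noiseless simulation. Suppose the adversary plants k regression vectors, each responsible for an equal alpha = 1/k fraction of the samples. Every planted vector then fits exactly the same fraction of the points, so nothing in the data singles out the "true" one, and the best possible output is a short list containing it. The generator, its parameters, and the function names are illustrative assumptions.

```python
import numpy as np


def planted_mixture(n, d, k, rng):
    """n noiseless samples split evenly among k planted linear models.

    Any one of the k models could be the 'inlier' model with alpha = 1/k;
    the remaining samples then look like adversarial corruptions."""
    betas = rng.standard_normal((k, d))
    X = rng.standard_normal((n, d))
    labels = np.arange(n) % k                      # equal-sized groups
    y = np.einsum("ij,ij->i", X, betas[labels])    # y_i = <x_i, beta_{label_i}>
    return X, y, betas


def fraction_fit(X, y, beta, tol=1e-8):
    """Fraction of samples that beta explains exactly (up to tol)."""
    return float(np.mean(np.abs(X @ beta - y) <= tol))


rng = np.random.default_rng(0)
X, y, betas = planted_mixture(n=300, d=5, k=3, rng=rng)
print([fraction_fit(X, y, b) for b in betas])     # each planted model fits 1/3 of the samples
```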
Reviews: List-decodable Linear Regression
This paper studies the challenging problem of linear regression in the setting where an overwhelming fraction (1-alpha) of the examples are adversarially corrupted. It extends recent work on using the Sum-of-Squares hierarchy for robust estimation. The main contribution is the realization that anti-concentration (and the ability to certify anti-concentration) is the key. The algorithm has a high running time (d^{O(1/alpha^8)}). Given the challenging nature of the problem, however, the reviewers felt that solving it in polynomial time for any fixed alpha > 0 is surprising and an important contribution.
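Anti-concentration means that the inlier distribution puts little mass near any hyperplane through the origin: for every unit vector v, P(|<x, v>| <= delta) <= C * delta. For a standard Gaussian x, <x, v> ~ N(0, 1), so the bound holds with C = 2 / sqrt(2 * pi) = sqrt(2 / pi). The following Monte Carlo sketch checks this numerically, up to sampling noise. It illustrates the property only; it does not show the Sum-of-Squares certificate that the algorithm actually relies on. The Gaussian inlier assumption and the function name are illustrative.

```python
import numpy as np


def small_ball_probability(samples, v, delta):
    """Empirical P(|<x, v>| <= delta) for a unit vector v."""
    return float(np.mean(np.abs(samples @ v) <= delta))


rng = np.random.default_rng(0)
d, n = 10, 200_000
x = rng.standard_normal((n, d))         # inliers assumed to be standard Gaussian
bound_const = np.sqrt(2.0 / np.pi)      # 2 * (N(0, 1) density at 0)
for _ in range(3):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)              # random unit direction
    for delta in (0.01, 0.1, 0.5):
        p = small_ball_probability(x, v, delta)
        # Empirical values should sit at or below the bound, up to sampling noise.
        print(f"delta={delta}: empirical={p:.4f}, bound={bound_const * delta:.4f}")
```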
Foundations of a Knee Joint Digital Twin from qMRI Biomarkers for Osteoarthritis and Knee Replacement
Hoyer, Gabrielle, Gao, Kenneth T, Gassert, Felix G, Luitjens, Johanna, Jiang, Fei, Majumdar, Sharmila, Pedoia, Valentina
This study forms the basis of a digital twin system of the knee joint, using advanced quantitative MRI (qMRI) and machine learning to advance precision health in osteoarthritis (OA) management and knee replacement (KR) prediction. We combined deep learning-based segmentation of knee joint structures with dimensionality reduction to create an embedded feature space of imaging biomarkers. Through cross-sectional cohort analysis and statistical modeling, we identified specific biomarkers, including variations in cartilage thickness and medial meniscus shape, that are significantly associated with OA incidence and KR outcomes. Integrating these findings into a comprehensive framework represents a considerable step toward personalized knee-joint digital twins, which could enhance therapeutic strategies and inform clinical decision-making in rheumatological care. This versatile and reliable infrastructure has the potential to be extended to broader clinical applications in precision health.
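The abstract does not say which dimensionality-reduction method produces the embedded feature space. As a hedged illustration only, the following sketch assumes PCA (computed via SVD) applied to per-subject biomarker vectors, such as flattened cartilage-thickness maps. The synthetic data, array sizes, and function name are hypothetical.

```python
import numpy as np


def pca_embed(features, n_components):
    """Project per-subject feature vectors onto their top principal components.

    features: (n_subjects, n_features) array, e.g. flattened thickness maps.
    Returns (embedding, explained_variance_ratio of the kept components)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:n_components].T
    var = s ** 2
    return embedding, var[:n_components] / var.sum()


rng = np.random.default_rng(0)
# Synthetic stand-in: 120 subjects, 500-dimensional biomarker vectors driven by 3 latent modes.
latent = rng.standard_normal((120, 3))
features = latent @ rng.standard_normal((3, 500)) + 0.1 * rng.standard_normal((120, 500))
emb, ratio = pca_embed(features, n_components=5)
print(emb.shape, ratio.round(3))
```

The low-dimensional coordinates in `emb` are the kind of embedded features that downstream statistical models could relate to outcomes such as OA incidence.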
Model Monitoring in the Absence of Labeled Data via Feature Attributions Distributions
Model monitoring involves analyzing AI algorithms once they have been deployed and detecting changes in their behaviour. This thesis explores monitoring machine learning (ML) models before their predictions impact real-world decisions or users. This step is characterized by one particular condition: the absence of labelled data at test time, which makes it challenging, and often impossible, to calculate performance metrics. The thesis is structured around two main themes: (i) AI alignment, measuring whether AI models behave in a manner consistent with human values, and (ii) performance monitoring, measuring whether the models achieve specific accuracy goals. A common methodology unifies all of its sections: the analysis of feature attribution distributions for both monitoring dimensions. By exploiting the theoretical properties of these feature attribution explanations, we derive guarantees and insights for model monitoring.
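One possible instance of label-free monitoring with feature attribution distributions is sketched below, under loudly stated assumptions. The model is linear, so with independent features the SHAP value of feature j reduces to beta_j * (x_j - E[x_j]), with the expectation taken over the reference data. Drift is flagged per feature when a two-sample Kolmogorov-Smirnov statistic exceeds a hand-picked threshold. The thesis's actual detectors and guarantees may differ, and the function names and threshold are hypothetical.

```python
import numpy as np


def linear_shap(X, beta, background_mean):
    """SHAP values of a linear model with independent features: beta_j * (x_j - E[x_j])."""
    return (X - background_mean) * beta


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def attribution_drift(X_ref, X_new, beta, threshold=0.1):
    """Per-feature KS statistics between reference and new attributions; no labels needed."""
    mu = X_ref.mean(axis=0)
    s_ref, s_new = linear_shap(X_ref, beta, mu), linear_shap(X_new, beta, mu)
    stats = np.array([ks_statistic(s_ref[:, j], s_new[:, j]) for j in range(beta.size)])
    return stats, stats > threshold


rng = np.random.default_rng(0)
beta = np.array([1.5, -2.0, 0.0])      # feature 2 is ignored by the model
X_ref = rng.standard_normal((2000, 3))
X_new = rng.standard_normal((2000, 3))
X_new[:, 0] += 1.0                     # shift a feature the model uses
X_new[:, 2] += 1.0                     # shift a feature the model ignores
stats, flagged = attribution_drift(X_ref, X_new, beta)
print(stats.round(3), flagged)
```

Attributions weight each feature by its coefficient, so a shift in a feature the model ignores (beta_j = 0) produces no attribution drift. The detector therefore reacts only to changes that can affect the model's outputs.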
Reviews: Partitioning Structure Learning for Segmented Linear Regression Trees
Originality: The paper is fairly original in that it proposes a new tree-splitting criterion that seems to work very well when the leaves are linear models rather than constants. It also provides a novel application of several pieces of previous work, including LASSO and random forests. There are adequate citations of related work. Quality: I did not carefully check the math or read the proofs in the supplemental material, but I did not observe any technical mistakes. There is not much discussion of the limitations of their approach.
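For context, the simplest split criterion for a tree with linear-model leaves scores each candidate threshold by the total residual sum of squares of separate least-squares fits in the two children. The following sketch implements only that generic baseline for a single split variable. It is not the new criterion proposed in the paper, and the function names and minimum leaf size are assumptions.

```python
import numpy as np


def leaf_rss(X, y):
    """Residual sum of squares of an OLS fit (with intercept) in one leaf."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def best_linear_split(X, y, feature, min_leaf=10):
    """Threshold on X[:, feature] minimizing the summed RSS of linear fits in both children."""
    order = np.argsort(X[:, feature])
    Xs, ys = X[order], y[order]
    best_t, best_rss = None, np.inf
    # Left child gets the first i sorted samples; both children keep >= min_leaf samples.
    for i in range(min_leaf, len(y) - min_leaf + 1):
        if Xs[i - 1, feature] == Xs[i, feature]:
            continue                   # cannot split between tied values
        rss = leaf_rss(Xs[:i], ys[:i]) + leaf_rss(Xs[i:], ys[i:])
        if rss < best_rss:
            best_t = 0.5 * (Xs[i - 1, feature] + Xs[i, feature])
            best_rss = rss
    return best_t, best_rss


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
# Piecewise-linear target with a kink at 0.5, plus noise.
y = np.where(x < 0.5, 2 * x, 3 - 4 * x) + 0.05 * rng.standard_normal(200)
print(best_linear_split(x[:, None], y, feature=0))
```

Because each child is fitted with its own line, the criterion favors thresholds at changes in slope, which a constant-leaf (variance-reduction) criterion handles poorly.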