Regression
Reviews: Partitioning Structure Learning for Segmented Linear Regression Trees
The paper proposes and investigates how to learn the tree structure for linear regression trees based on a conditional Kendall's tau statistic, with supporting theoretical analysis. The ideas were new and generally satisfying to reviewers. While some reviewers would have liked to see more experiments, experimental comparisons, and details, other reviewers felt that the author response about the experiments was satisfactory.
Reviews: Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression
The paper proposes a simple algorithm for L_p regression problems and justifies its efficiency both theoretically and empirically. Major concerns: 1) Related work: The comparison with related work is insufficient. In Section, it only compares the polynomial dependence. What are the advantages of the proposed algorithm? Furthermore, the lack of sufficient baselines (at least Bubeck et al. [BCLL19] or Adil et al. [AKPS19]) in the numerical experiments weakens the case for the superiority of the proposed algorithm.
Reviews: Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression
This paper proposes a modified IRLS algorithm and presents empirical experiments and theoretical analysis. The reviewers viewed the contribution as a combination of existing methods, though the combination itself is novel. Most reviewers thought the paper was well written. The authors' ability to compare to existing methods is limited by the complexity of those methods and the lack of public implementations. The reviewers' scores place this paper above the bar for acceptance.
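For readers unfamiliar with IRLS, the following is a minimal sketch of the textbook iteration for an L_p line fit, not the modified algorithm under review. Each pass solves a weighted least-squares problem with weights |r_i|^(p-2) computed from the current residuals. The function names, the clamp `eps`, the choice p = 1.5, and the toy data are all assumptions made for illustration.

```python
def weighted_line_fit(xs, ys, ws):
    """Closed-form minimizer of sum_i w_i * (y_i - a*x_i - b)^2 over (a, b)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


def lp_objective(xs, ys, a, b, p):
    """Sum of |residual|^p for the line y = a*x + b."""
    return sum(abs(y - a * x - b) ** p for x, y in zip(xs, ys))


def irls_line_fit(xs, ys, p=1.5, iters=100, eps=1e-8):
    """Textbook IRLS: reweight by |r_i|^(p-2), then re-solve weighted least squares."""
    # Start from the ordinary least-squares fit (all weights equal to 1).
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    for _ in range(iters):
        # Clamp residuals away from zero so the weights stay finite when p < 2.
        ws = [max(abs(y - a * x - b), eps) ** (p - 2) for x, y in zip(xs, ys)]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b


# Toy data (hypothetical): points on y = 2x + 1 with one large outlier.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] += 30.0

ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
fit = irls_line_fit(xs, ys, p=1.5)
```

With p below 2 the reweighting down-weights large residuals, so the fitted slope should sit closer to the inlier slope of 2 than the least-squares slope does. For large p the plain iteration is not guaranteed to converge, which is the kind of issue a provably convergent variant has to address.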
Review for NeurIPS paper: Robust Meta-learning for Mixed Linear Regression with Small Batches
More specifically, suppose we deal with n linear regression data sets, after which we are challenged with a final learning task of linear regression, but the parameters of these "tasks" are not completely unrelated. In particular, suppose there is a prior distribution (with at most k possible outcomes) from which the parameters of each linear regression (i.e., the linear function and the noise variance) are sampled. The general idea here is that by learning from "different" (yet related) tasks the learner aims to do better on the final task, and the paper's focus is on a theoretically natural setting.
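The data model described above can be sketched as a small generator. This is only an illustration of the setup as summarized in the review: each task draws one of k (coefficient vector, noise standard deviation) pairs from a discrete prior and then produces a small batch of labelled points. The function name `sample_tasks`, the standard Gaussian covariates, and all numeric values are assumptions, not details taken from the paper.

```python
import random


def sample_tasks(components, prior, n_tasks, batch_size, dim, seed=0):
    """Draw n_tasks small batches from a mixture of linear regression models.

    components: list of (beta, sigma) pairs, one per mixture component.
    prior: mixing weights over the components.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        # Each task picks its hidden component from the discrete prior.
        c = rng.choices(range(len(components)), weights=prior, k=1)[0]
        beta, sigma = components[c]
        batch = []
        for _ in range(batch_size):
            # Assumed covariate distribution: independent standard Gaussians.
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            y = sum(b * xi for b, xi in zip(beta, x)) + rng.gauss(0.0, sigma)
            batch.append((x, y))
        tasks.append((c, batch))
    return tasks


# Hypothetical example with k = 2 components in 2 dimensions.
components = [([1.0, -2.0], 0.1), ([0.5, 3.0], 0.5)]
prior = [0.7, 0.3]
tasks = sample_tasks(components, prior, n_tasks=20, batch_size=3, dim=2)
```

The component index is returned only so the sketch can be inspected. In the learning problem it is hidden, and a batch of three points in two dimensions is too small to pin down a noisy model reliably on its own, which is why information has to be shared across tasks.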
Mining Social Determinants of Health for Heart Failure Patient 30-Day Readmission via Large Language Model
Shao, Mingchen, Kang, Youjeong, Hu, Xiao, Kwak, Hyunjung Gloria, Yang, Carl, Lu, Jiaying
Heart Failure (HF) affects millions of Americans and leads to high readmission rates, posing significant healthcare challenges. While Social Determinants of Health (SDOH) such as socioeconomic status and housing stability play critical roles in health outcomes, they are often underrepresented in structured EHRs and hidden in unstructured clinical notes. This study leverages advanced large language models (LLMs) to extract SDOHs from clinical text and uses logistic regression to analyze their association with HF readmissions.
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
Crawshaw, Michael, Woodworth, Blake, Liu, Mingrui
We analyze two variants of Local Gradient Descent applied to distributed logistic regression with heterogeneous, separable data and show convergence at the rate $O(1/KR)$ for $K$ local steps and sufficiently large $R$ communication rounds. In contrast, all existing convergence guarantees for Local GD applied to any problem are at least $\Omega(1/R)$, meaning they fail to show the benefit of local updates. The key to our improved guarantee is showing progress on the logistic regression objective when using a large stepsize $\eta \gg 1/K$, whereas prior analysis depends on $\eta \leq 1/K$.
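As a rough illustration of the setting, here is a minimal sketch of plain Local GD for logistic regression: in each of R rounds every client runs K gradient steps on its own data starting from the shared model, and the server averages the results. It does not reproduce the two variants analyzed in the paper, and the helper names, the two-client toy data, and the values of K, R, and the stepsize are assumptions.

```python
import math


def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))


def logistic_loss(w, data):
    """Mean of log(1 + exp(-y * <w, x>)), computed in a numerically stable way."""
    total = 0.0
    for x, y in data:
        m = margin(w, x, y)
        total += math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))
    return total / len(data)


def logistic_grad(w, data):
    """Gradient of logistic_loss with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        m = margin(w, x, y)
        # s = 1 / (1 + exp(m)), written to avoid overflow for large |m|.
        s = 1.0 / (1.0 + math.exp(m)) if m <= 0 else math.exp(-m) / (1.0 + math.exp(-m))
        for j, xj in enumerate(x):
            g[j] -= y * xj * s / len(data)
    return g


def local_gd(clients, dim, K, R, eta):
    """R communication rounds, each with K local GD steps per client, then averaging."""
    w = [0.0] * dim
    for _ in range(R):
        local_models = []
        for data in clients:
            v = list(w)  # every client starts the round from the shared model
            for _ in range(K):
                g = logistic_grad(v, data)
                v = [vi - eta * gi for vi, gi in zip(v, g)]
            local_models.append(v)
        # Server step: average the local models coordinate-wise.
        w = [sum(v[j] for v in local_models) / len(local_models) for j in range(dim)]
    return w


# Two clients with different (heterogeneous) but jointly separable data.
client_a = [([1.0, 0.2], 1), ([-1.0, 0.1], -1)]
client_b = [([0.3, 1.0], 1), ([-0.2, -1.0], -1)]
clients = [client_a, client_b]
pooled = client_a + client_b

w = local_gd(clients, dim=2, K=5, R=20, eta=0.5)
final_loss = logistic_loss(w, pooled)
```

Comparing `final_loss` across different values of K for a fixed R gives a quick empirical feel for whether extra local steps help on a given data set. It says nothing about the rates in the abstract, which depend on the paper's stepsize choices.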
Asymmetrical Latent Representation for Individual Treatment Effect Modeling
Lacombe, Armand, Sebag, Michèle
Conditional Average Treatment Effect (CATE) estimation, at the heart of counterfactual reasoning, is a crucial challenge for causal modeling, both in theory and in applications, in domains such as healthcare, sociology, or advertising. Borrowing domain adaptation principles, a popular design maps the sample representation to a latent space that balances control and treated populations while enabling the prediction of the potential outcomes. This paper presents a new CATE estimation approach based on the asymmetrical search for two latent spaces, called Asymmetrical Latent Representation for Individual Treatment Effect (ALRITE), where the two latent spaces are respectively intended to optimize the counterfactual prediction accuracy on the control and the treated samples. Under moderate assumptions, ALRITE admits an upper bound on the precision of the estimation of heterogeneous effects (PEHE), and the approach is validated empirically against state-of-the-art methods.
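For reference, PEHE is commonly defined as the expected squared error of the estimated conditional effect. The form below is the standard one from the treatment-effect literature; the symbols $\tau$, $\hat{\tau}$, $Y(0)$, and $Y(1)$ are notation assumed here, and the paper's exact normalization may differ (some works report the square root).

$$
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
\epsilon_{\mathrm{PEHE}} = \mathbb{E}_{x}\left[ \left( \hat{\tau}(x) - \tau(x) \right)^{2} \right]
$$

Here $Y(1)$ and $Y(0)$ denote the potential outcomes under treatment and control, and $\hat{\tau}$ is the estimated CATE, so a smaller value means individual-level effects are recovered more accurately.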
Reviews: Sample Complexity of Learning Mixture of Sparse Linear Regressions
The dependence on SNR is extreme. I wonder whether it arises only in the proof or is a fundamental limitation of the approach. The authors did not provide an empirical comparison to any competing method, not even to [27], on which the presented algorithm improves. It would be interesting to see how the algorithm compares with the state of the art in empirical performance, particularly in the presence of noise. Doesn't the proof provide any dependence on L? 3. Some key definitions are missing.