bart
Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates
Bhandari, Saurabh, Bhatti, Parveen, Chiu, Brian C.-H., Ji, Yuan
In the era of precision medicine, genome-wide epigenetic modifications offer rich data that could inform risk prediction. However, these data are high-dimensional and exhibit complex dependence structures, which makes it difficult to jointly model them with low-dimensional covariates when the goal is to obtain interpretable effect estimates for covariate adjustment. Standard Bayesian additive regression trees (BART) provide strong predictive performance but treat all predictors uniformly within the tree ensemble, obscuring the contributions of significant covariates and complicating variable selection in high-dimensional settings. We propose a semi-parametric BART model (spBART) that addresses this limitation by modeling low-dimensional covariates through a parametric component with interpretable coefficients, while capturing complex nonlinear associations among high-dimensional predictors through the tree ensemble. To perform stable variable selection, we develop a cross-validation-based procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false discovery rate control. We apply the proposed method to a pooled case--control analysis of high-dimensional genome-wide 5-hydroxymethylcytosine profiles derived from circulating cell-free DNA in two multiple myeloma studies ($N = 869$). The approach identifies a parsimonious set of candidate loci and achieves strong out-of-sample discrimination (AUC $= 0.96$) in a held-out validation set. Overall, spBART provides a unified framework for combining interpretable covariate inference with flexible modeling and variable selection in high-dimensional biomedical studies.
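The selection step described in this abstract (aggregate posterior inclusion probabilities across folds, then apply Bayesian false discovery rate control) can be illustrated with a minimal sketch. The abstract does not say how the folds are aggregated or which FDR rule is used, so two assumptions are made here: fold-wise probabilities are combined by a simple mean, and selection follows the common rule of keeping the largest set of predictors whose average (1 - PIP) stays at or below a target level. The function name `bayesian_fdr_select` and the parameter `alpha` are hypothetical and not taken from the paper.

```python
import numpy as np

def bayesian_fdr_select(pip_by_fold, alpha=0.05):
    """Select predictors from fold-wise posterior inclusion probabilities (PIPs).

    pip_by_fold: array-like of shape (n_folds, n_predictors).
    alpha: target Bayesian FDR level (hypothetical default).
    Returns (sorted indices of selected predictors, aggregated PIPs).
    """
    # Assumption: aggregate across folds with a plain mean.
    pip = np.asarray(pip_by_fold, dtype=float).mean(axis=0)
    # Rank predictors from most to least likely to be included.
    order = np.argsort(-pip, kind="stable")
    # 1 - PIP is the posterior probability that a selection is a false discovery.
    local_fdr = 1.0 - pip[order]
    # Expected FDR of the top-k set is the mean local FDR of its members;
    # this running mean is non-decreasing because local_fdr is sorted ascending.
    running_fdr = np.cumsum(local_fdr) / np.arange(1, pip.size + 1)
    n_keep = int(np.sum(running_fdr <= alpha))
    return np.sort(order[:n_keep]), pip

# Toy usage with two folds and four predictors (made-up numbers).
selected, pip = bayesian_fdr_select(
    [[0.99, 0.10, 0.97, 0.60],
     [0.97, 0.20, 0.99, 0.70]],
    alpha=0.05,
)
```

With these toy inputs only the two predictors with aggregated PIP near 0.98 are kept; loosening `alpha` admits the next-ranked predictor as well.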
A Appendix
A.1 Summary of Commonly Used Metrics for Text Generation
Table 1: Summary of commonly used metrics for text generation. For settings and tasks, we only list the ones justified by the original paper for each metric.
We conduct experiments on WMT19, and the results are shown in Tab. 2. We don't observe
A.3 Prompt Set
In Tab. 3, we list the full prompt set for both s → h direction and h ↔ r direction.
Prompt Set
s → h: Last; Tersely; Succinctly; In summation; To put it succinctly; After; In brief; All in all; To summarize; Bringing up the rear; Behind; In short; In outline; In a nutshell; To come to the point; Lastly; Concisely; In closing; In conclusion; In the final analysis; In sum; In precis; In passing; In winding up; Without wasting words; To end; In a word; To conclude; Last in order; At the end of the day; Curtly; Compactly; Summarising; In a few words; Without waste of words; Crisply; Summarily; In the rear; As a final point; Finally yet importantly; At last; To sum up; Summarizing; Not least of all; To put it in a nutshell; Pithily; Basically; Laconically; To put it briefly; When all is said and done; Shortly; In the end; At the rear; Not to mince words; To cut a long story short; In fine; At the end; To be brief; Last but not least; Not to beat about the bush; Finally; In essence; Last of all; Just as importantly; In drawing things to a close; Briefly; Ultimately; Elliptically; To put it concisely; Not to put too fine a point on it
h ↔ r: As; To wit; As it were; Case in point; As an illustration; sc.; That is; Especially; That is to say; To give an example; i.e.
BFTS: Thompson Sampling with Bayesian Additive Regression Trees
Deng, Ruizhe, Chakraborty, Bibhas, Chen, Ran, Tan, Yan Shuo
Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
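The Thompson Sampling loop this abstract builds on can be sketched as follows: draw one sample of the reward model from its posterior, act greedily with respect to that sample, then update the posterior with the observed reward. This sketch does not implement BART. As a plainly labeled stand-in it uses a conjugate Bayesian linear regression per arm, which is exactly the kind of linear model the abstract criticizes as biased; in BFTS the posterior draw would instead come from a BART sampler. All names here (`LinearGaussianArm`, `thompson_step`, `run_bandit`) and the simulated environment are hypothetical.

```python
import numpy as np

class LinearGaussianArm:
    """Stand-in reward model (NOT BART): Bayesian linear regression
    with prior N(0, I / lam) on the weights and unit noise variance."""

    def __init__(self, dim, lam=1.0):
        self.precision = lam * np.eye(dim)  # posterior precision matrix
        self.xty = np.zeros(dim)            # running sum of reward * context

    def sample_weights(self, rng):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0           # guard against round-off asymmetry
        mean = cov @ self.xty
        return rng.multivariate_normal(mean, cov)

    def update(self, x, reward):
        self.precision += np.outer(x, x)
        self.xty += reward * x

def thompson_step(arms, x, rng):
    """Draw one posterior sample per arm and pick the arm whose sample looks best."""
    sampled_rewards = [arm.sample_weights(rng) @ x for arm in arms]
    return int(np.argmax(sampled_rewards))

def run_bandit(true_weights, n_rounds=500, seed=0):
    """Simulate a linear contextual bandit and return cumulative expected regret."""
    rng = np.random.default_rng(seed)
    n_arms, dim = true_weights.shape
    arms = [LinearGaussianArm(dim) for _ in range(n_arms)]
    regret = 0.0
    for _ in range(n_rounds):
        x = rng.normal(size=dim)                 # observe a context
        a = thompson_step(arms, x, rng)          # explore via posterior sampling
        means = true_weights @ x                 # true expected reward of each arm
        reward = means[a] + rng.normal()         # noisy reward for the chosen arm
        arms[a].update(x, reward)                # posterior update
        regret += means.max() - means[a]
    return regret

cumulative_regret = run_bandit(np.array([[1.0, 0.0], [0.0, 1.0]]), n_rounds=200)
```

Only `sample_weights` and `update` depend on the reward model, so swapping in a different posterior sampler leaves the exploration loop unchanged.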
NeurIPS Rebuttal for "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"
We thank reviewers for their thoughtful, detailed reviews. "information retrieval strategy to improve the the generation Pre-trained seq2seq models have only become available in the last year (T5, BART) or two (GPT2). We study two RAG models. RAG-Sequence's formulation is similar to REALM, but RAG-Token is novel and Further, we explore novel decoding strategies for these models. "contribution [...] is not very specific, since R1 suggested that "A figure or example about PAG-Sequence Model and PAG-Token Model is needed", and R3 mentions "description of the model is quite concise (due to space restrictions)".