Goto

Collaborating Authors

 prop









Online Markov Decision Processes with Terminal Law Constraints

Moreno, Bianca Marin, Brégère, Margaux, Gaillard, Pierre, Oudjane, Nadia

arXiv.org Machine Learning

Traditional reinforcement learning usually assumes either episodic interactions with resets or continuous operation to minimize average or cumulative loss. While episodic settings have many theoretical results, resets are often unrealistic in practice. The infinite-horizon setting avoids this issue but lacks non-asymptotic guarantees in online scenarios with unknown dynamics. In this work, we move towards closing this gap by introducing a reset-free framework called the periodic framework, where the goal is to find periodic policies: policies that not only minimize cumulative loss but also return the agents to their initial state distribution after a fixed number of steps. We formalize the problem of finding optimal periodic policies and identify sufficient conditions under which it is well-defined for tabular Markov decision processes. To evaluate algorithms in this framework, we introduce the periodic regret, a measure that balances cumulative loss with the terminal law constraint. We then propose the first algorithms for computing periodic policies in two multi-agent settings and show they achieve sublinear periodic regret of order $\tilde O(T^{3/4})$. This provides the first non-asymptotic guarantees for reset-free learning in the setting of $M$ homogeneous agents, for $M > 1$.


PROPS: Progressively Private Self-alignment of Large Language Models

Teku, Noel, Tian, Fengwei, Bhattacharjee, Payel, Chakraborty, Souradip, Bedi, Amrit Singh, Tandon, Ravi

arXiv.org Artificial Intelligence

Alignment is a key step in developing Large Language Models (LLMs) using human feedback to ensure adherence to human values and societal norms. Dependence on human feedback raises privacy concerns about how much a labeler's preferences may reveal about their personal values, beliefs, and personality traits. Existing approaches, such as Differentially Private SGD (DP-SGD), provide rigorous privacy guarantees by privatizing gradients during fine-tuning and alignment but can provide more privacy than necessary as human preferences are tied only to labels of (prompt, response) pairs and can degrade model utility. This work focuses on LLM alignment with preference-level privacy, which preserves the privacy of preference labels provided by humans. We propose PROPS (PROgressively Private Self-alignment), a multi-stage privacy preserving alignment framework where privately aligned models in previous stages can serve as labelers for supplementing training data in the subsequent stages of alignment. We present theoretical guarantees for PROPS as well as comprehensive validation using multiple models (Pythia and GPT) and datasets (AlpacaEval, Anthropic HH-RLHF, truthy-dpo-v0.1) to demonstrate the utility of PROPS over existing methods while still providing high privacy. For the same privacy budget, alignment via PROPS can achieve up to 3x higher win-rates compared to DP-SGD, and 2.5x higher win-rates compared to Randomized Response (RR) based alignment.


Bayesian Semiparametric Mixture Cure (Frailty) Models

Kızılaslan, Fatih, Vitelli, Valeria

arXiv.org Machine Learning

In recent years, mixture cure models have gained increasing popularity in survival analysis as an alternative to the Cox proportional hazards model, particularly in settings where a subset of patients is considered cured. The proportional hazards mixture cure model is especially advantageous when the presence of a cured fraction can be reasonably assumed, providing a more accurate representation of long-term survival dynamics. In this study, we propose a novel hierarchical Bayesian framework for the semiparametric mixture cure model, which accommodates both the inclusion and exclusion of a frailty component, allowing for greater flexibility in capturing unobserved heterogeneity among patients. Samples from the posterior distribution are obtained using a Markov chain Monte Carlo method, leveraging a hierarchical structure inspired by Bayesian Lasso. Comprehensive simulation studies are conducted across diverse scenarios to evaluate the performance and robustness of the proposed models. Bayesian model comparison and assessment are performed using various criteria. Finally, the proposed approaches are applied to two well-known datasets in the cure model literature: the E1690 melanoma trial and a colon cancer clinical trial.