- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- (2 more...)
Online Markov Decision Processes with Terminal Law Constraints
Moreno, Bianca Marin, Brégère, Margaux, Gaillard, Pierre, Oudjane, Nadia
Traditional reinforcement learning usually assumes either episodic interactions with resets or continuous operation to minimize average or cumulative loss. While episodic settings have many theoretical results, resets are often unrealistic in practice. The infinite-horizon setting avoids this issue but lacks non-asymptotic guarantees in online scenarios with unknown dynamics. In this work, we move towards closing this gap by introducing a reset-free framework called the periodic framework, where the goal is to find periodic policies: policies that not only minimize cumulative loss but also return the agents to their initial state distribution after a fixed number of steps. We formalize the problem of finding optimal periodic policies and identify sufficient conditions under which it is well-defined for tabular Markov decision processes. To evaluate algorithms in this framework, we introduce the periodic regret, a measure that balances cumulative loss with the terminal law constraint. We then propose the first algorithms for computing periodic policies in two multi-agent settings and show they achieve sublinear periodic regret of order $\tilde O(T^{3/4})$. This provides the first non-asymptotic guarantees for reset-free learning in the setting of $M$ homogeneous agents, for $M > 1$.
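The abstract above can be made concrete with a small sketch. This is not the authors' algorithm, only an illustration of the two quantities the periodic framework trades off: cumulative loss over one period of length `tau`, and the distance between the terminal state distribution and the initial distribution `mu0` (the terminal law constraint). All names (`tau`, `mu0`, `P`, `loss`, the penalty weight `lam`) are hypothetical, and total variation is used as a stand-in distance:

```python
import numpy as np

def propagate(mu0, P, pi, tau):
    """Push the state distribution forward tau steps under policy pi.

    P has shape (S, A, S); pi has shape (S, A) with rows summing to 1.
    """
    mu = mu0.copy()
    for _ in range(tau):
        # Effective transition matrix under pi: P_pi[s, s'] = sum_a pi[s, a] P[s, a, s']
        P_pi = np.einsum("sa,sat->st", pi, P)
        mu = mu @ P_pi
    return mu

def periodic_objective(mu0, P, pi, loss, tau, lam=1.0):
    """Cumulative expected loss over one period, plus a penalty on the
    total-variation distance between the terminal law and mu0."""
    mu, total = mu0.copy(), 0.0
    for _ in range(tau):
        total += np.einsum("s,sa,sa->", mu, pi, loss)  # E[loss] at this step
        P_pi = np.einsum("sa,sat->st", pi, P)
        mu = mu @ P_pi
    tv = 0.5 * np.abs(mu - mu0).sum()
    return total + lam * tv
```

A policy that deterministically cycles through the states returns the distribution to `mu0` after one full cycle, so its terminal-law penalty vanishes; the periodic regret in the paper compares an algorithm's trajectory against the best such policy.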
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Media > Television (0.45)
- Energy > Power Industry (0.45)
PROPS: Progressively Private Self-alignment of Large Language Models
Teku, Noel, Tian, Fengwei, Bhattacharjee, Payel, Chakraborty, Souradip, Bedi, Amrit Singh, Tandon, Ravi
Alignment is a key step in developing Large Language Models (LLMs) using human feedback to ensure adherence to human values and societal norms. This dependence on human feedback raises privacy concerns: a labeler's preferences may reveal personal values, beliefs, and personality traits. Existing approaches such as Differentially Private SGD (DP-SGD) provide rigorous privacy guarantees by privatizing gradients during fine-tuning and alignment, but they can provide more privacy than necessary, since human preferences enter only through the labels of (prompt, response) pairs, and they can degrade model utility. This work focuses on LLM alignment with preference-level privacy, which preserves the privacy of the preference labels provided by humans. We propose PROPS (PROgressively Private Self-alignment), a multi-stage privacy-preserving alignment framework in which models privately aligned in earlier stages serve as labelers that supplement the training data for subsequent stages of alignment. We present theoretical guarantees for PROPS as well as comprehensive validation using multiple models (Pythia and GPT) and datasets (AlpacaEval, Anthropic HH-RLHF, truthy-dpo-v0.1) to demonstrate the utility of PROPS over existing methods while still providing strong privacy. For the same privacy budget, alignment via PROPS achieves up to 3x higher win-rates than DP-SGD and 2.5x higher win-rates than Randomized Response (RR) based alignment.
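To make preference-level privacy concrete, here is a hedged sketch of the Randomized Response (RR) baseline the abstract compares against, not PROPS itself: each binary preference label is kept with probability e^eps / (1 + e^eps) and flipped otherwise, which makes the released label eps-differentially private at the label level. The function name and parameters are illustrative:

```python
import numpy as np

def randomized_response(labels, eps, rng=None):
    """Release eps-label-DP versions of 0/1 preference labels.

    Each label is kept with probability e^eps / (1 + e^eps) and
    flipped with the complementary probability 1 / (1 + e^eps).
    """
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(len(labels)) < p_keep
    return np.where(keep, labels, 1 - labels)
```

At eps = 0 every label is a coin flip (perfect privacy, no signal); as eps grows, labels pass through almost unchanged. The abstract's claim is that PROPS extracts more utility than this mechanism at the same budget.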
- North America > United States > Arizona (0.05)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
Bayesian Semiparametric Mixture Cure (Frailty) Models
Kızılaslan, Fatih, Vitelli, Valeria
In recent years, mixture cure models have gained increasing popularity in survival analysis as an alternative to the Cox proportional hazards model, particularly in settings where a subset of patients is considered cured. The proportional hazards mixture cure model is especially advantageous when the presence of a cured fraction can be reasonably assumed, providing a more accurate representation of long-term survival dynamics. In this study, we propose a novel hierarchical Bayesian framework for the semiparametric mixture cure model, which accommodates both the inclusion and exclusion of a frailty component, allowing for greater flexibility in capturing unobserved heterogeneity among patients. Samples from the posterior distribution are obtained using a Markov chain Monte Carlo method, leveraging a hierarchical structure inspired by Bayesian Lasso. Comprehensive simulation studies are conducted across diverse scenarios to evaluate the performance and robustness of the proposed models. Bayesian model comparison and assessment are performed using various criteria. Finally, the proposed approaches are applied to two well-known datasets in the cure model literature: the E1690 melanoma trial and a colon cancer clinical trial.
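The core object in the abstract is the mixture cure survival function: a fraction `p_cure` of subjects never experiences the event, and the rest follow an uncured survival curve. The sketch below uses the standard form of that mixture with an exponential uncured component purely for illustration; it is not the authors' semiparametric hierarchical Bayesian model, and all parameter names are hypothetical:

```python
import numpy as np

def population_survival(t, p_cure, lam):
    """Mixture cure survival: S_pop(t) = p_cure + (1 - p_cure) * S_u(t),
    with an exponential uncured component S_u(t) = exp(-lam * t)."""
    t = np.asarray(t, dtype=float)
    return p_cure + (1.0 - p_cure) * np.exp(-lam * t)

def log_likelihood(times, events, p_cure, lam):
    """Right-censored log-likelihood under this illustrative model:
    event times contribute the uncured density (1-p)*lam*exp(-lam*t),
    censored times contribute the population survival S_pop(t)."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=float)  # 1 = event observed, 0 = censored
    f = (1.0 - p_cure) * lam * np.exp(-lam * t)
    S = population_survival(t, p_cure, lam)
    return np.sum(d * np.log(f) + (1.0 - d) * np.log(S))
```

The signature feature of cure models is visible directly: as t grows, `population_survival` plateaus at `p_cure` rather than decaying to zero, which is what the Cox model cannot capture when a cured fraction exists. The paper's contribution is to place a semiparametric Bayesian hierarchy (with an optional frailty term) over this structure and sample it with MCMC.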
- Europe > Austria > Vienna (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Asia > Singapore (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)