tmax
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) under general parametrized policies with smooth and bounded policy gradients. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of O(1/ T) over a horizon of length T when the mixing time, ฯmix, is known to the learner. In absence of knowledge of ฯmix, the achievable rates change to O(1/T0.5 ฯต) provided that T O ฯ2/ฯตmix . Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
Optimal Estimation of the Best Mean in Multi-Armed Bandits
We study the problem of estimating the mean reward of the best arm in a multiarmed bandit (MAB) setting. Specifically, given a target precision ฮตand confidence level 1 ฮด, the goal is to return an ฮต-accurate estimate of the largest mean reward with probability at least 1 ฮด, while minimizing the number of samples. We first establish an instance-dependent lower bound on the sample complexity, which requires handling the infinitely many possible candidates of the estimated best mean. This lower bound is expressed in a non-convex optimization problem, which becomes the main difficulty of this problem, preventing the direct application of standard techniques such as Track-and-Stop to provably achieve optimality. To overcome this difficulty, we introduce several new algorithmic and analytical techniques and propose an algorithm that achieves the asymptotic lower bound with matching constants in the leading term. Our method combines a confidence ellipsoid-based stopping condition with a two-phase sampling strategy tailored to manage non-convexity proposed algorithm is simple, nearly free of hyperparameters, and achieves the instance-dependent, asymptotically optimal sample complexity. Experimental results support our theoretical guarantees and demonstrate the practical effectiveness of our method.
Adaptive Experimentation for Censored Survival Outcomes
Wang, Yuxin, Frauen, Dennis, Schweisthal, Jonas, Schrรถder, Maresa, Javurek, Emil, Feuerriegel, Stefan
Adaptive experimentation enables efficient estimation of causal effects, but existing methods are not designed for survival data with censoring, where event times are only partially observed (e.g., overall survival in cancer trials but with dropout). In this paper, we develop a novel framework for adaptive experimentation to estimate causal effects under right censoring. For this, we derive the semiparametric efficiency bound for the average survival effect curve as a function of the treatment allocation policy and thereby obtain a closed-form efficiency-optimal allocation policy. The policy generalizes classical Neyman allocation to survival settings by prioritizing patient strata where both event and censoring dynamics induce high uncertainty. Building on this, we propose the Adaptive Survival Estimator (ASE), an adaptive framework that learns the allocation policy and estimates the average survival effect curve sequentially. Our framework has three main benefits: (i) it accommodates arbitrary machine learning models for nuisance estimation; (ii) it is guided by a closed-form efficiency-optimal allocation policy; and (iii) it admits strong theoretical guarantees, including asymptotic normality via a martingale central limit theorem. We demonstrate our framework across various numerical experiments to show consistent efficiency gains over uniform randomization and censoring-agnostic baselines.
Supplementary Material for " Path following algorithms for โ2-regularized M-estimation with approximation guarantee "
Figure S2: Number of iterations at each grid point for the Newton and gradient descent methods applying to the โ2-regularized logistic regression over simulated data generated in Example 2. We summarize the results in Figure S1-S3. Figure S1 presents the results for ridge regression. In this case, the number of iterations by gradient method first increases and then stays flat as tk grows. Newton method, however, only takes one 1.51.5 iteration at each grid point. Moreover, the level of approximation (i.e., ฯต) seems to have no impact onthe number of iterations at each grid point, which is highly desirable.
A.1 Hyper-Parameters For all datasets, the surrogate gradient function isฯ(x) = 1ฯ arctan(ฯ2ฮฑx) + 12, thus ฯ0(x) = ฮฑ 2(1+(ฯ
A.1 Hyper-Parameters For all datasets, the surrogate gradient function isฯ(x) = 1ฯ arctan(ฯ2ฮฑx) + 12, thus ฯ0(x) = The results on the three networks are consistent, indicating that RTD is a general sequential data augmentationmethod. We compare different surrogate functions, including Rectangular (ฯ0(x) = sign(|x| < 12)),ArcTan(ฯ0(x) = 11+(ฯx)2)and Constant 1(ฯ0(x) 1),intheSNNs on CIFAR-10. The results are shown in Tab.9. Tab.9 indicates that the choice of surrogate function has a considerable influence on the SNN's performance. Although Rectangular and Constant 1 can avoid the gradient exploding/vanishing problems in Eq.(8), they still cause lower accuracy or even make the optimization not converges.
Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting
Shi, Yiwen, Ren, Ping, Wang, Jing, Han, Biao, ValizadehAslani, Taha, Agbavor, Felix, Zhang, Yi, Hu, Meng, Zhao, Liang, Liang, Hualou
Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment. However, manual summarization of food effect from extensive drug application review documents is time-consuming, which arouses a need to develop automated methods. Recent advances in large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability regarding the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach, iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. These results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of PSG assessment cycle and promoting the generic drug product development.
An operational framework to automatically evaluate the quality of weather observations from third-party stations
Shao, Quanxi, Li, Ming, Dabrowski, Joel Janek, Bakar, Shuvo, Rahman, Ashfaqur, Powell, Andrea, Henderson, Brent
With increasing number of crowdsourced private automatic weather stations (called TPAWS) established to fill the gap of official network and obtain local weather information for various purposes, the data quality is a major concern in promoting their usage. Proper quality control and assessment are necessary to reach mutual agreement on the TPAWS observations. To derive near real-time assessment for operational system, we propose a simple, scalable and interpretable framework based on AI/Stats/ML models. The framework constructs separate models for individual data from official sources and then provides the final assessment by fusing the individual models. The performance of our proposed framework is evaluated by synthetic data and demonstrated by applying it to a re-al TPAWS network.
Learning to solve the single machine scheduling problem with release times and sum of completion times
Parmentier, Axel, T'Kindt, Vincent
In this paper, we focus on the solution of a hard single machine scheduling problem by new heuristic algorithms embedding techniques from machine learning field and scheduling theory. These heuristics transform an instance of the hard problem into an instance of a simpler one solved to optimality. The obtained schedule is then transposed to the original problem. Computational experiments show that they are competitive with state-of-the-art heuristics, notably on large instances.
Automated Verification and Tightening of Failure Propagation Models
Bittner, Benjamin (Fondazione Bruno Kessler) | Bozzano, Marco (Fondazione Bruno Kessler) | Cimatti, Alessandro (Fondazione Bruno Kessler) | Zampedri, Gianni (Fondazione Bruno Kessler)
Timed Failure Propagation Graphs (TFPGs) are used in the design of safety-critical systems as a way of modeling failure propagation, and to evaluate and implement diagnostic systems. TFPGs are a very rich formalism: they allow to model Boolean combinations of faults and events, also dependent on the operational modes of the system and quantitative delays between them. TFPGs are often produced manually, from a given dynamic system of greater complexity, as abstract representations of the system behavior under specific faulty conditions. In this paper we tackle two key difficulties in this process: first, how to make sure that no important behavior of the system is overlooked in the TFPG, and that no spurious, non-existent behavior is introduced; second, how to devise the correct values for the delays between events. We propose a model checking approach to automatically validate the completeness and tightness of a TFPG for a given infinite-state dynamic system, and a procedure for the automated synthesis of the delay parameters. The proposed approach is evaluated on a number of synthetic and industrial benchmarks.