 greedy selection




Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Neural Information Processing Systems

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied by a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.


Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex

Neural Information Processing Systems

In this paper we study the well-known greedy coordinate descent (GCD) algorithm for solving $\ell_1$-regularized problems and improve GCD with two popular strategies: Nesterov's acceleration and stochastic optimization. First, we propose a new rule for greedy selection based on an $\ell_1$-norm square approximation, which is nontrivial to solve but convex; then an efficient algorithm called ``SOft ThreshOlding PrOjection (SOTOPO)'' is proposed to exactly solve the $\ell_1$-regularized $\ell_1$-norm square approximation problem induced by the new rule. Based on the new rule and the SOTOPO algorithm, Nesterov's acceleration and stochastic optimization are then successfully applied to the GCD algorithm. The resulting algorithm, called accelerated stochastic greedy coordinate descent (ASGCD), has the optimal convergence rate $O(\sqrt{1/\epsilon})$; meanwhile, it reduces the iteration complexity of greedy selection by up to a factor of the sample size. Both theoretically and empirically, we show that ASGCD performs better on high-dimensional and dense problems with sparse solutions.
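As a point of reference for the abstract above, here is a minimal sketch of plain greedy coordinate descent on a lasso objective, 0.5 * ||Ax - b||^2 + lam * ||x||_1. It is not the SOTOPO or ASGCD algorithm: the selection rule used here (update the coordinate whose soft-thresholded step moves furthest) is a swapped-in textbook rule, and the names `soft_threshold` and `greedy_cd_lasso` are hypothetical.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.|: shrink z toward zero by tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def greedy_cd_lasso(A, b, lam, iters=100, tol=1e-12):
    """Greedy coordinate descent for 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    A is a list of rows, b a list of targets. Each iteration updates only
    the coordinate whose exact one-dimensional minimizer is furthest from
    its current value (an assumed rule, not the paper's).
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(iters):
        # Residual r = Ax - b for the current iterate.
        r = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
        best_j, best_new, best_move = None, 0.0, tol
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the loss
            grad_j = sum(A[i][j] * r[i] for i in range(n))
            new = soft_threshold(x[j] - grad_j / col_sq[j], lam / col_sq[j])
            move = abs(new - x[j])
            if move > best_move:
                best_j, best_new, best_move = j, new, move
        if best_j is None:
            break  # no coordinate moves: a stationary point was reached
        x[best_j] = best_new
    return x


x_hat = greedy_cd_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

With an identity design matrix the solution is just the soft-thresholded target, which makes the sparsity effect easy to see: the small second target is set exactly to zero.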


Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex

Neural Information Processing Systems

… PrOjection (SOTOPO)" is proposed to exactly solve an … In order to further improve the convergence rate and reduce the iteration cost, two important strategies are used in first-order methods: Nesterov's acceleration and stochastic optimization. Nesterov's acceleration refers to the technique that uses an algebraic trick to accelerate first-order algorithms, while stochastic optimization refers to the method that samples one training … This work is supported by the National Natural Science Foundation of China under grant Nos. …


Infrequent Exploration in Linear Bandits

arXiv.org Artificial Intelligence

We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient algorithms, provided the exploration frequency exceeds a logarithmic threshold. Additionally, INFEX is a general, modular framework that allows seamless integration of any fully adaptive exploration method, enabling wide applicability and ease of adoption. By restricting intensive exploratory computations to infrequent intervals, our approach can also enhance computational efficiency. Empirical evaluations confirm our theoretical findings, showing state-of-the-art regret performance and runtime improvements over existing methods.
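The scheduling idea in this abstract can be sketched as a small dispatcher: a base exploratory policy runs only on scheduled rounds, and a greedy policy runs otherwise. This is an illustration under stated assumptions, not the INFEX implementation: the doubling schedule, the function names, and the policies-as-callables interface are all hypothetical, and no claim is made that this schedule meets the paper's frequency condition.

```python
def doubling_schedule(n_rounds):
    """Hypothetical schedule: explore on rounds 1, 2, 4, 8, ..."""
    rounds, t = set(), 1
    while t <= n_rounds:
        rounds.add(t)
        t *= 2
    return rounds


def run_infrequent_exploration(n_rounds, explore_rounds, explore_policy, greedy_policy):
    """Call explore_policy on scheduled rounds and greedy_policy otherwise.

    Both policies are callables taking the round index t and returning an action.
    """
    actions = []
    for t in range(1, n_rounds + 1):
        policy = explore_policy if t in explore_rounds else greedy_policy
        actions.append(policy(t))
    return actions


schedule = doubling_schedule(8)
trace = run_infrequent_exploration(
    8, schedule, lambda t: "explore", lambda t: "greedy"
)
```

Keeping the schedule separate from the policies mirrors the modularity the abstract describes: any exploratory method could be passed in as `explore_policy`.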


Greedy Selection under Independent Increments: A Toy Model Analysis

arXiv.org Machine Learning

We study an iterative selection problem over N i.i.d. discrete-time stochastic processes with independent increments. At each stage, a fixed number of processes are retained based on their observed values. Under this simple model, we prove that the optimal strategy for selecting the final maximum-value process is to apply greedy selection at each stage. While the result relies on strong independence assumptions, it offers a clean justification for greedy heuristics in multi-stage elimination settings and may serve as a toy example for understanding related algorithms in high-dimensional applications.
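The multi-stage setting described above can be simulated in a few lines. The sketch below assumes Gaussian increments and a hypothetical `keep_schedule` giving the number of survivors after each stage; it only illustrates stage-wise greedy retention and does not reproduce the paper's optimality proof.

```python
import random


def greedy_elimination(n_processes, keep_schedule, steps_per_stage, rng):
    """Run i.i.d. random walks and keep the top performers after each stage.

    keep_schedule[s] is the number of processes retained after stage s.
    Returns a dict mapping surviving process index to its current value.
    """
    values = {i: 0.0 for i in range(n_processes)}
    for keep in keep_schedule:
        # Advance every surviving process by independent Gaussian increments.
        for i in values:
            for _ in range(steps_per_stage):
                values[i] += rng.gauss(0.0, 1.0)
        # Greedy selection: retain the processes with the largest observed values.
        survivors = sorted(values, key=values.get, reverse=True)[:keep]
        values = {i: values[i] for i in survivors}
    return values


final = greedy_elimination(8, [4, 2, 1], steps_per_stage=5, rng=random.Random(0))
```

Passing in a seeded `random.Random` keeps runs reproducible, which helps when comparing greedy retention against other selection rules.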


MUSS: Multilevel Subset Selection for Relevance and Diversity

arXiv.org Artificial Intelligence

The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items while providing a diversified recommendation. The constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS, which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. We provide a rigorous theoretical analysis and show that our method achieves a constant-factor approximation of the optimal objective. In a recommender system application, our method can achieve the same level of performance as baselines while running 4.5 to 20 times faster. Our method is also capable of outperforming baselines by up to 6 percentage points in RAG-based question answering accuracy.
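The greedy MMR baseline mentioned above can be sketched as follows. This is the standard single-machine MMR rule, not MUSS or DGDS; the function name `mmr_select`, the trade-off parameter `lam`, and the toy scores are assumptions made for illustration.

```python
def mmr_select(relevance, similarity, k, lam=0.5):
    """Greedy Maximum Marginal Relevance selection.

    relevance[i] is the relevance of item i; similarity[i][j] is the pairwise
    similarity. Each step picks the item maximizing
    lam * relevance - (1 - lam) * (max similarity to already selected items).
    """
    selected = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


relevance = [0.9, 0.85, 0.3]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse = mmr_select(relevance, similarity, k=2, lam=0.5)
relevant_only = mmr_select(relevance, similarity, k=2, lam=1.0)
```

In this toy example items 0 and 1 are near-duplicates, so with `lam=0.5` the second pick skips item 1 in favor of the less relevant but dissimilar item 2, while `lam=1.0` reduces to ranking by relevance alone.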



7810ccd41bf26faaa2c4e1f20db70a71-Reviews.html

Neural Information Processing Systems

The authors suggest the use of a criterion, Σ-Optimality, for active learning in Gauss-Markov random fields. The criterion itself was originally proposed by Garnett et al. for active surveying, but it does not appear that the submodular property was recognized in that previous work. Labeled and unlabeled data are embedded in a graph: nodes represent both labeled and unlabeled data, and edge weights, computed via a kernel, capture similarity. The motivation for an active approach is that acquiring labels on the full data set may incur some cost (presumably greater than computing the edge weights over all data), so a criterion is used to determine which of the remaining unlabeled data should be labeled. The authors establish that the criterion satisfies the monotone submodular property, and as such greedy selection achieves (1-1/e) performance relative to optimal selection.
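The (1-1/e) guarantee referred to in this review applies to greedy maximization of any monotone submodular set function under a cardinality constraint. The sketch below illustrates that generic procedure on a set-coverage function, used here as a simple stand-in for Σ-Optimality; the names `greedy_maximize` and `coverage` and the toy data are hypothetical.

```python
def greedy_maximize(ground_set, f, k):
    """Pick up to k elements, each time adding the one with the largest marginal gain.

    f takes a list of elements and returns a number; the approximation
    guarantee holds when f is monotone and submodular.
    """
    chosen = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in chosen]
        if not remaining:
            break
        base = f(chosen)
        best = max(remaining, key=lambda e: f(chosen + [e]) - base)
        chosen.append(best)
    return chosen


# Toy coverage instance: each named set covers some numbered elements.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(names):
    """Number of distinct elements covered by the named sets (monotone submodular)."""
    covered = set()
    for name in names:
        covered |= sets[name]
    return len(covered)


picked = greedy_maximize(list(sets), coverage, k=2)
```

Here the greedy choice first takes the largest set and then the one adding the most new elements, which in this small instance happens to cover every element.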