Goto

Collaborating Authors

 Gradient Descent


Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

arXiv.org Artificial Intelligence

We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nystr\"om approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nystr\"om approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems: 1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^\omega)$ time, improving on a recent result of [Derezi\'nski, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$. 2. We give the first $\tilde{O}(n^2 + {d_\lambda}^{\omega}$) time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_\lambda$. This problem arises in applications like Gaussian process regression. 3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1 (nuclear) norm, we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018]. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.


Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning with multiple, potentially conflicting objectives is pervasive in real-world applications, while this problem remains theoretically under-explored. This paper tackles the multi-objective reinforcement learning (MORL) problem and introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals. Notably, we provide the first analysis of finite-time Pareto-stationary convergence and corresponding sample complexity in both discounted and average reward settings. Our approach has two salient features: (a) MOAC mitigates the cumulative estimation bias resulting from finding an optimal common gradient descent direction out of stochastic samples. This enables provable convergence rate and sample complexity guarantees independent of the number of objectives; (b) With proper momentum coefficient, MOAC initializes the weights of individual policy gradients using samples from the environment, instead of manual initialization. This enhances the practicality and robustness of our algorithm. Finally, experiments conducted on a real-world dataset validate the effectiveness of our proposed method.


On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression

arXiv.org Artificial Intelligence

We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one method of ensuring its existence is by assigning positive probability to every class in the sample dataset. The notion of data separability is not needed, which is in contrast to the classical set up of multi-class logistic regression in which each data sample belongs to one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.


Variance Control for Black Box Variational Inference Using The James-Stein Estimator

arXiv.org Machine Learning

Black Box Variational Inference is a promising framework in a succession of recent efforts to make Variational Inference more ``black box". However, in basic version it either fails to converge due to instability or requires some fine-tuning of the update steps prior to execution that hinder it from being completely general purpose. We propose a method for regulating its parameter updates by reframing stochastic gradient ascent as a multivariate estimation problem. We examine the properties of the James-Stein estimator as a replacement for the arithmetic mean of Monte Carlo estimates of the gradient of the evidence lower bound. The proposed method provides relatively weaker variance reduction than Rao-Blackwellization, but offers a tradeoff of being simpler and requiring no fine tuning on the part of the analyst. Performance on benchmark datasets also demonstrate a consistent performance at par or better than the Rao-Blackwellized approach in terms of model fit and time to convergence.


Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

arXiv.org Artificial Intelligence

Reversal curse (Berglund et al., 2023) refers to the phenomenon that an auto-regressive LLM that learns "A is B" during training fails to generalize to the reverse direction "B is A", and this task is also termed as "inverse search" in Allen-Zhu and Li (2023). Although some previous works propose different methods to mitigate the reversal curse, including reversing the training dataset (Guo et al., 2024; Golovneva et al., 2024) and training on different objectives such as autoregressive blank infilling (Lv et al., 2023), these methods might negatively affect the model performance on other tasks since they either alter the dataset or the model architecture. Without dataset manipulation or changing the auto-regressive nature (causal structure) of the model, the reversal curse is hard to mitigate even with ICL strategies such as chain-of-thought (Allen-Zhu and Li, 2023; Guo et al., 2024). In this paper, we aim to theoretically study why the reversal curse happens for auto-regressive LLMs. Different from previous work that studies the capacity of (transformer-based (Vaswani et al., 2017)) LLMs through the lens of expressivity (e.g., Yun et al. (2019); Pérez et al. (2021); Feng et al. (2024)), reversal curse cannot be explained by expressivity since a model can express "A is B" is also able to express "B is A". Therefore, we analyze the reversal curse via training dynamics since even if a set of parameters can express a fact in both directions, it might not be reachable through popular training algorithms (e.g., gradient descent, AdamW (Loshchilov and Hutter, 2017)) with training data only presented in one direction. We summarize our main contributions as follows: We theoretically analyze reversal curse where training or test sequences have the from "A B" or "B A" via training dynamics of (stochastic) gradient descent under two auto-regressive models: a bilinear model (Section 3) and one-layer transformers under certain assumptions similar to Tian et al. (2023a) (Section 4). The analysis of the training dynamics of both models reveals a core reason why the reversal curse happens: the weights of the autoregressive models are asymmetric, i.e., the increase of weights from the token A to token B


Efficient Online Set-valued Classification with Bandit Feedback

arXiv.org Machine Learning

Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contain the true label with a prescribed coverage rate. In practice, the empirical coverage achieved highly relies on fully observed label information from data both in the training phase for model fitting and the calibration phase for quantile estimation. This dependency poses a challenge in the context of online learning with bandit feedback, where a learner only has access to the correctness of actions (i.e., pulled an arm) but not the full information of the true label. In particular, when the pulled arm is incorrect, the learner only knows that the pulled one is not the true class label, but does not know which label is true. Additionally, bandit feedback further results in a smaller labeled dataset for calibration, limited to instances with correct actions, thereby affecting the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), offering coverage guarantees on a class-specific granularity. Using an unbiased estimation of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenges of sparsely labeled data in each iteration and generalizes the reliability and applicability of conformal prediction to online decision-making environments.


$\epsilon$-Policy Gradient for Online Pricing

arXiv.org Machine Learning

Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $\epsilon$-policy gradient algorithm for the online pricing learning task. The algorithm extends $\epsilon$-greedy algorithm by replacing greedy exploitation with gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the exploration cost in terms of the exploration probability $\epsilon$ and the exploitation cost in terms of the gradient descent optimization and gradient estimation errors. The algorithm achieves an expected regret of order $\mathcal{O}(\sqrt{T})$ (up to a logarithmic factor) over $T$ trials.


Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

arXiv.org Artificial Intelligence

In safe Reinforcement Learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient fidelity of state representation, and safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are: first, we design a safety model that specifically performs credit assignment to assess contributions of partial state-action trajectories on safety. This safety model is trained using a labeled safety dataset. Second, using RL-as-inference strategy we derive an effective algorithm for optimizing a safe policy using the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety compliance. We rewrite the constrained optimization problem into its dual problem and derive a gradient-based method to dynamically adjust the tradeoff coefficient during training. Our empirical results demonstrate that this approach is highly scalable and able to satisfy sophisticated non-Markovian safety constraints.


PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds

arXiv.org Artificial Intelligence

In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively, which is independent of stochastic gradient variance, and achieves a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ to maximize the model utility under certain privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of decision variable, respectively, which matches that of the server-client distributed counterparts, and exhibits an extra factor of $1/\sqrt{n}$ improvement compared to that of the existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.


A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints

arXiv.org Artificial Intelligence

Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones, and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, for unconstrained problems, balancing the minimization of the objective function with the reduction of constraint violations is challenging. We consider the class of time-dependent minimization problems with increasing (possibly) nonlinear and non-convex objective function and non-decreasing (possibly) nonlinear and non-convex inequality constraints. To efficiently solve them, we propose a penalty-based guardrail algorithm (PGA). This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable which adds a margin to prevent violations. We evaluate PGA on two novel application domains: a simplified model of a district heating system and an optimization model derived from learned deep neural networks. Our method significantly outperforms mathematical programming solvers and the standard penalty-based method, and achieves better performance and faster convergence than a state-of-the-art algorithm (IPDD) within a specified time limit.