
 emgd


37f0e884fbad9667e38940169d0a3c95-Reviews.html

Neural Information Processing Systems

The optimal first-order algorithm of Nesterov has linear convergence for such problems, but the rate constant depends on the square root of the condition number k. The authors consider the situation where one has access to the expensive full gradient of the objective as well as a cheap stochastic gradient oracle. They propose a hybrid algorithm which requires only O(log 1/eps) calls to the full gradient oracle (independent of the condition number) and O(k^2 log(1/eps)) calls to the cheaper stochastic gradient oracle; as long as the condition number is not too big, this could be faster in theory. The main idea behind their algorithm (called Epoch Mixed Gradient Descent, EMGD) is to replace a full gradient step (called an epoch) with a fixed number O(k^2) of mixed gradient steps, which use a combination of the full gradient (computed once per epoch) and stochastic gradients (which vary within an epoch). By taking the average of the O(k^2) iterates within an epoch, they can show a constant-factor decrease of the suboptimality *independent* of the condition number, which is why the number of required full gradient computations (the number of epochs) is independent of the condition number. They provide a simple, complete, and self-contained proof of their convergence rate, but no experiments.
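The epoch structure described in this review can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a finite-sum objective F(x) = (1/n) sum_i f_i(x), assumes the mixed gradient takes the form full_grad + grad_i(x) - grad_i(x_bar) (one natural way to combine the two oracles), omits any domain constraint or projection the paper may use, and uses a hand-picked step size and epoch length rather than the paper's settings. The names `emgd_sketch`, `grad_i`, and `ls_grad` are hypothetical.

```python
import numpy as np


def emgd_sketch(grad_i, n, x0, step, inner_steps, epochs, seed=0):
    """Epoch / mixed-gradient loop (illustrative sketch, not the paper's code).

    Assumes F(x) = (1/n) * sum_i f_i(x) and that the hypothetical callback
    grad_i(x, i) returns the gradient of component f_i at x.
    """
    rng = np.random.default_rng(seed)
    x_bar = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        # One call to the expensive full-gradient oracle per epoch.
        full_grad = sum(grad_i(x_bar, i) for i in range(n)) / n
        x = x_bar.copy()
        iterate_sum = np.zeros_like(x)
        # inner_steps plays the role of the fixed O(k^2) mixed steps.
        for _ in range(inner_steps):
            i = int(rng.integers(n))
            # Assumed mixed gradient: full gradient at the anchor plus a
            # stochastic correction for the move from x_bar to x.
            mixed = full_grad + grad_i(x, i) - grad_i(x_bar, i)
            x = x - step * mixed
            iterate_sum += x
        # The next anchor is the average of this epoch's inner iterates.
        x_bar = iterate_sum / inner_steps
    return x_bar


# Usage: a consistent least-squares problem, f_i(x) = 0.5 * (a_i @ x - b_i)**2.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true


def ls_grad(x, i):
    # Gradient of the i-th squared residual.
    return A[i] * (A[i] @ x - b[i])


x_hat = emgd_sketch(ls_grad, n=50, x0=np.zeros(3), step=0.005,
                    inner_steps=2000, epochs=30)
```

Each epoch touches all n components once and then takes `inner_steps` cheap single-component steps, mirroring the split between the two oracles that the review describes.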


Linear Convergence with Condition Number Independent Access of Full Gradients

Neural Information Processing Systems

For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is $O(\sqrt{\kappa}\log 1/\epsilon)$, where $\kappa$ is the condition number. When the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which can be computationally expensive. In this paper, we propose to reduce the number of full gradients required by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize both kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full gradient and the stochastic gradient to update the intermediate solutions. By performing a fixed number of mixed gradient descent steps, we are able to improve the sub-optimality of the solution by a constant factor, and thus achieve a linear convergence rate. Theoretical analysis shows that EMGD is able to find an $\epsilon$-optimal solution by computing $O(\log 1/\epsilon)$ full gradients and $O(\kappa^2\log 1/\epsilon)$ stochastic gradients.
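To see when trading full gradients for stochastic ones pays off, the two complexity bounds can be turned into a back-of-the-envelope cost model. This sketch assumes a finite-sum objective in which one full gradient costs n component (stochastic) gradients and sets every hidden constant to 1, so it only indicates orders of magnitude; `oracle_costs` is a hypothetical helper.

```python
import math


def oracle_costs(n, kappa, eps):
    """Rough gradient-evaluation counts with all constants set to 1 (an assumption).

    Assumes one full gradient costs n component (stochastic) gradients.
    """
    log_term = math.log(1.0 / eps)
    # Accelerated method: O(sqrt(kappa) * log(1/eps)) full gradients.
    accelerated = math.sqrt(kappa) * log_term * n
    # EMGD: O(log(1/eps)) full gradients plus O(kappa^2 * log(1/eps)) stochastic ones.
    emgd = log_term * n + kappa ** 2 * log_term
    return accelerated, emgd


for kappa in (1e2, 1e4, 1e5):
    acc, mixed = oracle_costs(n=10**6, kappa=kappa, eps=1e-6)
    print(f"kappa={kappa:.0e}: accelerated/EMGD cost ratio = {acc / mixed:.2f}")
```

With n = 10^6, the model favors EMGD by roughly a factor of 10 at $\kappa = 10^2$, is about break-even at $\kappa = 10^4$, and clearly favors the accelerated method at $\kappa = 10^5$; the crossover sits roughly where $\kappa^{3/2} \approx n$.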


Linear Convergence with Condition Number Independent Access of Full Gradients

Neural Information Processing Systems

For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is $O(\sqrt{\kappa}\log 1/\epsilon)$, where $\kappa$ is the condition number. When the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which can be computationally expensive. In this paper, we propose to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize both kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full and stochastic gradients to update the intermediate solution.
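A short calculation suggests why combining the full and stochastic gradients helps. It assumes a finite-sum objective $F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$ with each $f_i$ $L$-smooth, an index $i_t$ drawn uniformly at random, an epoch anchor $\bar{x}$ at which the full gradient was computed, and a mixed gradient $m_t$ of the form below. The exact combination used in EMGD is not spelled out in this abstract, so treat this form as an illustrative assumption.

```latex
\begin{align*}
m_t &= \nabla F(\bar{x}) + \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\bar{x}) \\
\mathbb{E}_{i_t}[m_t] &= \nabla F(\bar{x}) + \nabla F(x_t) - \nabla F(\bar{x}) = \nabla F(x_t) \\
\mathbb{E}_{i_t}\|m_t - \nabla F(x_t)\|^2
  &\le \mathbb{E}_{i_t}\|\nabla f_{i_t}(x_t) - \nabla f_{i_t}(\bar{x})\|^2
  \le L^2 \|x_t - \bar{x}\|^2
\end{align*}
```

Under these assumptions the mixed gradient is an unbiased estimate of the full gradient at the current point, and its noise shrinks as the iterate stays close to the anchor, which is one intuition for why a fixed number of cheap steps can make constant-factor progress.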


Elastic Multi-Gradient Descent for Parallel Continual Learning

Lyu, Fan, Feng, Wei, Li, Yuepan, Sun, Qing, Shang, Fanhua, Wan, Liang, Wang, Liang

arXiv.org Artificial Intelligence

The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks. Previously studied CL assumes that data for different tasks are given in sequence, one after another, and thus actually belongs to Serial Continual Learning (SCL). This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse set of tasks is encountered at different time points. PCL is challenging because it trains an unspecified number of tasks with varying learning progress, which makes it difficult to guarantee effective model updates for all encountered tasks. In our previous conference work, we focused on measuring and reducing the discrepancy among gradients in a multi-objective optimization problem; however, every model update may still suffer from negative transfer. To address this issue in the dynamic multi-objective optimization problem, we introduce task-specific elastic factors to adjust the descent direction towards the Pareto front. The proposed method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update follows an appropriate Pareto descent direction, minimizing any negative impact on previously learned tasks. To balance training between old and new tasks, we also propose a memory editing mechanism guided by the gradient computed using EMGD. This editing process updates the stored data points, reducing interference from previous tasks in the Pareto descent direction. Experiments on public datasets validate the effectiveness of EMGD in the PCL setting.
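The idea of a Pareto descent direction can be illustrated, for just two tasks, with the classical min-norm (MGDA-style) construction: take the shortest vector in the convex hull of the task gradients and step along its negative. This is a sketch of that standard building block only; the paper's task-specific elastic factors and memory editing are not reproduced, and the function name `min_norm_direction` is hypothetical.

```python
import numpy as np


def min_norm_direction(g1, g2):
    """Two-task min-norm combination of gradients (classical MGDA-style step).

    Returns (d, alpha) with d = alpha * g1 + (1 - alpha) * g2 of minimal norm
    over alpha in [0, 1]. Then d @ g1 >= ||d||^2 and d @ g2 >= ||d||^2, so
    stepping along -d does not increase either loss to first order.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every convex weight gives the same vector.
        return g1.copy(), 0.5
    # Minimizer of ||g2 + alpha * (g1 - g2)||^2, clipped to [0, 1].
    alpha = float(np.clip(-(diff @ g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2, alpha


# Usage: two strongly conflicting task gradients.
g_old = np.array([1.0, 0.0])
g_new = np.array([-3.0, 0.1])
d, alpha = min_norm_direction(g_old, g_new)
print(alpha, d @ g_old, d @ g_new)  # both inner products are non-negative
```

Here the plain sum `g_old + g_new` has a negative inner product with `g_old`, so stepping along its negative would increase the old task's loss to first order; the min-norm weighting avoids that, which is the kind of negative transfer the elastic factors are meant to control.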


Linear Convergence with Condition Number Independent Access of Full Gradients

Zhang, Lijun, Mahdavi, Mehrdad, Jin, Rong

Neural Information Processing Systems

For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is $O(\sqrt{\kappa}\log 1/\epsilon)$, where $\kappa$ is the condition number. When the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which can be computationally expensive. In this paper, we propose to reduce the number of full gradients required by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize both kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full gradient and the stochastic gradient to update the intermediate solutions.

