AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Consistent Collaborative Filtering via Tensor Decomposition

Zhao, Shiwen, Crissman, Charles, Sapiro, Guillermo R

arXiv.org Machine LearningJan-28-2022

Collaborative filtering is the de facto standard for analyzing users' activities and building recommendation systems for items. In this work we develop Sliced Anti-symmetric Decomposition (SAD), a new model for collaborative filtering based on implicit feedback. In contrast to traditional techniques where a latent representation of users (user vectors) and items (item vectors) are estimated, SAD introduces one additional latent vector to each item, using a novel three-way tensor view of user-item interactions. This new vector extends user-item preferences calculated by standard dot products to general inner products, producing interactions between items when evaluating their relative preferences. SAD reduces to state-of-the-art (SOTA) collaborative filtering models when the vector collapses to one, while in this paper we allow its value to be estimated from data. The proposed SAD model is simple, resulting in an efficient group stochastic gradient descent (SGD) algorithm. We demonstrate the efficiency of SAD in both simulated and real world datasets containing over 1M user-item interactions. By comparing SAD with seven alternative SOTA collaborative filtering models, we show that SAD is able to more consistently estimate personalized preferences.

algorithm, consistent collaborative filtering, dataset, (13 more...)

arXiv.org Machine Learning

2201.11936

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > Czechia > Pardubice Region > Pardubice (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.69)
Media > Film (0.47)
Information Technology (0.47)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Dissecting the impact of different loss functions with gradient surgery

Xuan, Hong, Pless, Robert

arXiv.org Artificial IntelligenceJan-26-2022

Pair-wise loss is an approach to metric learning that learns a semantic embedding by optimizing a loss function that encourages images from the same semantic class to be mapped closer than images from different classes. The literature reports a large and growing set of variations of the pair-wise loss strategies. Here we decompose the gradient of these loss functions into components that relate to how they push the relative feature positions of the anchor-positive and anchor-negative pairs. This decomposition allows the unification of a large collection of current pair-wise loss functions. Additionally, explicitly constructing pair-wise gradient updates to separate out these effects gives insights into which have the biggest impact, and leads to a simple algorithm that beats the state of the art for image retrieval on the CAR, CUB and Stanford Online products datasets.

gradient, loss function, pair weight, (15 more...)

arXiv.org Artificial Intelligence

2201.11307

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks

Ma, Xiaoyu, Sardy, Sylvain, Hengartner, Nick, Bobenko, Nikolai, Lin, Yen Ting

arXiv.org Machine LearningJan-21-2022

To fit sparse linear associations, a LASSO sparsity inducing penalty with a single hyperparameter provably allows to recover the important features (needles) with high probability in certain regimes even if the sample size is smaller than the dimension of the input vector (haystack). More recently learners known as artificial neural networks (ANN) have shown great successes in many machine learning tasks, in particular fitting nonlinear associations. Small learning rate, stochastic gradient descent algorithm and large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, like for sparse linear associations, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (1994) which is a better rule than the conservative (too many false detections) and expensive cross-validation. In the spirit of simulated annealing, we propose a warm-start sparsity inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform precise Monte Carlo simulations to show the effectiveness of our approach.

lasso ann, neural network, phase transition, (16 more...)

arXiv.org Machine Learning

2201.08652

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > Switzerland (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

On Maximum-a-Posteriori estimation with Plug & Play priors and stochastic gradient descent

Laumont, Rémi, de Bortoli, Valentin, Almansa, Andrés, Delon, Julie, Durmus, Alain, Pereyra, Marcelo

arXiv.org Machine LearningJan-16-2022

Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a non-local scale. In a departure from explicit modelling, several recent works have proposed and studied the use of implicit priors defined by an image denoising algorithm. This approach, commonly known as Plug & Play (PnP) regularisation, can deliver remarkably accurate results, particularly when combined with state-of-the-art denoisers based on convolutional neural networks. However, the theoretical analysis of PnP Bayesian models and algorithms is difficult and works on the topic often rely on unrealistic assumptions on the properties of the image denoiser. This papers studies maximum-a-posteriori (MAP) estimation for Bayesian models with PnP priors. We first consider questions related to existence, stability and well-posedness, and then present a convergence proof for MAP computation by PnP stochastic gradient descent (PnP-SGD) under realistic assumptions on the denoiser used. We report a range of imaging experiments demonstrating PnP-SGD as well as comparisons with other PnP schemes.

algorithm, denoiser, experiment, (15 more...)

arXiv.org Machine Learning

2201.06133

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

GitHub - hiroyuki-kasai/GDLibrary: Matlab library for gradient descent algorithms: Version 1.0.1

#artificialintelligenceJan-15-2022, 07:10:35 GMT

Latest library version: 1.0.1 (see Release notes for more info) The GDLibrary is a pure-Matlab library of a collection of unconstrained optimization algorithms. This solves an unconstrained minimization problem of the form, min f(x). Note that the SGDLibrary internally contains this GDLibrary. Run run_me_first for path configurations. Now, just execute demo for demonstration of this package.

gdlibrary, gradient descent algorithm, hiroyuki-kasai gdlibrary, (5 more...)

#artificialintelligence

Technology:

Information Technology > Software > Programming Languages (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

The Implicit Regularization of Momentum Gradient Descent with Early Stopping

Wang, Li, Zhou, Yingcong, Fu, Zhiguo

arXiv.org Machine LearningJan-14-2022

The study on the implicit regularization induced by gradient-based optimization is a longstanding pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing with the explicit $\ell_2$-regularization (ridge). In details, we study MGD in the continuous-time view, so-called momentum gradient flow (MGF), and show that its tendency is closer to ridge than the gradient descent (GD) [Ali et al., 2019] for least squares regression. Moreover, we prove that, under the calibration $t=\sqrt{2/\lambda}$, where $t$ is the time parameter in MGF and $\lambda$ is the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, the relative Bayes risk of MGF to ridge is between 1 and 1.035 under the optimal tuning. The numerical experiments support our theoretical results strongly.

mgf, prediction risk, regularization, (14 more...)

arXiv.org Machine Learning

2201.05405

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > China > Jilin Province (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback

$\ell_1$-norm constrained multi-block sparse canonical correlation analysis via proximal gradient descent

Guan, Leying

arXiv.org Machine LearningJan-13-2022

Multi-block CCA constructs linear relationships explaining coherent variations across multiple blocks of data. We view the multi-block CCA problem as finding leading generalized eigenvectors and propose to solve it via a proximal gradient descent algorithm with $\ell_1$ constraint for high dimensional data. In particular, we use a decaying sequence of constraints over proximal iterations, and show that the resulting estimate is rate-optimal under suitable assumptions. Although several previous works have demonstrated such optimality for the $\ell_0$ constrained problem using iterative approaches, the same level of theoretical understanding for the $\ell_1$ constrained formulation is still lacking. We also describe an easy-to-implement deflation procedure to estimate multiple eigenvectors sequentially. We compare our proposals to several existing methods whose implementations are available on R CRAN, and the proposed methods show competitive performances in both simulations and a real data example.

canonical correlation analysis, mcca direction, proposition, (14 more...)

arXiv.org Machine Learning

2201.05289

Genre: Research Report (0.49)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Add feedback

There is a Singularity in the Loss Landscape

Lowell, Mark

arXiv.org Artificial IntelligenceJan-11-2022

Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.

iteration, singularity, stability, (15 more...)

arXiv.org Artificial Intelligence

2201.06964

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits

Lee, Sunwoo, Sahu, Anit Kumar, He, Chaoyang, Avestimehr, Salman

arXiv.org Machine LearningJan-11-2022

Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers making the global loss converge slowly. While recent advanced optimization methods tackle the issue focused on non-IID settings, there still exists the model discrepancy issue due to the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. The partial averaging encourages the local models to stay close to each other on parameter space, and it enables to more effectively minimize the global loss. Given a fixed number of iterations and a large number of workers (128), the partial averaging achieves up to 2.2% higher validation accuracy than the periodic full averaging.

accuracy, iteration, training loss, (15 more...)

arXiv.org Machine Learning

2201.03789

Country:

North America > United States > California (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Non-Asymptotic Guarantees for Robust Statistical Learning under $(1+\varepsilon)$-th Moment Assumption

Xu, Lihu, Yao, Fang, Yao, Qiuran, Zhang, Huiming

arXiv.org Machine LearningJan-10-2022

There has been a surge of interest in developing robust estimators for models with heavy-tailed data in statistics and machine learning. This paper proposes a log-truncated M-estimator for a large family of statistical regressions and establishes its excess risk bound under the condition that the data have $(1+\varepsilon)$-th moment with $\varepsilon \in (0,1]$. With an additional assumption on the associated risk function, we obtain an $\ell_2$-error bound for the estimation. Our theorems are applied to establish robust M-estimators for concrete regressions. Besides convex regressions such as quantile regression and generalized linear models, many non-convex regressions can also be fit into our theorems, we focus on robust deep neural network regressions, which can be solved by the stochastic gradient descent algorithms. Simulations and real data analysis demonstrate the superiority of log-truncated estimations over standard estimations.

non-asymptotic guarantee, probability, regression, (14 more...)

arXiv.org Machine Learning

2201.03182

Country:

Asia > Macao (0.14)
Asia > China > Guangdong Province > Zhuhai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback