AITopics | Gradient Descent


Gradient Descent


Review for NeurIPS paper: A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Neural Information Processing Systems, Jan-24-2025, 10:31:13 GMT

Originally, the paper received three positive scores (7, 7, 6), all with very high confidence. The reviewers raised no critical issues beyond suggestions to strengthen the experiments and clarify some points. During the discussion, all reviewers agreed that the paper should be accepted, and Reviewer #3 raised his/her score to 7, so the reviewers reached a consensus. The AC therefore decided to accept the paper. However, the AC also has the following comments after a quick reading.

algorithm, nonconvex-concave min-max problem, single-loop smoothed gradient descent-ascent algorithm

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)


Reviews: Stein Variational Gradient Descent With Matrix-Valued Kernels

Neural Information Processing Systems, Jan-24-2025, 02:00:29 GMT

This has clarified some of my comments, and thus I have increased my score by one level. However, a negative aspect from the author response was the excuse given for the absence of a comparison against SVN; "the results we obtained were much worse than our methods and baselines, and hence did not investigate it further". To me this seems like a very obvious thing to include and discuss in the paper, lending strong support to the new method, so I am a bit suspicious about why it wasn't included. I hope the authors will include it in the revised manuscript, if accepted. Another reviewer commented on the lack of wall-time comparison, and again I feel that the authors did not give a valid excuse for omitting this information from the manuscript (the reader can, I think, be trusted to adjust for different computational setups and hardware when interpreting the wall-time data).

experiment, matrix-valued kernel, stein variational gradient descent

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)


Reviews: Stein Variational Gradient Descent With Matrix-Valued Kernels

Neural Information Processing Systems, Jan-24-2025, 02:00:17 GMT

The reviewers felt that this submission represents an important contribution to the field. Please be sure to carefully review and address the concerns of all reviewers (e.g., introducing a comparison with SVNM and providing complete implementation details) in the revision.

matrix-valued kernel, reviewer, stein variational gradient descent

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems, Jan-24-2025, 00:51:32 GMT

Update: I read the authors' response. RE: sampling rate does not tell the whole story. I was suggesting adding information about how many instances, on average, were used for each split (because this is not equal to sampling rate × total dataset size). I am keeping my accept rating, hoping that the authors make the changes to improve the derivations and clarity in the final submission.

Summary: this paper is concerned with a common trick that many GBDT implementations apply: subsampling instances to speed up the search for the best split. The authors formulate the choice of which instances to sample as an optimization problem and derive a modified sampling scheme that aims to mimic the gain a split would receive on all of the data, using a gain calculated only on the subsampled instances. The experiments demonstrate good results. The paper is well written and easy to follow, apart from a couple of places in the derivations (see my questions).
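To make "a gain calculated only on subsampled instances" concrete, here is a minimal sketch. It assumes the second-order (XGBoost-style) split score G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ), where G and H are sums of gradients and Hessians; the paper's exact score may differ. Each kept instance is reweighted by 1/rate, so the weighted sums are unbiased estimates of the full-data sums, while the gain itself (a ratio of those sums) is only approximated. The function names `split_gain` and `subsampled_gain` are hypothetical.

```python
import random

def split_gain(grads, hess, left_mask, weights=None, lam=1.0):
    """Second-order split score (an assumed XGBoost-style form):
    G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)."""
    if weights is None:
        weights = [1.0] * len(grads)
    G = sum(w * g for w, g in zip(weights, grads))
    H = sum(w * h for w, h in zip(weights, hess))
    GL = sum(w * g for w, g, m in zip(weights, grads, left_mask) if m)
    HL = sum(w * h for w, h, m in zip(weights, hess, left_mask) if m)
    GR, HR = G - GL, H - HL
    return GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)

def subsampled_gain(grads, hess, left_mask, rate, seed=0, lam=1.0):
    """Estimate the split gain from a uniform subsample: each instance is kept
    with probability `rate` and reweighted by 1/rate."""
    rng = random.Random(seed)
    keep = [i for i in range(len(grads)) if rng.random() < rate]
    return split_gain(
        [grads[i] for i in keep],
        [hess[i] for i in keep],
        [left_mask[i] for i in keep],
        weights=[1.0 / rate] * len(keep),
        lam=lam,
    )
```

With `rate=1.0` every instance is kept with weight 1, so the estimate coincides with the full-data gain; lowering `rate` also changes how many instances actually land in each split, which is the quantity the reviewer asks to see reported.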

gradient, minimal variance sampling, quantile

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems, Jan-24-2025, 00:31:05 GMT

The authors propose a non-uniform sampling strategy for stochastic gradient boosted decision trees. In particular, the sampling probability of each training instance is optimized to maximize the estimation accuracy of the decision trees' split score. The optimization problem admits an approximate closed-form solution. Experimental results demonstrate the superior performance of the proposed strategy. The reviewers agree that the paper can not only help understand sampling within GBDT from a more rigorous perspective but also improve GBDT implementations in practice.
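A sketch of what such an approximate closed-form sampling rule can look like. As we recall the Minimal Variance Sampling construction, each instance gets a regularized gradient score ĝ_i = sqrt(g_i² + λ·h_i²) and probability p_i = min(1, ĝ_i/μ), with the threshold μ chosen so that the probabilities sum to the desired sample size; sampled instances are then reweighted by 1/p_i. Treat this as an assumption to check against the paper: λ, the bisection search for μ, and the names `mvs_probabilities` and `sample_with_weights` are our illustrative choices.

```python
import math
import random

def mvs_probabilities(grads, hess, sample_size, lam=0.1, iters=100):
    """p_i = min(1, score_i / mu), with mu found by bisection so that the
    expected number of sampled instances (sum of p_i) equals sample_size."""
    scores = [math.sqrt(g * g + lam * h * h) for g, h in zip(grads, hess)]
    if sample_size >= len(scores):
        return [1.0] * len(scores)
    if not any(s > 0 for s in scores):
        raise ValueError("need at least one instance with a nonzero score")

    def expected_count(mu):
        return sum(min(1.0, s / mu) for s in scores)

    lo, hi = 0.0, sum(scores) / sample_size  # expected_count(hi) <= sample_size
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_count(mid) > sample_size:
            lo = mid  # threshold too small: too many instances sampled
        else:
            hi = mid
    return [min(1.0, s / hi) for s in scores]

def sample_with_weights(probs, seed=0):
    """Draw each instance independently with probability p_i and attach the
    inverse-probability weight 1/p_i used when summing gradients."""
    rng = random.Random(seed)
    return [(i, 1.0 / p) for i, p in enumerate(probs) if p > 0 and rng.random() < p]
```

Instances with large regularized gradients are always kept (p_i = 1), while small-gradient instances are kept rarely but with large weights, which is how such a scheme trades sample size against the variance of the split-score estimate.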

decision tree, implementation, minimal variance sampling

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)


Decoupled SGDA for Games with Intermittent Strategy Communication

Zindari, Ali, Yazdkhasti, Parham, Rodomanov, Anton, Chavdarova, Tatjana, Stich, Sebastian U.

arXiv.org Artificial Intelligence, Jan-24-2025

We focus on reducing communication overhead in multiplayer games, where frequently exchanging strategies between players is not feasible and players have noisy or outdated strategies of the other players. We introduce Decoupled SGDA, a novel adaptation of Stochastic Gradient Descent Ascent (SGDA). In this approach, players independently update their strategies based on outdated opponent strategies, with periodic synchronization to align strategies. For Strongly-Convex-Strongly-Concave (SCSC) games, we demonstrate that Decoupled SGDA achieves near-optimal communication complexity comparable to the best-known GDA rates. For weakly coupled games where the interaction between players is lower relative to the non-interactive part of the game, Decoupled SGDA significantly reduces communication costs compared to standard SGDA. Our findings extend to multi-player games. To provide insights into the effect of communication frequency and convergence, we extensively study the convergence of Decoupled SGDA for quadratic minimax problems. Lastly, in settings where the noise over the players is imbalanced, Decoupled SGDA significantly outperforms federated minimax methods.
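The update pattern described in the abstract (independent local updates against outdated opponent strategies, followed by periodic synchronization) can be sketched on a toy quadratic game. This is a minimal illustration under our own assumptions, not the authors' algorithm or step-size schedule: the game f(x, y) = (a/2)x² + b·x·y − (c/2)y², the parameters a, b, c, lr, local_steps, rounds, noise, and the function name `decoupled_sgda` are all hypothetical. A small b plays the role of the weakly coupled regime, where the stale opponent strategy matters little.

```python
import random

def decoupled_sgda(x, y, a=1.0, b=0.5, c=1.0, lr=0.1, local_steps=10,
                   rounds=50, noise=0.0, seed=0):
    """Toy decoupled descent-ascent on f(x, y) = a/2 x^2 + b x y - c/2 y^2.

    x minimizes f and y maximizes f. Between synchronizations each player
    sees only the opponent's strategy from the last sync (a stale copy).
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        x_stale, y_stale = x, y  # strategies exchanged at the last sync
        # Player x: local stochastic descent steps against the stale y.
        for _ in range(local_steps):
            gx = a * x + b * y_stale + noise * rng.gauss(0.0, 1.0)
            x -= lr * gx
        # Player y: local stochastic ascent steps against the stale x.
        # It never reads the x updated above, so the two loops are independent.
        for _ in range(local_steps):
            gy = b * x_stale - c * y + noise * rng.gauss(0.0, 1.0)
            y += lr * gy
        # End of round: one communication, the new (x, y) becomes the snapshot.
    return x, y
```

Each outer round costs one exchange of strategies, so raising `local_steps` trades local computation for fewer communication rounds; with these toy defaults and `noise=0.0`, both coordinates shrink toward the saddle point at the origin.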

artificial intelligence, decoupled sgda, machine learning

arXiv.org Artificial Intelligence

2501.14652

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.65)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.56)


Context-Aware Neural Gradient Mapping for Fine-Grained Instruction Processing

Boldo, David, Pemberton, Lily, Thistledown, Gabriel, Fairchild, Jacob, Kowalski, Felix

arXiv.org Artificial Intelligence, Jan-24-2025

The integration of contextual embeddings into the optimization processes of large language models is an advancement in natural language processing. The Context-Aware Neural Gradient Mapping framework introduces a dynamic gradient adjustment mechanism, incorporating contextual embeddings directly into the optimization process. This approach facilitates real-time parameter adjustments, enhancing task-specific generalization even in the presence of sparse or noisy data inputs. The mathematical foundation of this framework relies on gradient descent modifications, where contextual embeddings are derived from a supplementary neural network trained to map input features to optimal adaptation gradients. By employing differential geometry principles, high-dimensional input dependencies are encoded into low-dimensional gradient manifolds, enabling efficient adaptation without necessitating the retraining of the entire model. Empirical evaluations demonstrate that the proposed framework consistently outperforms baseline models across various metrics, including accuracy, robustness to noise, and computational efficiency. The integration of context-specific embeddings allows for a more complex understanding of language, thereby improving the model's ability to handle diverse linguistic phenomena. Furthermore, the computational efficiency achieved through this method demonstrates its scalability for large-scale language models operating under diverse constraints.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2501.14936

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)


Reviews: What Can ResNet Learn Efficiently, Going Beyond Kernels?

Neural Information Processing Systems, Jan-23-2025, 22:05:01 GMT

Dear Authors: I read your rebuttal. I do indeed understand the point of your paper. I also agree that solving linear equations is not a good example of something kernel methods can't do. The 'isotonic regression' algorithm for learning a ReLU is a simple SGD algorithm that uses a straight-through estimator. The high-level message of your paper is that there are problems that gradient-based methods can solve, but kernel methods cannot.
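The reviewer's description of learning a ReLU with SGD and a straight-through estimator can be illustrated with a toy sketch. This is our reading, not the paper's construction: the target is y = relu(w_true · x) with standard Gaussian inputs, and in the chain rule the ReLU's derivative is replaced by 1, giving the update w ← w − lr·(relu(w·x) − y)·x. The function name `learn_relu_straight_through` and all parameters are hypothetical.

```python
import random

def relu(z):
    return max(0.0, z)

def learn_relu_straight_through(w_true, steps=2000, lr=0.05, seed=0):
    """SGD on the squared loss for y = relu(w_true . x), where the ReLU's
    derivative in the chain rule is replaced by 1 (straight-through)."""
    rng = random.Random(seed)
    d = len(w_true)
    w = [0.0] * d
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # fresh Gaussian input
        y = relu(sum(wt * xi for wt, xi in zip(w_true, x)))
        pred = relu(sum(wi * xi for wi, xi in zip(w, x)))
        residual = pred - y
        # Exact gradient: residual * 1[w.x > 0] * x; the indicator is dropped here.
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w
```

Starting from w = 0 is deliberate: with the common convention relu'(0) = 0, the exact gradient contains the factor 1[w·x > 0], which is zero at w = 0 for every input, so plain SGD on the squared loss would never move, whereas the straight-through update does.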

kernel, resnet learn efficiently, separation

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)


Reviews: Competitive Gradient Descent

Neural Information Processing Systems, Jan-23-2025, 20:33:56 GMT

This paper deals with the computation of Nash equilibria in competitive two-player games where the x player minimizes a function f(x,y) and the y player minimizes a function g(x,y). Such problems arise in a wide variety of domains, notably in training GANs, and there has been much recent interest in developing algorithms for solving them. Gradient Descent Ascent (GDA) is a natural candidate algorithm for finding a Nash equilibrium, but it provably oscillates or diverges even in simple settings. As such, many recent works have modified GDA or proposed different algorithms or schemes to guarantee convergence. This paper proposes a new algorithm called Competitive Gradient Descent (CGD), which updates each player's iterates by adding the Nash equilibrium of a regularized bilinear approximation of the game at the current iterates. CGD requires Hessian-vector products, making it a second-order algorithm.
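The contrast between GDA and CGD is easiest to see on the simplest zero-sum game, f(x, y) = x·y with g = −f (so y effectively maximizes x·y). The sketch below is our own specialization, not code from the paper: solving the first-order conditions of the regularized bilinear local game with step size η (our notation) gives Δx = −η(y + ηx)/(1 + η²) and Δy = η(x − ηy)/(1 + η²). Per step, GDA multiplies the distance to the equilibrium (0, 0) by sqrt(1 + η²), while this CGD step divides it by the same factor. The names `gda_step`, `cgd_step` and `run` are hypothetical.

```python
import math

def gda_step(x, y, lr):
    # Simultaneous gradient descent (x) and ascent (y) on f(x, y) = x * y.
    return x - lr * y, y + lr * x

def cgd_step(x, y, lr):
    # CGD step for the zero-sum bilinear game f = x*y, g = -f: the Nash
    # equilibrium of the regularized bilinear local game, solved in closed form.
    denom = 1.0 + lr * lr
    dx = -lr * (y + lr * x) / denom
    dy = lr * (x - lr * y) / denom
    return x + dx, y + dy

def run(step, x=1.0, y=1.0, lr=0.2, iters=100):
    # Distance from the unique equilibrium (0, 0) after `iters` steps.
    for _ in range(iters):
        x, y = step(x, y, lr)
    return math.hypot(x, y)
```

Starting from (1, 1) with η = 0.2, `run(gda_step)` ends at a distance of roughly 10 from the equilibrium while `run(cgd_step)` ends at roughly 0.2, which matches the oscillate-or-diverge behavior of GDA described above.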

algorithm, competitive gradient descent, hessian-vector product

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)


Reviews: Competitive Gradient Descent

Neural Information Processing Systems, Jan-23-2025, 20:33:46 GMT

Overall, the reviewers appreciated the novelty and simplicity of the algorithm. The paper is also well written and the methods are well motivated. In terms of content for the revision: the lack of discussion of the Newton-style approximation and of the dropping of diagonal terms in the Jacobian was raised again in the discussions (see updated reviews). Additionally, we sought an expert reviewer during the discussions; the additional review is included below.

Additional review: The paper proposes an interesting new method for solving games min_x F(x, y), min_y G(x, y), which is, in general, different from running online gradient descent on x and y in parallel, or from other existing methods.
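The remark about dropped diagonal terms is easier to follow with the update written out. As we recall it from the CGD paper (worth checking against the paper's own statement), the steps Δx, Δy with step size η (our notation) form the Nash equilibrium of a local game in which x minimizes Δxᵀ∇ₓF + Δxᵀ D²ₓᵧF Δy + Δyᵀ∇ᵧF + ‖Δx‖²/(2η) and y minimizes the analogous expression in G. Solving the two first-order conditions, Δx = −η(∇ₓF + D²ₓᵧF Δy) and Δy = −η(∇ᵧG + D²ᵧₓG Δx), for Δx and Δy gives:

```latex
\begin{aligned}
\Delta x &= -\eta \left( I - \eta^{2} D_{xy}^{2}F \, D_{yx}^{2}G \right)^{-1} \left( \nabla_x F - \eta \, D_{xy}^{2}F \, \nabla_y G \right) \\
\Delta y &= -\eta \left( I - \eta^{2} D_{yx}^{2}G \, D_{xy}^{2}F \right)^{-1} \left( \nabla_y G - \eta \, D_{yx}^{2}G \, \nabla_x F \right)
\end{aligned}
```

Only the mixed second-derivative blocks D²ₓᵧF and D²ᵧₓG appear; the diagonal blocks D²ₓₓF and D²ᵧᵧG are absent, which is presumably the dropping of diagonal terms the reviewers ask the authors to discuss. For a zero-sum game (G = −F) the inverse becomes (I + η² D²ₓᵧF D²ᵧₓF)⁻¹, which is always well defined.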

competitive gradient descent, equilibrium, reviewer

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
