Collaborating Authors

 Drori, Yoel


Locally Optimal Descent for Dynamic Stepsize Scheduling

arXiv.org Machine Learning

Stochastic gradient-based optimization methods such as SGD and Adam (Kingma & Ba, 2014) are the main workhorse behind modern machine learning. Such methods sequentially apply stochastic gradient steps to update the trained model, and their performance crucially depends on the choice of a learning rate sequence, or schedule, used throughout this process to determine the magnitude of the sequential updates. In practice, effectively tuning the learning rate schedule is widely considered a tedious task requiring extensive, sometimes prohibitive, hyper-parameter search, resulting in significant excess engineering time and compute usage in ML training. A prominent approach to address this issue gave rise to a plethora of adaptive optimization methods (most notably Duchi et al., 2011 and Kingma & Ba, 2014), where the learning rate is automatically tuned during the optimization process based on previously received stochastic gradients. In some important applications these methods provide superior convergence performance, and their theoretical guarantees match the state of the art in the stochastic convex and (smooth) non-convex optimization settings (Li & Orabona, 2019; Ward et al., 2020; Attia & Koren, 2023). However, despite the adaptivity incorporated into these methods, auxiliary learning rate schedules are often still required to attain their optimal performance (e.g., Loshchilov & Hutter, 2016), so the nuisance of laborious and extensive manual tuning remains relevant for these methods as well.
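To make the role of a schedule concrete, here is a minimal sketch of SGD driven by a cosine-annealed learning rate (in the spirit of Loshchilov & Hutter, 2016). The quadratic objective, function names, and constants are illustrative assumptions, not the paper's proposed method:

```python
import numpy as np

def cosine_schedule(step, total_steps, eta_max=0.1, eta_min=0.001):
    # Cosine annealing from eta_max down to eta_min over total_steps.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + np.cos(np.pi * step / total_steps))

def sgd_with_schedule(grad_fn, x0, total_steps, rng):
    # Plain SGD whose step size is dictated by an externally chosen schedule.
    x = np.asarray(x0, dtype=float)
    for t in range(total_steps):
        g = grad_fn(x, rng)                            # stochastic gradient at x
        x = x - cosine_schedule(t, total_steps) * g    # scheduled update magnitude
    return x

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x, observed with noise.
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = sgd_with_schedule(noisy_grad, x0=np.ones(5), total_steps=1000, rng=rng)
print("final distance to optimum:", np.linalg.norm(x_final))
```

The point of the sketch is that the schedule is a free design choice layered on top of the optimizer, which is exactly the tuning burden the paper targets.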


Improving Training Stability for Multitask Ranking Models in Recommender Systems

arXiv.org Artificial Intelligence

Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training of such models is severely under-explored. As recommendation models become larger and more sophisticated, they become more susceptible to training instability issues, i.e., loss divergence, which can render a model unusable, waste significant resources, and block model development. In this paper, we share the findings and best practices we learned while improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of the training dynamics near the point of instability, we hypothesize why existing solutions fail and propose a new algorithm that mitigates their limitations. Our experiments on a YouTube production dataset show that the proposed algorithm significantly improves training stability without compromising convergence, compared with several commonly used baseline methods.
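The abstract does not spell out the proposed algorithm, so for context here is a minimal sketch of one of the commonly used baselines such work is compared against: global-norm gradient clipping. The threshold and toy gradient values below are assumptions for illustration only:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale the whole gradient list if its global L2 norm exceeds max_norm,
    # leaving it untouched otherwise; this bounds the size of any single update.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads]

# A spiky gradient (global norm 50) is rescaled to norm ~1; a small one passes through.
spiky = [np.array([30.0, -40.0])]
small = [np.array([0.3, -0.4])]
print(clip_by_global_norm(spiky))  # ~[0.6, -0.8]
print(clip_by_global_norm(small))  # unchanged
```

Clipping of this kind caps the damage from a single spiky step but does not address the training dynamics that produce the spikes, which is the gap the paper argues existing solutions leave open.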


The Complexity of Finding Stationary Points with Stochastic Gradient Descent

arXiv.org Machine Learning

We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. We provide several results, implying that the classical $\mathcal{O}(\epsilon^{-4})$ upper bound (for making the average gradient norm less than $\epsilon$) cannot be improved upon, unless a combination of additional assumptions is made. Notably, this holds even if we limit ourselves to convex quadratic functions. We also show that for nonconvex functions, the feasibility of minimizing gradients with SGD is surprisingly sensitive to the choice of optimality criteria.
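As a small numerical illustration of the optimality criterion behind the $\mathcal{O}(\epsilon^{-4})$ bound, the sketch below runs fixed-step SGD on a convex quadratic with noisy gradients and measures the average (versus best) exact gradient norm over the iterates. The step size, horizon, and noise model are illustrative assumptions, not the paper's constructions:

```python
import numpy as np

def sgd_grad_norm_trace(noisy_grad, true_grad, x0, eta, T, rng):
    # Fixed-step SGD; record the exact gradient norm at every iterate so the
    # averaged criterion can be evaluated afterwards.
    x = np.asarray(x0, dtype=float)
    norms = []
    for _ in range(T):
        norms.append(np.linalg.norm(true_grad(x)))
        x = x - eta * noisy_grad(x, rng)
    return np.array(norms)

# Convex quadratic f(x) = 0.5 * ||x||^2 with additive unit-variance gradient noise.
rng = np.random.default_rng(1)
true_grad = lambda x: x
noisy_grad = lambda x, rng: x + rng.standard_normal(x.shape)
norms = sgd_grad_norm_trace(noisy_grad, true_grad, np.ones(10), eta=0.01, T=10000, rng=rng)
print("average gradient norm:", norms.mean())  # the criterion behind the O(eps^-4) bound
print("best gradient norm:   ", norms.min())   # a different criterion can behave very differently
```

The gap between the averaged and the best iterate's gradient norm in such runs hints at why, as the abstract notes, the feasibility of the problem is sensitive to the choice of optimality criterion.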