

Appendix to " Adam with Bandit Sampling for Deep Learning "

Neural Information Processing Systems

According to Theorem 4.1 in [1], the convergence rate of Adam is as stated there. We prove Lemma 1 using the framework of online learning with bandit feedback. We then consider a special case, for which the result follows simply by plugging Lemma 3 into Theorem 2. In the main paper, we compared our method with Adam and Adam with importance sampling, and showed plots of loss value vs. wall-clock time. Here, we include some plots of error rate vs. wall-clock time.


Adam with Bandit Sampling for Deep Learning

Neural Information Processing Systems

Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also adapt to different training examples based on their importance in the model's convergence. To achieve this, we maintain a distribution over all examples, selecting a mini-batch in each iteration by sampling according to this distribution, which we update using a multi-armed bandit algorithm. This ensures that examples that are more beneficial to the model training are sampled with higher probabilities.
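The sampling mechanism described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the authors' implementation: it assumes an EXP3-style multi-armed bandit update (uniform exploration mixed into a softmax over per-example weights, with importance-weighted reward estimates), and treats the per-example loss as the "reward" signal; all names and constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, batch_size, eta = 1000, 32, 0.01   # n examples; eta: bandit step size (assumed)

# One weight per training example; log-space for numerical stability.
log_w = np.zeros(n)

def sampling_distribution(log_w, gamma=0.1):
    """Softmax over example weights, mixed with uniform exploration (EXP3-style)."""
    w = np.exp(log_w - log_w.max())   # subtract max to avoid overflow
    p = w / w.sum()
    return (1 - gamma) * p + gamma / n

def bandit_step(losses_fn):
    """Sample a mini-batch by the current distribution, then update weights."""
    p = sampling_distribution(log_w)
    batch = rng.choice(n, size=batch_size, replace=False, p=p)
    rewards = losses_fn(batch)        # e.g. per-example loss as importance signal
    # Importance-weighted (unbiased) reward estimate, as in EXP3.
    log_w[batch] += eta * rewards / (p[batch] * n)
    return batch, p
```

Examples that repeatedly yield high rewards accumulate weight and are sampled more often in later iterations, which is the behavior the abstract describes.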



Review for NeurIPS paper: Adam with Bandit Sampling for Deep Learning

Neural Information Processing Systems

Additional Feedback: This work seems to propose an approach for sampling mini-batches that can perhaps be applied to procedures other than ADAM. Therefore, apart from ADAM, was this approach (or suitable variants) explored, perhaps empirically, for other optimization procedures that involve mini-batches? It can also be used to produce desired mini-batches for better training. How does this approach compare to the state of the art in curriculum learning? In Algorithm 2, Line 2, what is L?


Review for NeurIPS paper: Adam with Bandit Sampling for Deep Learning

Neural Information Processing Systems

The authors propose a method for adaptive selection of data points for SGD. Specifically, they extend the ADAM method to an adaptive sampling setting using a multi-armed bandit. The proposed method is further analyzed, and the improvement in convergence speed is quantified. Extensive empirical results also support the proposed method. All reviewers unanimously recommend acceptance.




Reviews: Coordinate Descent with Bandit Sampling

Neural Information Processing Systems

The paper introduces a coordinate descent algorithm with adaptive sampling à la Gauss-Southwell. Based on a descent lemma that quantifies the decrease of the objective function when a coordinate is selected, the authors propose the "max_r" strategy, which iteratively chooses the coordinate that yields the largest decrease. The paper follows recent developments on coordinate descent, notably (Csiba et al., 2015), (Nutini et al., 2015), and (Perekrestenko et al., 2017), with improved convergence bounds. As with previous adaptive sampling schemes, the proposed method requires a computational complexity equivalent to full gradient descent, which can be prohibitive in large-scale optimization problems. To overcome this issue, the authors propose to learn the best coordinate by approximating the "max_r" strategy.
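The exact "max_r" selection rule the review describes can be sketched on a simple quadratic objective. This is an illustrative sketch, not the paper's algorithm: it assumes the guaranteed decrease for coordinate i is r_i = g_i^2 / (2 L_i) with L_i the coordinate-wise curvature (a Gauss-Southwell-Lipschitz rule), and all names are invented.

```python
import numpy as np

def max_r_coordinate_descent(A, b, iters=100):
    """Greedy coordinate descent on f(x) = 0.5 x^T A x - b^T x, A symmetric PD.

    Each step picks the coordinate whose exact update yields the largest
    guaranteed decrease r_i = g_i^2 / (2 L_i), where L_i = A[i, i].
    """
    n = A.shape[0]
    x = np.zeros(n)
    L = np.diag(A)                       # per-coordinate curvature (Lipschitz constants)
    for _ in range(iters):
        g = A @ x - b                    # full gradient: the costly part the paper avoids
        i = np.argmax(g ** 2 / (2 * L))  # "max_r": largest guaranteed decrease
        x[i] -= g[i] / L[i]              # exact minimization along coordinate i
    return x
```

Computing the full gradient at every step is exactly the cost the review flags as prohibitive at scale, which motivates the paper's learned approximation of the rule.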


Google & J.P. Morgan Propose Advanced Bandit Sampling for Multiplex Networks

#artificialintelligence

Graph neural networks (GNNs) have gained popularity in the AI research community due to their impressive performance in high-impact applications such as drug discovery and social network analysis. Most existing studies on GNNs, however, have focused on "monoplex" settings (networks with only a single type of connection between entities) rather than multiplex settings (multiple types of connections between entities), which reflect many real-world scenarios. In the new paper Bandit Sampling for Multiplex Networks, a team from Google Research and J.P. Morgan AI Research explores the problem of computationally efficient link prediction in the multiplex setting, introducing an algorithm for scalable learning on multiplex networks with a large number of layers. In evaluations, the proposed method is shown to improve efficiency over prior work such as Multiplex Network Embedding (MNE, Zhang et al., 2018) and the DEEPLEX layer-sampling approach (Potluru et al., 2020). A multiplex network can be viewed as a graph with many layers, where each layer's nodes have neighbours in other layers.


Adam with Bandit Sampling for Deep Learning

Liu, Rui, Wu, Tianyi, Mozafari, Barzan

arXiv.org Machine Learning

Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also adapt to different training examples based on their importance in the model's convergence. To achieve this, we maintain a distribution over all examples, selecting a mini-batch in each iteration by sampling according to this distribution, which we update using a multi-armed bandit algorithm. This ensures that examples that are more beneficial to the model training are sampled with higher probabilities. We theoretically show that Adambs improves the convergence rate of Adam---$O(\sqrt{\frac{\log n}{T} })$ instead of $O(\sqrt{\frac{n}{T}})$ in some cases. Experiments on various models and datasets demonstrate Adambs's fast convergence in practice.
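To make the stated rate improvement concrete, the snippet below compares the two bounds numerically, ignoring constants. The dataset size n = 60,000 and step count T = 10,000 are illustrative values chosen for this sketch, not figures from the paper.

```python
import math

# Illustrative comparison of the stated convergence bounds (constants ignored)
# for a hypothetical run: n = 60,000 examples, T = 10,000 iterations.
n, T = 60_000, 10_000

adam_bound = math.sqrt(n / T)              # Adam:   O(sqrt(n / T))
adambs_bound = math.sqrt(math.log(n) / T)  # Adambs: O(sqrt(log n / T))

print(f"Adam   bound ~ {adam_bound:.4f}")
print(f"Adambs bound ~ {adambs_bound:.4f}")
print(f"ratio  ~ {adam_bound / adambs_bound:.1f}x")
```

The gap between the bounds scales as sqrt(n / log n), so it widens as the dataset grows, which is why the improvement matters most for large n.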