A Proofs

We first redefine notation for clarity and then provide the proofs of the results in the main paper. We begin by proving that the iteration in Eq. 2 has a fixed point.

Proof of Lemma 3.1: Let

We then present the bound obtained when using the empirical Bellman operator in place of the true Bellman operator; the proof can be found in [6].

Proof of Theorem 3.4: Recall that the expression of the V-function iterate is given by:

Proof of Theorem 3.6: The proof of this statement is divided into two parts.
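As a numerical illustration of the fixed-point claim (a minimal sketch on an assumed toy MDP, not the setting of the paper): the Bellman optimality operator is a gamma-contraction in the sup-norm, so repeatedly applying it converges to a unique fixed point V*, which is the idea behind the iteration in Eq. 2.

```python
import numpy as np

# Toy MDP (hypothetical example): S states, A actions, discount gamma.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
r = rng.uniform(size=(S, A))                # rewards r(s, a)

def bellman(V):
    # T(V)(s) = max_a [ r(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
    return np.max(r + gamma * P @ V, axis=1)

V = np.zeros(S)
for _ in range(200):
    V = bellman(V)

# At the fixed point, T(V*) = V* up to numerical precision.
residual = np.max(np.abs(bellman(V) - V))
print(residual)  # close to 0
```

Since T is a gamma-contraction, the residual shrinks by a factor of at most gamma per iteration, so 200 iterations drive it below numerical tolerance.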
We thank the reviewers for the detailed and insightful reviews. As the reviews noted, our work 1) introduces "novel ..." We will incorporate the feedback into our final revision.

[R1]: "I don't exactly see if small batch vs large batch captures this phenomenon; if yes should say explicitly."

Smith et al. [2017] make an explicit connection between small vs. large batch training; we will state this connection explicitly in the revision.

[R1]: "A small discussion on if the phenomenon has been observed for different datasets/tasks with different optimizers"

We note that the phenomenon may not hold for other optimizers such as Adam.

[R1]: "concept of 'memorizable and generalizable', though intuitive, is sketchy and not formally explained ..."

We acknowledge that the terms "memorizable" and "generalizable" are potentially confusing, and we will revise our terminology to clarify this distinction. By "inherently noisy", we refer to the fact that high noise in the datapoints necessitates a larger sample complexity.
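To illustrate the small-vs-large-batch connection numerically (a hypothetical toy linear-regression example, not an experiment from the paper): the variance of the mini-batch gradient estimate scales roughly as 1/B, so small batches inject more gradient noise, which is the mechanism Smith et al. [2017] relate to batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data y = 2x + noise (hypothetical, for illustration only).
N = 10_000
x = rng.normal(size=N)
y = 2.0 * x + rng.normal(scale=0.5, size=N)
w = 0.0  # current parameter value

def minibatch_grad(batch_size):
    # Gradient of mean squared error on a random mini-batch.
    idx = rng.choice(N, size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    return np.mean(2 * (w * xb - yb) * xb)

def grad_variance(batch_size, trials=2000):
    g = np.array([minibatch_grad(batch_size) for _ in range(trials)])
    return g.var()

v_small = grad_variance(32)
v_large = grad_variance(512)
# Variance scales roughly as 1/B: the ratio is close to 512/32 = 16.
print(v_small / v_large)
```

Because the gradient-noise magnitude depends on the batch size in this way, small-batch SGD explores a noisier trajectory than large-batch SGD, which is the distinction the reviewer asked us to make explicit.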