Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes

Neural Information Processing Systems

Recently, several works have shown that natural modifications of the classical conditional gradient method (aka the Frank-Wolfe algorithm) for constrained convex optimization provably converge at a linear rate when the feasible set is a polytope and the objective is smooth and strongly convex. However, all of these results suffer from two significant shortcomings: i) a large memory requirement, due to the need to store an explicit convex decomposition of the current iterate, and as a consequence a large running-time overhead per iteration; and ii) a worst-case convergence rate that depends unfavorably on the dimension. In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings. In particular, both memory and computation overheads are only linear in the dimension, and in addition, when the optimal solution is sparse, the new convergence rate replaces a factor that is at least linear in the dimension in previous works with a linear dependence on the number of non-zeros in the optimal solution. At the heart of our method, and the corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to every polytope, they apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more. Our theoretical findings are complemented by empirical evidence showing that our method delivers state-of-the-art performance.
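For readers unfamiliar with the baseline method the abstract modifies, here is a minimal sketch of the classical Frank-Wolfe (conditional gradient) iteration on the probability simplex. This illustrates only the standard method, not the paper's decomposition-invariant variant; the objective and target vector below are illustrative choices.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Classical Frank-Wolfe: each step calls a linear minimization
    oracle (LMO); over the simplex this returns the vertex e_i that
    minimizes <grad(x), e_i>, then moves toward it."""
    x = x0.copy()
    for t in range(num_iters):
        g = grad(x)
        i = int(np.argmin(g))       # LMO over the simplex: best vertex
        s = np.zeros_like(x)
        s[i] = 1.0
        gamma = 2.0 / (t + 2.0)     # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x

# Example: minimize 0.5*||x - b||^2 over the simplex (gradient is x - b).
b = np.array([0.1, 0.5, 0.2])
x = frank_wolfe_simplex(lambda x: x - b, np.ones(3) / 3)
```

Note that each iterate is, implicitly, a convex combination of the simplex vertices picked so far; previously known linearly convergent away-step variants store this decomposition explicitly, which is exactly the memory and per-iteration overhead the abstract refers to.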




Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Neural Information Processing Systems

Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.
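As a hypothetical illustration of the kind of statistically robust reporting the abstract advocates, the sketch below aggregates per-seed returns with a percentile-bootstrap 95% confidence interval rather than reporting a single best-seed number. Function names and the example numbers are illustrative, not taken from the paper's codebase.

```python
import numpy as np

def bootstrap_ci(returns, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean episode return across seeds."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    # Resample seeds with replacement and record the mean of each resample.
    means = np.array([
        rng.choice(returns, size=len(returns), replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return returns.mean(), (lo, hi)

# e.g. final returns from 10 independent training seeds (made-up numbers)
mean, (lo, hi) = bootstrap_ci([0.81, 0.78, 0.85, 0.79, 0.90,
                               0.76, 0.83, 0.88, 0.80, 0.84])
```

Reporting the interval alongside the mean makes cross-paper comparisons meaningful even when different works use different numbers of seeds.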



24368c745de15b3d2d6279667debcba3-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their helpful comments. We first provide individual responses to each reviewer's comments. For example, one could easily apply this method to the last row of a neural network. We will make the suggested changes to improve the writing. We will make this reference in the text of the paper. The reviewer correctly points out that [19] doesn't estimate individual densities but directly estimates the weight.


Ernest Shackleton knew 'Endurance' had shortcomings, new study says

Popular Science

Issues with the ship's hull, deck beams, and more show the ship was no match for Antarctic sea ice. The 'Endurance' leaning to one side during the Imperial Trans-Antarctic Expedition, 1914-17, led by Sir Ernest Shackleton. For almost 110 years, the Endurance has rested at the bottom of the icy waters of the Antarctic's Weddell Sea. Long held as the poster ship for Antarctic exploration, Sir Ernest Shackleton's ill-fated ship was no match for the crushing sea ice that sank it in November 1915.




Reviews: MAVEN: Multi-Agent Variational Exploration

Neural Information Processing Systems

The StarCraft results also seem fine, but not so strong as to make it obvious that committed exploration is a crucial empirical improvement for QMIX: while MAVEN agents learn faster in 3s5z, the final performance looks the same; MAVEN agents seem to have less variability in final win rate on 5m_vs_6m; and QMIX actually seems to have better final performance on 10m_vs_11m. The results in figures 2 and 4 do, however, suggest that there may be scenarios where the advantage of MAVEN is higher.

Minor comments:
1) line 64 and others: the subscript "qmix" should probably be wrapped in a "\text{}"
2) first eqn in section 3: inconsistency between using subscripts and superscripts, i.e. u_i and u i
3) line 81: perhaps better phrased as: "the *best* action of agent i..."
4) line 86: u_n i - u_ U i?
5) line 87: I was confused by what "the set of all possible such orderings over the action-values" means. Besides a degeneracy when some of the Q values are identical, isn't there only one valid ordering? Or are you just trying to cover that degeneracy?
6) Definition 1: perhaps add an intuitive explanation, e.g. "Intuitively, a Q-function is non-monotonic if the ordering of best actions for agent i can be affected by the other agents' action choices at that time step."