
Questions for Flat-Minima Optimization of Modern Neural Networks

arXiv.org Machine Learning

For training neural networks, flat-minima optimizers that seek to find parameters in neighborhoods having uniformly low loss (flat minima) have been shown to improve upon stochastic and adaptive gradient-based methods. Two methods for finding flat minima stand out: 1. Averaging methods (i.e., Stochastic Weight Averaging, SWA), and 2. Minimax methods (i.e., Sharpness Aware Minimization, SAM). However, despite similar motivations, there has been limited investigation into their properties and no comprehensive comparison between them. In this work, we investigate the loss surfaces from a systematic benchmarking of these approaches across computer vision, natural language processing, and graph learning tasks. The results lead to a simple hypothesis: since both approaches find different flat solutions, combining them should improve generalization even further. We verify this improves over either flat-minima approach in 39 out of 42 cases. When it does not, we investigate potential reasons. We hope our results across image, graph, and text data will help researchers to improve deep learning optimizers, and practitioners to pinpoint the optimizer for the problem at hand.
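One natural way to combine the two families is to use SAM-style perturbed gradient steps as the base optimizer while maintaining an SWA running average of the iterates. The following PyTorch sketch illustrates that recipe; the model, data loader, rho, swa_start, and learning rate are placeholder choices, and this is a minimal illustration of the idea rather than the paper's exact training protocol.

```python
import torch
from torch.optim.swa_utils import AveragedModel, update_bn

def train_sam_plus_swa(model, loader, loss_fn, epochs=100, swa_start=75, rho=0.05, lr=0.1):
    """Sketch: SAM-style perturbed steps as the base optimizer, with SWA on top."""
    base_opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    swa_model = AveragedModel(model)                 # running average of the SAM iterates
    for epoch in range(epochs):
        for x, y in loader:
            # SAM inner ascent: perturb weights along the normalized gradient direction.
            loss_fn(model(x), y).backward()
            grad_norm = torch.norm(torch.stack(
                [p.grad.norm() for p in model.parameters() if p.grad is not None]))
            eps = []
            with torch.no_grad():
                for p in model.parameters():
                    e = rho * p.grad / (grad_norm + 1e-12) if p.grad is not None else None
                    if e is not None:
                        p.add_(e)
                    eps.append(e)
            base_opt.zero_grad()
            # The gradient at the perturbed point drives the actual descent step.
            loss_fn(model(x), y).backward()
            with torch.no_grad():
                for p, e in zip(model.parameters(), eps):
                    if e is not None:
                        p.sub_(e)                    # restore original weights before stepping
            base_opt.step()
            base_opt.zero_grad()
        if epoch + 1 >= swa_start:
            swa_model.update_parameters(model)       # SWA averaging phase
    update_bn(loader, swa_model)                     # recompute BatchNorm statistics for the average
    return swa_model
```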


Why Top Machine Learning Conferences Should Promote Art & Creativity

#artificialintelligence

One cannot, in all seriousness, comprehend what went into writing "Requiem for a Dream" or painting the frescoes on the ceiling of the Sistine Chapel. But what happens when this intelligence is augmented with an external entity, an algorithm? Artificial intelligence has intruded into the space of creativity, the final frontier of the human intellect, through algorithms such as Generative Adversarial Networks (GANs). GANs have become fertile tools for artistic exploration. Artists such as Refik Anadol, Robbie Barrat, Sofia Crespo, Mario Klingemann, Jason Salavon, Helena Sarin, and Mike Tyka generate fascinating imagery with models learned from natural imagery.


Postdoc in ML for COVID-19 @ HMS

#artificialintelligence

Prof. Hima Lakkaraju and Prof. Marinka Zitnik invite applications for a Postdoctoral Research Fellowship position at Harvard University starting in the Summer or Fall of 2020. The selected candidate will be expected to lead research in novel machine learning methods to combat COVID-19. More specifically, this fellowship will focus on leveraging recent advances in explainable and interpretable AI/ML to help with the diagnosis and treatment of COVID-19. For instance, the candidate will develop explainable methods that not only facilitate early detection of COVID-19 and of its spread across communities, but also provide interpretable insights into these aspects. In addition, the candidate will devise novel explainable algorithms that can detect and filter out misinformation about COVID-19.


The Global AI Talent Tracker - MacroPolo

#artificialintelligence

Countries, companies, and institutions around the world are mobilizing to apply the power of artificial intelligence (AI) to an enormous range of economic and social problems. That application requires bringing together several key inputs: research and engineering talent, data, computational power, and a healthy innovation ecosystem. Talent is one of the most important--and the most clearly quantifiable--of those inputs. To assess the global balance and flow of top AI scientists, we focused on what many consider the top AI conference for deep learning: Neural Information Processing Systems, a.k.a. NeurIPS. For its December 2019 conference, NeurIPS saw a record-breaking 15,920 researchers submit 6,614 papers, with a paper acceptance rate of 21.6%, making it one of the largest, most popular, and most selective AI conferences on record.


How To Write A Top ML Paper: A Checklist From NeurIPS

#artificialintelligence

Thousands of machine learning papers get published every week. It is almost impossible to find the most useful paper in this vast and growing list. A paper typically gets credit when it finds a real-world application, is applauded by top researchers in the community, or is accepted at prestigious AI conferences such as NeurIPS, ICML, and ICLR. Usually, these conferences act as platforms to promote research. The acceptance guidelines for these top conferences vary, but they are all stringent nevertheless. The reviewers who skim through papers rely on rules of thumb, such as the availability of code and the replicability of results, to judge a paper.


Meta-Learning in Neural Networks: A Survey

arXiv.org Machine Learning

The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many of the conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning including few-shot learning, reinforcement learning and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.
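As a concrete example of one branch of this taxonomy, optimization-based meta-learners such as MAML adapt to each episode with a few inner gradient steps and then update the shared initialization from the adapted models' query loss. Below is a minimal first-order sketch in PyTorch; the episode structure, loss, and learning rates are illustrative assumptions, not a prescription from the survey.

```python
import torch
import torch.nn as nn

def fomaml_step(model, episodes, inner_lr=0.01, outer_lr=0.001, inner_steps=5):
    """One first-order MAML-style outer update over a batch of episodes.

    Each episode is a ((xs, ys), (xq, yq)) pair of support and query tensors.
    """
    loss_fn = nn.MSELoss()
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    init_state = {k: v.clone() for k, v in model.state_dict().items()}

    for (xs, ys), (xq, yq) in episodes:
        model.load_state_dict(init_state)            # start every episode from the shared init
        for _ in range(inner_steps):                  # task-specific adaptation on the support set
            loss = loss_fn(model(xs), ys)
            grads = torch.autograd.grad(loss, model.parameters())
            with torch.no_grad():
                for p, g in zip(model.parameters(), grads):
                    p.sub_(inner_lr * g)
        query_loss = loss_fn(model(xq), yq)           # evaluate the adapted model on the query set
        grads = torch.autograd.grad(query_loss, model.parameters())
        for mg, g in zip(meta_grads, grads):          # first-order approximation: use the adapted
            mg.add_(g / len(episodes))                # model's gradient as the meta-gradient

    model.load_state_dict(init_state)                 # apply the outer update to the initialization
    with torch.no_grad():
        for p, mg in zip(model.parameters(), meta_grads):
            p.sub_(outer_lr * mg)
```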


African AI Experts Get Excluded From a Conference--Again

#artificialintelligence

At the G7 meeting in Montreal last year, Justin Trudeau told WIRED he would look into why more than 100 African artificial intelligence researchers had been barred from visiting that city to attend their field's most important annual event, the Neural Information Processing Systems conference, or NeurIPS. Now the same thing has happened again. More than a dozen AI researchers from African countries have been refused visas to attend this year's NeurIPS, to be held next month in Vancouver. This means an event that shapes the course of a technology with huge economic and social importance will have little input from a major portion of the world. The conference brings together thousands of researchers from top academic institutions and companies, for hundreds of talks, workshops, and side meetings at which new ideas and theories are hashed out.


Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling

arXiv.org Machine Learning

The Dirichlet Belief Network (DirBN) has recently been proposed as a promising approach to learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework -- the Recurrent Dirichlet Belief Network (Recurrent-DBN) -- to study interpretable hidden structures in dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the first-order Markov descriptions used in most dynamic probabilistic frameworks. In addition, we develop a new inference strategy, which first propagates latent counts upward and backward and then samples variables downward and forward, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. Extensive experimental results on real-world data validate the advantages of the Recurrent-DBN over state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.


Do We Need Zero Training Loss After Achieving Zero Training Error?

arXiv.org Machine Learning

Overparameterized deep networks have the capacity to memorize training data with zero training error. Even after memorization, the training loss continues to approach zero, making the model overconfident and degrading test performance. Since existing regularizers do not directly aim to avoid zero training loss, they often fail to maintain a moderate level of training loss, ending up with a loss that is too small or too large. We propose a direct solution called flooding that intentionally prevents further reduction of the training loss once it reaches a reasonably small value, which we call the flooding level. Our approach makes the loss float around the flooding level by performing mini-batched gradient descent as usual but gradient ascent if the training loss is below the flooding level. This can be implemented with one line of code and is compatible with any stochastic optimizer and other regularizers. With flooding, the model will continue to "random walk" with the same non-zero training loss, and we expect it to drift into an area with a flat loss landscape that leads to better generalization. We experimentally show that flooding improves performance and, as a byproduct, induces a double descent curve of the test loss.
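The one-line implementation follows directly from the description: for a flooding level b, training minimizes |loss - b| + b, which is ordinary descent while the loss is above b and becomes ascent once it drops below. A minimal PyTorch sketch, with b as a placeholder hyperparameter:

```python
import torch

def flooded_loss(loss: torch.Tensor, b: float = 0.02) -> torch.Tensor:
    """Flooding: keep the training loss hovering around the flooding level b.

    Gradient descent on |loss - b| + b is ordinary descent while loss > b
    and becomes gradient ascent once loss < b, so the loss 'floats' near b.
    """
    return (loss - b).abs() + b

# Usage inside a standard training step (model, criterion, optimizer assumed):
#   loss = flooded_loss(criterion(model(x), y), b=0.02)
#   loss.backward()
#   optimizer.step()
```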


Boosting Adversarial Training with Hypersphere Embedding

arXiv.org Machine Learning

Adversarial training (AT) is one of the most effective defenses to improve the adversarial robustness of deep learning models. In order to promote the reliability of the adversarially trained models, we propose to boost AT via incorporating hypersphere embedding (HE), which can regularize the adversarial features onto compact hypersphere manifolds. We formally demonstrate that AT and HE are well coupled, which tunes up the learning dynamics of AT from several aspects. We comprehensively validate the effectiveness and universality of HE by embedding it into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In experiments, we evaluate our methods on the CIFAR-10 and ImageNet datasets, and verify that integrating HE can consistently enhance the performance of the models trained by each AT framework with little extra computation.
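In practice, the hypersphere-embedding component amounts to computing logits from L2-normalized features and class weights (with a scale and an additive angular margin) inside an otherwise standard PGD adversarial-training loop. The sketch below is a simplified illustration of that coupling; the scale s, margin m, and PGD settings are placeholder choices rather than the paper's exact configuration, and the optimizer is assumed to cover both the backbone and the head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    """Classifier on the unit hypersphere: logits are scaled cosine similarities."""
    def __init__(self, feat_dim, num_classes, s=15.0, m=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, targets=None):
        # Feature normalization + weight normalization -> cosine logits.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        if targets is not None:  # additive angular margin on the true class
            cos = cos - self.m * F.one_hot(targets, cos.size(1)).float()
        return self.s * cos

def pgd_at_step(backbone, head, opt, x, y, eps=8/255, alpha=2/255, steps=10):
    """One PGD adversarial-training step using the hypersphere head."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):  # inner maximization: craft the adversarial perturbation
        loss = F.cross_entropy(head(backbone((x + delta).clamp(0, 1)), y), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    opt.zero_grad()
    # Outer minimization: train on the adversarial examples.
    F.cross_entropy(head(backbone((x + delta.detach()).clamp(0, 1)), y), y).backward()
    opt.step()
```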