Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver
Sagun, Levent, Gulcehre, Caglar, Romero, Adriana, Rostamzadeh, Negar, Mannelli, Stefano Sarao
Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created this report in an attempt to isolate emerging topics and recurring themes that were presented throughout the event. Despite its tremendous success in recent years, deep learning remains a complex mix of art and engineering. The workshop aimed to gather people from across the field to address seemingly contrasting challenges in the problems they work on. In the call for the workshop, particular attention was given to the interdependence of architecture, data, and optimization, which gives rise to an enormous landscape of design and performance intricacies that are not well understood. This year, our goal was to emphasize the following directions in our community: (i) identifying obstacles on the way to better models and algorithms; (ii) identifying general trends from which we would like to build a scientific, and potentially theoretical, understanding; and (iii) rigorously designing scientific experiments and experimental protocols whose purpose is to pinpoint the origin of open mysteries while ensuring reproducibility and robustness of conclusions. At the event, these topics emerged and were broadly discussed, matching our expectations and paving the way for new studies in these directions. While we acknowledge that the text is naturally biased as it comes through our lens, we attempt here to fairly highlight the outcomes of the workshop.
On the Evaluation of Conditional GANs
DeVries, Terrance, Romero, Adriana, Pineda, Luis, Taylor, Graham W., Drozdzal, Michal
Conditional Generative Adversarial Networks (cGANs) are finding increasingly widespread use in many application domains. Despite outstanding progress, quantitative evaluation of such models often involves multiple distinct metrics to assess different desirable properties, such as image quality, intra-conditioning diversity, and conditional consistency, making model benchmarking challenging. In this paper, we propose the Fréchet Joint Distance (FJD), which implicitly captures the above-mentioned properties in a single metric. FJD is defined as the Fréchet distance of the joint distribution of images and conditionings, making it less sensitive to the often limited per-conditioning sample size. As a result, it scales more gracefully to stronger forms of conditioning, such as pixel-wise or multi-modal conditioning. We evaluate FJD on a modified version of the dSprites dataset as well as on the large-scale COCO-Stuff dataset, and consistently highlight its benefits when compared to currently established metrics. Moreover, we use the newly introduced metric to compare existing cGAN-based models with varying conditioning strengths, and show that FJD can be used as a promising single metric for model benchmarking.
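Since FJD is defined as the Fréchet distance of the joint distribution of images and conditionings, its core computation is the standard Fréchet distance between two Gaussians fitted to joint embeddings. Below is a minimal sketch of that computation, assuming per-sample image and conditioning embeddings are simply concatenated; the embedding functions themselves and the `alpha` scaling knob are illustrative simplifications, not the paper's reference implementation.

# Minimal sketch of the Frechet distance computation underlying FJD.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to the rows of x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    covmean = sqrtm(cov_x @ cov_y)
    if np.iscomplexobj(covmean):      # numerical noise can introduce tiny
        covmean = covmean.real        # imaginary components
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x + cov_y - 2.0 * covmean))

def fjd(real_img_emb, real_cond_emb, fake_img_emb, fake_cond_emb, alpha=1.0):
    """FJD sketch: Frechet distance over joint (image, conditioning) embeddings.

    alpha scales the conditioning embedding relative to the image embedding;
    its exact handling here is an assumption for illustration.
    """
    real_joint = np.concatenate([real_img_emb, alpha * real_cond_emb], axis=1)
    fake_joint = np.concatenate([fake_img_emb, alpha * fake_cond_emb], axis=1)
    return frechet_distance(real_joint, fake_joint)

Because the Gaussians are fitted to pooled joint embeddings rather than per-conditioning subsets, all samples contribute to a single estimate, which is why the abstract notes the metric's reduced sensitivity to limited per-conditioning sample sizes.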
fastMRI: An Open Dataset and Benchmarks for Accelerated MRI
Zbontar, Jure, Knoll, Florian, Sriram, Anuroop, Muckley, Matthew J., Bruno, Mary, Defazio, Aaron, Parente, Marc, Geras, Krzysztof J., Katsnelson, Joe, Chandarana, Hersh, Zhang, Zizhao, Drozdzal, Michal, Romero, Adriana, Rabbat, Michael, Vincent, Pascal, Pinkerton, James, Wang, Duo, Yakubova, Nafissa, Owens, Erich, Zitnick, C. Lawrence, Recht, Michael P., Sodickson, Daniel K., Lui, Yvonne W.
The excellent soft tissue contrast and flexibility of magnetic resonance imaging (MRI) make it a very powerful diagnostic tool for a wide range of disorders, including neurological, musculoskeletal, and oncological diseases. However, the long acquisition time in MRI, which can easily exceed 30 minutes, leads to low patient throughput, problems with patient comfort and compliance, artifacts from patient motion, and high exam costs. As a consequence, increasing imaging speed has been a major ongoing research goal since the advent of MRI in the 1970s. Increases in imaging speed have been achieved through both hardware developments (such as improved magnetic field gradients) and software advances (such as new pulse sequences). One noteworthy development in this context is parallel imaging, introduced in the 1990s, which allows multiple data points to be sampled simultaneously rather than in the traditional sequential order [39, 26, 9].
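To make the accelerated-MRI setup concrete, here is a minimal sketch of retrospectively undersampling fully sampled k-space with a Cartesian column mask, in the spirit of the fastMRI benchmark task; the mask statistics, the acceleration/center_fraction parametrization, and the toy data are illustrative assumptions, not the dataset's reference implementation.

# Minimal sketch: retrospective Cartesian undersampling of k-space.
import numpy as np

def random_column_mask(num_cols, acceleration=4, center_fraction=0.08, seed=None):
    """Keep all low-frequency (center) columns plus a random subset of the rest."""
    rng = np.random.default_rng(seed)
    num_low = int(round(num_cols * center_fraction))
    # Probability chosen so the expected overall sampling rate is 1/acceleration.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = rng.random(num_cols) < prob
    pad = (num_cols - num_low) // 2
    mask[pad:pad + num_low] = True  # always fully sample the k-space center
    return mask

def undersample(kspace, mask):
    """Zero out unsampled columns of a (rows, cols) complex k-space array."""
    return kspace * mask[np.newaxis, :]

# Toy example: 4x retrospective undersampling of a simulated k-space.
image = np.random.rand(320, 320)
kspace = np.fft.fftshift(np.fft.fft2(image))   # put low frequencies at the center
mask = random_column_mask(kspace.shape[1], acceleration=4)
zero_filled = np.fft.ifft2(np.fft.ifftshift(undersample(kspace, mask)))

The zero-filled inverse transform of such undersampled data is aliased; the benchmark task is to reconstruct an artifact-free image from the undersampled measurements.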
Graph Attention Networks
Veličković, Petar, Cucurull, Guillem, Casanova, Arantxa, Romero, Adriana, Liò, Pietro, Bengio, Yoshua
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable the model to (implicitly) assign different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models achieve or match state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer, and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).
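To make the mechanism concrete, below is a minimal single-head sketch of a graph attention layer in PyTorch: a shared linear projection, pairwise attention logits, and a softmax masked by the adjacency matrix so each node attends only over its neighborhood. The dense (N, N) formulation, class name, and sizes are illustrative simplifications; the multi-head variant and sparse implementations are omitted.

# Minimal single-head graph attention layer sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) {0,1} adjacency,
        # assumed to include self-loops.
        z = self.W(h)                                    # (N, out_dim)
        N = z.size(0)
        # Pairwise attention logits e_ij from the concatenation [Wh_i || Wh_j].
        pairs = torch.cat([z.unsqueeze(1).expand(N, N, -1),
                           z.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = self.leaky_relu(self.a(pairs)).squeeze(-1)   # (N, N)
        # Mask: each node attends only over its neighborhood.
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = F.softmax(e, dim=-1)                     # attention coefficients
        return F.elu(alpha @ z)                          # aggregated node features

Stacking such layers lets neighbor weights differ per node without any matrix inversion or global knowledge of the graph beyond the adjacency mask, which is what makes the model applicable to inductive settings with unseen graphs.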
Diet Networks: Thin Parameters for Fat Genomics
Romero, Adriana, Carrier, Pierre Luc, Erraqabi, Akram, Sylvain, Tristan, Auvolat, Alex, Dejoie, Etienne, Legault, Marc-André, Dubé, Marie-Pierre, Hussin, Julie G., Bengio, Yoshua
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact on precision medicine, where high-dimensional data regarding a particular patient are used to make predictions of interest. Even though the amount of data for such tasks is increasing, the mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization that considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn (with another neural network, called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights that link the value of the feature to each of the hidden units). We show experimentally, on a population stratification task of interest to medical studies, that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
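To illustrate the parametrization, here is a minimal PyTorch sketch in which the classifier's first-layer weights are not free parameters but are predicted, one input feature at a time, from a feature embedding by a parameter prediction network; the embedding source, network sizes, and naming are illustrative assumptions.

# Minimal sketch of the Diet Networks parametrization.
import torch
import torch.nn as nn

class DietNetwork(nn.Module):
    def __init__(self, feature_emb, n_hidden, n_classes):
        super().__init__()
        # feature_emb: (n_features, emb_dim), one row per SNP, e.g. a
        # precomputed distributed representation; treated as fixed here.
        self.register_buffer("feature_emb", feature_emb)
        emb_dim = feature_emb.size(1)
        # Parameter prediction network: maps a feature's embedding to that
        # feature's vector of first-layer weights. Its parameter count is
        # independent of the number of input features.
        self.pred_net = nn.Sequential(
            nn.Linear(emb_dim, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden),
        )
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_features) ternary SNP inputs.
        W = self.pred_net(self.feature_emb)  # (n_features, n_hidden), predicted
        h = torch.relu(x @ W)                # predicted first layer
        return self.classifier(h)

With millions of SNPs, a naive first layer would need n_features * n_hidden free parameters, whereas the prediction network's parameter count depends only on the embedding and hidden sizes, which is where the reduction in free parameters comes from.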