

Hyperparameter Tuning is All You Need for LISTA. Xiaohan Chen, Zhangyang Wang, Wotao Yin

Neural Information Processing Systems

Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the concept of unrolling an iterative algorithm and training it like a neural network. It has had great success on sparse recovery. In this paper, we show that adding momentum to intermediate variables in the LISTA network achieves a better convergence rate and, in particular, the network with instance-optimal parameters is superlinearly convergent. Moreover, our new theoretical results lead to a practical approach of automatically and adaptively calculating the parameters of a LISTA network layer based on its previous layers. Perhaps most surprisingly, such an adaptive-parameter procedure reduces the training of LISTA to tuning only three hyperparameters from data, a new record among recent advances in trimming down LISTA complexity. We call this new ultra-lightweight network HyperLISTA. Compared to state-of-the-art LISTA models, HyperLISTA achieves almost the same performance on seen data distributions and performs better when tested on unseen distributions (specifically, those with different sparsity levels and nonzero magnitudes).
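To make the unrolling idea concrete, below is a minimal NumPy sketch of an ISTA-style unrolled solver with a momentum term on the intermediate iterate, in the spirit of the momentum modification described above. The per-layer step sizes, thresholds, and momentum coefficients (gammas, thetas, betas) are placeholder constants for illustration, not the learned or adaptively computed parameters of LISTA/HyperLISTA.

```python
# Hedged sketch of a LISTA-style unrolled solver with momentum on the
# intermediate iterate. Per-layer parameters are placeholders, not the
# adaptive formulas derived in the paper.
import numpy as np

def soft_threshold(z, theta):
    """Elementwise shrinkage operator used by ISTA/LISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def lista_momentum(y, A, gammas, thetas, betas):
    """Run len(thetas) unrolled layers; y = A x + noise, x assumed sparse."""
    x_prev = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for gamma, theta, beta in zip(gammas, thetas, betas):
        # Momentum (extrapolation) on the intermediate variable.
        v = x + beta * (x - x_prev)
        # Gradient step on the data-fit term, then shrinkage.
        grad = A.T @ (A @ v - y)
        x_prev, x = x, soft_threshold(v - gamma * grad, theta)
    return x

# Toy usage: recover a 10-sparse vector from 128 random measurements.
rng = np.random.default_rng(0)
A = rng.normal(size=(128, 256)) / np.sqrt(128)
x_true = np.zeros(256)
x_true[rng.choice(256, 10, replace=False)] = rng.normal(size=10)
y = A @ x_true

L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
K = 16
x_hat = lista_momentum(y, A, gammas=[1.0 / L] * K, thetas=[0.01] * K, betas=[0.3] * K)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error
```

In LISTA these per-layer quantities would be trained from data; per the abstract, HyperLISTA instead derives each layer's parameters from the previous layers using only a few global hyperparameters.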


DeepGEM: Generalized Expectation-Maximization for Blind Inversion. Jorge C. Castellanos

Neural Information Processing Systems

M-step-only reconstructions with known sources. Similar works that use expectation-maximization (EM) based deep learning approaches are usually specific to a single task, often image classification. Figure caption: results are simulated using 20 surface receivers and a varying number of sources (9, 25, and 49) in a uniform grid; the velocity reconstruction MSE is given in the top right of each reconstruction, and the model with the highest data likelihood is highlighted in orange.



Google made it clear at I/O that AI will soon be inescapable

ZDNet

Unsurprisingly, the bulk of Google's announcements at I/O this week focused on AI. Although past Google I/O events also leaned heavily on AI, what made this year's announcements different is that the features were spread across nearly every Google offering and touched nearly every task people perform every day. Because I'm an AI optimist, and my job as an AI editor involves testing tools, I have always been pretty open to using AI to optimize my daily tasks. However, Google's keynote made it clear that even those who may not be as open to it will soon find it unavoidable. Moreover, the tech giant's announcements shed light on the industry's future, revealing three major trends about where AI is headed, which you can read more about below.


Improved Feature Distillation via Projector Ensemble. Yudong Chen, Sen Wang

Neural Information Processing Systems

In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effect of the feature projector between the student and the teacher remains underexplored. In this paper, we first discuss a plausible mechanism of the projector with empirical evidence and then propose a new feature distillation method based on a projector ensemble for further performance improvement. We observe that the student network benefits from a projector even if the feature dimensions of the student and the teacher are the same. Training a student backbone without a projector can be considered a multi-task learning process, namely achieving discriminative feature extraction for classification and feature matching between the student and the teacher for distillation at the same time. We hypothesize and empirically verify that without a projector, the student network tends to overfit the teacher's feature distributions despite having a different architecture and weight initialization.
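As an illustration of the projector-ensemble idea, here is a minimal PyTorch sketch in which several small MLP projectors map student features to the teacher's dimension and their averaged output is matched to the (detached) teacher features with an MSE loss. The projector architecture, the number of projectors, and the averaging-before-matching choice are assumptions of this sketch rather than the paper's exact design.

```python
# Hedged sketch of a projector-ensemble distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectorEnsemble(nn.Module):
    def __init__(self, student_dim, teacher_dim, num_projectors=3):
        super().__init__()
        # Each projector is a small MLP from the student to the teacher space
        # (an assumption of this sketch, not necessarily the paper's design).
        self.projectors = nn.ModuleList([
            nn.Sequential(
                nn.Linear(student_dim, teacher_dim),
                nn.ReLU(inplace=True),
                nn.Linear(teacher_dim, teacher_dim),
            )
            for _ in range(num_projectors)
        ])

    def forward(self, student_feat, teacher_feat):
        # Average the projected student features over the ensemble, then
        # match the detached teacher features with an MSE loss.
        projected = torch.stack([p(student_feat) for p in self.projectors]).mean(dim=0)
        return F.mse_loss(projected, teacher_feat.detach())

# Toy usage with random features (batch of 8, student dim 256, teacher dim 512).
loss = ProjectorEnsemble(256, 512)(torch.randn(8, 256), torch.randn(8, 512))
loss.backward()
```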


75877cb75154206c4e65e76b88a12712-Paper.pdf

Neural Information Processing Systems

The ability to detect and count certain substructures in graphs is important for solving many tasks on graph-structured data, especially in the contexts of computational chemistry and biology as well as social network analysis. Inspired by this, we propose to study the expressive power of graph neural networks (GNNs) via their ability to count attributed graph substructures, extending recent works that examine their power in graph isomorphism testing and function approximation. We distinguish between two types of substructure counting: induced-subgraph-count and subgraph-count, and establish both positive and negative answers for popular GNN architectures. Specifically, we prove that Message Passing Neural Networks (MPNNs), 2-Weisfeiler-Lehman (2-WL) and 2-Invariant Graph Networks (2-IGNs) cannot perform induced-subgraph-count of any connected substructure consisting of 3 or more nodes, while they can perform subgraph-count of star-shaped substructures. As an intermediary step, we prove that 2-WL and 2-IGNs are equivalent in distinguishing non-isomorphic graphs, partly answering an open problem raised in [38]. We also prove positive results for k-WL and k-IGNs as well as negative results for k-WL with a finite number of iterations. We then conduct experiments that support the theoretical results for MPNNs and 2-IGNs. Moreover, motivated by substructure counting and inspired by [45], we propose the Local Relational Pooling model and demonstrate that it is not only effective for substructure counting but also able to achieve competitive performance on molecular prediction tasks.
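The distinction between the two counting notions can be illustrated directly on a small graph. The NetworkX sketch below counts a pattern both as an induced subgraph and as a general (not necessarily induced) subgraph, dividing the number of matchings by the pattern's automorphism count; it only illustrates the definitions and is unrelated to the GNN architectures analyzed in the paper. (subgraph_monomorphisms_iter requires a reasonably recent NetworkX release.)

```python
# Induced-subgraph-count vs. subgraph-count on a toy graph, via VF2 matching.
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def induced_subgraph_count(G, pattern):
    matches = sum(1 for _ in GraphMatcher(G, pattern).subgraph_isomorphisms_iter())
    automorphisms = sum(1 for _ in GraphMatcher(pattern, pattern).isomorphisms_iter())
    return matches // automorphisms

def subgraph_count(G, pattern):
    matches = sum(1 for _ in GraphMatcher(G, pattern).subgraph_monomorphisms_iter())
    automorphisms = sum(1 for _ in GraphMatcher(pattern, pattern).isomorphisms_iter())
    return matches // automorphisms

G = nx.complete_graph(4)          # K4
triangle = nx.complete_graph(3)   # K3
path = nx.path_graph(3)           # 2-edge path, i.e. a star with two leaves

print(induced_subgraph_count(G, triangle))  # 4: every 3-node induced subgraph of K4
print(induced_subgraph_count(G, path))      # 0: no 3-node induced subgraph is a path
print(subgraph_count(G, path))              # 12: paths occur as non-induced subgraphs
```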


Pretraining with Random Noise for Fast and Robust Learning without Weight Transport. Sang Wan Lee, Se-Bum Paik

Neural Information Processing Systems

The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of such a process has yet to be understood, and it is unclear whether this process can benefit machine learning algorithms. Here, we study this issue using a neural network with a feedback alignment algorithm, demonstrating that pretraining neural networks with random noise increases learning efficiency as well as generalization ability, without weight transport. First, we found that random noise training modifies forward weights to match the backward synaptic feedback, which is necessary for error signals to be delivered by feedback alignment. As a result, a network with pre-aligned weights learns notably faster and reaches higher accuracy than a network without random noise training, even becoming comparable to the backpropagation algorithm.
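The following NumPy sketch shows the basic ingredients discussed above: a two-layer network updated by feedback alignment, where a fixed random matrix B carries the error backward instead of the transposed forward weights, plus a noise-pretraining phase. Pairing random Gaussian inputs with random one-hot targets during that phase is an assumption of this sketch, not necessarily the paper's exact protocol.

```python
# Minimal feedback-alignment sketch with a random-noise pretraining phase.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 64, 128, 10, 0.01

W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
B = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback weights

def fa_step(x, target):
    """One feedback-alignment update on a single example (no weight transport)."""
    global W1, W2
    h = np.maximum(W1 @ x, 0.0)                  # hidden ReLU activity
    y = W2 @ h                                   # linear readout
    e = y - target                               # output error
    delta_h = (B @ e) * (h > 0)                  # error fed back through fixed B
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    return float((e ** 2).mean())

# Noise-pretraining phase: random inputs paired with random one-hot targets
# (an assumption of this sketch).
for _ in range(2000):
    x = rng.normal(size=n_in)
    t = np.eye(n_out)[rng.integers(n_out)]
    fa_step(x, t)

# The alignment between W2 and B^T tends to increase during this phase, which
# is what makes subsequent task learning faster.
cosine = np.sum(W2 * B.T) / (np.linalg.norm(W2) * np.linalg.norm(B.T))
print(f"alignment cos(W2, B^T) = {cosine:.3f}")
```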


747d3443e319a22747fbb873e8b2f9f2-Supplemental.pdf

Neural Information Processing Systems

A.1 Bayesian Optimization Based Search

In this procedure, we build a model for the accuracy of unevaluated BSSCs based on the evaluated ones. A Gaussian Process (GP, [1]) is a good method to achieve this in the Bayesian optimization literature [2]. When selecting the first BSSC, equation 2 can be used directly; for subsequent selections we use the expected value of the EI function (EEI, [4]) instead. The value of equation 3 is calculated via Monte Carlo simulations [4] in our method.
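Since equations 2 and 3 are not reproduced here, the sketch below only shows the generic pattern the paragraph describes: fit a Gaussian Process surrogate to the evaluated BSSCs, then score candidates with an expected-improvement quantity estimated by Monte Carlo sampling from the GP posterior. The 8-dimensional random vectors standing in for BSSC encodings, and this particular Monte Carlo estimator, are assumptions of this sketch rather than the paper's exact formulation.

```python
# Generic GP-surrogate + Monte-Carlo expected-improvement sketch.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Evaluated BSSCs (stand-in 8-dimensional encodings) and their accuracies.
X_eval = rng.random((12, 8))
y_eval = rng.random(12)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_eval, y_eval)

def monte_carlo_ei(candidates, best_so_far, n_samples=256):
    """Expected improvement estimated from posterior samples of the GP."""
    samples = gp.sample_y(candidates, n_samples=n_samples, random_state=0)
    improvement = np.maximum(samples - best_so_far, 0.0)
    return improvement.mean(axis=1)

# Score unevaluated candidates and pick the most promising one.
X_cand = rng.random((100, 8))
ei = monte_carlo_ei(X_cand, y_eval.max())
next_bssc = X_cand[np.argmax(ei)]
print(next_bssc)
```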


AutoBSS: An Efficient Algorithm for Block Stacking Style Search

Neural Information Processing Systems

Neural network architecture design mostly focuses on new convolutional operators or special topological structures of network blocks, while little attention is paid to the configuration used when stacking the blocks, called the Block Stacking Style (BSS). Recent studies show that BSS may also have a non-negligible impact on networks, so we design an efficient algorithm to search for it automatically. The proposed method, AutoBSS, is a novel AutoML algorithm based on Bayesian optimization that iteratively refines and clusters the Block Stacking Style Coding (BSSC), and it can find an optimal BSS in a few trials without biased evaluation.
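How the iterative refining and clustering of BSSCs might be organized is sketched below, purely schematically: candidate encodings are clustered, one representative per cluster is evaluated, and new candidates are resampled around the best representatives. The clustering schedule, the resampling rule, and the evaluate() placeholder are all assumptions of this sketch; the actual AutoBSS procedure (and its Bayesian-optimization surrogate described in the supplementary material above) may differ.

```python
# Schematic cluster-then-evaluate search loop over stand-in BSSC vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def evaluate(bssc):
    # Placeholder for training/evaluating a network stacked according to bssc.
    return float(-np.linalg.norm(bssc - 0.5))

best_bssc, best_score = None, -np.inf
candidates = rng.random((200, 8))               # stand-in BSSC encodings

for round_idx in range(4):
    # Cluster candidates and evaluate only one representative per cluster.
    kmeans = KMeans(n_clusters=8, n_init=10, random_state=round_idx).fit(candidates)
    reps = kmeans.cluster_centers_
    scores = np.array([evaluate(r) for r in reps])
    if scores.max() > best_score:
        best_score, best_bssc = scores.max(), reps[scores.argmax()]
    # Refine: resample new candidates around the best representatives.
    top = reps[np.argsort(scores)[-3:]]
    candidates = np.concatenate([t + 0.05 * rng.normal(size=(64, 8)) for t in top])

print(best_score, best_bssc)
```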