Energy
The State of AI Ethics Report (June 2020)
Gupta, Abhishek, Lanteigne, Camylle, Heath, Victoria, Ganapini, Marianna Bergamaschi, Galinkin, Erick, Cohen, Allison, De Gasperis, Tania, Akif, Mo, Butalid, Renjie
These past few months have been especially challenging, and the deployment of technology in ways hitherto untested at an unrivalled pace has left the internet and technology watchers aghast. Artificial intelligence has become the byword for technological progress and is being used in everything from helping us combat the COVID-19 pandemic to nudging our attention in different directions as we all spend increasingly large amounts of time online. It has never been more important that we keep a sharp eye on the development of this field and how it is shaping our society and interactions with each other. With this inaugural edition of the State of AI Ethics we hope to bring forward the most important developments that caught our attention at the Montreal AI Ethics Institute this past quarter. Our goal is to help you navigate this ever-evolving field swiftly and allow you and your organization to make informed decisions. This pulse-check for the state of discourse, research, and development is geared towards researchers and practitioners alike who are making decisions on behalf of their organizations in considering the societal impacts of AI-enabled solutions. We cover a wide set of areas in this report spanning Agency and Responsibility, Security and Risk, Disinformation, Jobs and Labor, the Future of AI Ethics, and more. Our staff has worked tirelessly over the past quarter separating signal from noise so that you are equipped with the right tools and knowledge to confidently tread this complex yet consequential domain.
A mechanism to promote social behaviour in household load balancing
Brooks, Nathan A., Powers, Simon T., Borg, James M.
Reducing the peak energy consumption of households is essential for the effective use of renewable energy sources, in order to ensure that as much household demand as possible can be met by renewable sources. This entails spreading out the use of high-powered appliances such as dishwashers and washing machines throughout the day. Traditional approaches to this problem have relied on differential pricing set by a centralised utility company. But this mechanism has not been effective in promoting widespread shifting of appliance usage. Here we consider an alternative decentralised mechanism, where agents receive an initial allocation of time-slots to use their appliances and can then exchange these with other agents. If agents are willing to be more flexible in the exchanges they accept, then overall satisfaction, in terms of the percentage of agents' time-slot preferences that are satisfied, will increase. This requires a mechanism that can incentivise agents to be more flexible. Building on previous work, we show that a mechanism incorporating social capital - the tracking of favours given and received - can incentivise agents to act flexibly and give favours by accepting exchanges that do not immediately benefit them. We demonstrate that a mechanism that tracks favours increases the overall satisfaction of agents, and crucially allows social agents that give favours to outcompete selfish agents that do not under payoff-biased social learning. Thus, even completely self-interested agents are expected to learn to produce socially beneficial outcomes.
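The favour-tracking idea in this abstract can be illustrated with a minimal sketch. The acceptance rule below (a social agent grants a favour only while the requester is not in debt to it) is an assumption chosen for illustration, not the rule used in the paper; the names Agent and request_exchange are hypothetical.

```python
from collections import defaultdict


class Agent:
    """Household agent that keeps a per-partner ledger of favours (illustrative)."""

    def __init__(self, name, social):
        self.name = name
        self.social = social
        self.given = defaultdict(int)     # favours given to each partner
        self.received = defaultdict(int)  # favours received from each partner

    def accepts(self, requester, benefits_me):
        if benefits_me:
            return True   # a swap that helps this agent is always accepted
        if not self.social:
            return False  # selfish agents never give favours
        # Assumed rule: extend a favour only while the requester has repaid
        # at least as many favours as this agent has already given them.
        return self.given[requester.name] <= self.received[requester.name]


def request_exchange(requester, owner, benefits_owner):
    """Ask `owner` to swap time-slots; record a favour if the swap does not help them."""
    if not owner.accepts(requester, benefits_owner):
        return False
    if not benefits_owner:
        owner.given[requester.name] += 1
        requester.received[owner.name] += 1
    return True
```

Under this assumed rule, a social agent gives the first favour freely, refuses a second until the first is repaid, and a selfish agent only accepts swaps that benefit it directly.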
Extracting the main trend in a dataset: the Sequencer algorithm
Scientists aim to extract simplicity from observations of the complex world. An important component of this process is the exploration of data in search of trends. In practice, however, this tends to be more of an art than a science. Among all trends existing in the natural world, one-dimensional trends, often called sequences, are of particular interest as they provide insights into simple phenomena. However, some are challenging to detect as they may be expressed in complex manners. We present the Sequencer, an algorithm designed to generically identify the main trend in a dataset. It does so by constructing graphs describing the similarities between pairs of observations, computed with a set of metrics and scales. Using the fact that continuous trends lead to more elongated graphs, the algorithm can identify which aspects of the data are relevant in establishing a global sequence. Such an approach can be used beyond the proposed algorithm and can optimize the parameters of any dimensionality reduction technique. We demonstrate the power of the Sequencer using real-world data from astronomy, geology as well as images from the natural world. We show that, in a number of cases, it outperforms the popular t-SNE and UMAP dimensionality reduction techniques. This approach to exploratory data analysis, which does not rely on training or tuning of any parameter, has the potential to enable discoveries in a wide range of scientific domains. The source code is available on GitHub and we provide an online interface at http://sequencer.org.
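The observation that continuous trends produce elongated graphs can be illustrated with a rough proxy. The sketch below builds a minimum spanning tree over Euclidean distances and reports the tree diameter (in edges) divided by the number of edges; this ratio is an assumed stand-in for illustration and is not the elongation measure defined by the Sequencer. All function names are hypothetical.

```python
import math
from collections import deque


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def farthest(adj, start):
    """Breadth-first search; returns the last node reached and its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        u = queue.popleft()
        last = u
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return last, dist[last]


def elongation_proxy(points):
    """Tree diameter in edges divided by edge count: 1.0 for a pure chain."""
    adj = minimum_spanning_tree(points)
    end, _ = farthest(adj, 0)
    _, diameter = farthest(adj, end)
    return diameter / (len(points) - 1)
```

Points lying along a line give a chain-shaped tree and a ratio of 1.0, while a hub with three spokes gives 2/3, which is the sense in which a one-dimensional trend looks "more elongated".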
Off-the-grid: Fast and Effective Hyperparameter Search for Kernel Clustering
Ordozgoiti, Bruno, Muñoz, Lluís A. Belanche
Kernel functions are a powerful tool to enhance the $k$-means clustering algorithm via the kernel trick. It is known that the parameters of the chosen kernel function can have a dramatic impact on the result. In supervised settings, these can be tuned via cross-validation, but for clustering this is not straightforward and heuristics are usually employed. In this paper we study the impact of kernel parameters on kernel $k$-means. In particular, we derive a lower bound, tight up to constant factors, below which the parameter of the RBF kernel will render kernel $k$-means meaningless. We argue that grid search can be ineffective for hyperparameter search in this context and propose an alternative algorithm for this purpose. In addition, we offer an efficient implementation based on fast approximate exponentiation with provable quality guarantees. Our experimental results demonstrate the ability of our method to efficiently reveal a rich and useful set of hyperparameter values.
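To see why a badly chosen RBF parameter can make kernel $k$-means meaningless, consider the kernel matrix itself. The sketch below assumes the parametrization $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$, which may differ from the one used in the paper; the function names are hypothetical. When $\sigma$ is far smaller than the distances between points, every off-diagonal entry collapses to zero and no point resembles any other.

```python
import math


def rbf_kernel_matrix(points, sigma):
    """Gram matrix of the RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    def k(x, y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return [[k(x, y) for y in points] for x in points]


def max_off_diagonal(K):
    """Largest similarity between two distinct points."""
    n = len(K)
    return max(K[i][j] for i in range(n) for j in range(n) if i != j)


# Two well-separated pairs of nearby points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
K_tiny = rbf_kernel_matrix(points, sigma=0.001)  # effectively the identity matrix
K_ok = rbf_kernel_matrix(points, sigma=1.0)      # cluster structure is visible
```

With sigma=1.0 the within-pair similarity is close to 1 and the between-pair similarity is close to 0, so the two groups are recoverable; with sigma=0.001 the matrix carries no grouping information at all.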
Extension of Direct Feedback Alignment to Convolutional and Recurrent Neural Network for Bio-plausible Deep Learning
Han, Donghyeon, Park, Gwangtae, Ryu, Junha, Yoo, Hoi-jun
In this paper, we improve the direct feedback alignment (DFA) algorithm and extend it to convolutional and recurrent neural networks (CNNs and RNNs). Even though the DFA algorithm is biologically plausible and has the potential for high-speed training, it has not been considered a substitute for back-propagation (BP) because of its low accuracy in CNN and RNN training. In this work, we propose a new DFA algorithm for BP-level accurate CNN and RNN training. First, we divide the network into several modules and apply the DFA algorithm within each module. Second, we apply DFA with sparse backward weights, which takes the form of a dilated convolution in the CNN case and of a sparse matrix multiplication in the RNN case. Additionally, the error propagation method of the CNN becomes simpler through group convolution. Finally, hybrid DFA raises the accuracy of CNN and RNN training to the BP level while retaining the parallelism and hardware efficiency of the DFA algorithm.
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
Launay, Julien, Poli, Iacopo, Boniface, François, Krzakala, Florent
Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment (DFA) to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. When a larger gap between DFA and backpropagation exists, as in Transformers, we attribute this to a need to rethink common practices for large and complex architectures. At variance with common beliefs, our work supports the view that challenging tasks can be tackled in the absence of weight transport.
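The core of DFA, as commonly described, is that each hidden layer receives the output error through its own fixed random matrix rather than through the transposed forward weights. The sketch below shows only that feedback step for tanh layers; the function names, the tanh nonlinearity, and the uniform initialisation range are assumptions made for illustration and are not taken from either paper.

```python
import math
import random


def tanh_prime(a):
    """Derivative of tanh at pre-activation a."""
    return 1.0 - math.tanh(a) ** 2


def random_feedback(n_hidden, n_out, rng):
    """Fixed random feedback matrix B (n_hidden x n_out); it is never trained."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_hidden)]


def dfa_hidden_deltas(error, feedback_matrices, pre_activations):
    """Per-layer learning signals: (B_l @ error) * tanh'(a_l).

    No forward weights appear here, so there is no weight transport and
    every layer's signal can be computed independently of the others.
    """
    deltas = []
    for B, a in zip(feedback_matrices, pre_activations):
        projected = [sum(b * e for b, e in zip(row, error)) for row in B]
        deltas.append([p * tanh_prime(x) for p, x in zip(projected, a)])
    return deltas
```

Because each layer's signal depends only on the output error and that layer's own pre-activations, the per-layer updates do not have to wait for one another, which is the source of the parallelism mentioned in the abstracts.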
Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
Pertsch, Karl, Rybkin, Oleh, Ebert, Frederik, Finn, Chelsea, Jayaraman, Dinesh, Levine, Sergey
The ability to predict and plan into the future is fundamental for agents acting in the world. To reach a faraway goal, we predict trajectories at multiple timescales, first devising a coarse plan towards the goal and then gradually filling in details. In contrast, current learning approaches for visual prediction and planning fail on long-horizon tasks as they generate predictions (1) without considering goal information, and (2) at the finest temporal resolution, one step at a time. In this work we propose a framework for visual prediction and planning that is able to overcome both of these limitations. First, we formulate the problem of predicting towards a goal and propose the corresponding class of latent space goal-conditioned predictors (GCPs). GCPs significantly improve planning efficiency by constraining the search space to only those trajectories that reach the goal. Further, we show how GCPs can be naturally formulated as hierarchical models that, given two observations, predict an observation between them, and by recursively subdividing each part of the trajectory generate complete sequences. This divide-and-conquer strategy is effective at long-term prediction, and enables us to design an effective hierarchical planning algorithm that optimizes trajectories in a coarse-to-fine manner. We show that by using both goal-conditioning and hierarchical prediction, GCPs enable us to solve visual planning tasks with much longer horizon than previously possible.
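The recursive subdivision described above can be sketched in a few lines. The learned model is replaced here by a hypothetical stand-in, predict_between, which simply averages two scalar observations; a real goal-conditioned predictor would operate on images or latent states. The function names are assumptions for illustration.

```python
def interior_frames(start, goal, depth, predict_between):
    """Frames strictly between start and goal, filled in coarse-to-fine order."""
    if depth == 0:
        return []
    mid = predict_between(start, goal)  # predict the observation between the two
    left = interior_frames(start, mid, depth - 1, predict_between)
    right = interior_frames(mid, goal, depth - 1, predict_between)
    return left + [mid] + right


def predict_sequence(start, goal, depth, predict_between):
    """Full trajectory from start to goal with 2**depth - 1 interior frames."""
    return [start] + interior_frames(start, goal, depth, predict_between) + [goal]


def average(a, b):
    """Hypothetical stand-in for a learned in-between predictor."""
    return (a + b) / 2.0


trajectory = predict_sequence(0.0, 8.0, depth=3, predict_between=average)
```

Every predicted frame is conditioned on both endpoints of its segment, so the trajectory always ends at the goal, and the number of recursion levels grows only logarithmically with the sequence length.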
Principal Component Networks: Parameter Reduction Early in Training
Waleffe, Roger, Rekatsinas, Theodoros
Recent works show that overparameterized networks contain small subnetworks that exhibit comparable accuracy to the full model when trained in isolation. These results highlight the potential to reduce training costs of deep neural networks without sacrificing generalization performance. However, existing approaches for finding these small networks rely on expensive multi-round train-and-prune procedures and are impractical for large data sets and models. In this paper, we show how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs. We find that hidden layer activations in overparameterized networks exist primarily in subspaces smaller than the actual model width. Building on this observation, we use PCA to find a basis of high variance for layer inputs and represent layer weights using these directions. We eliminate all weights not relevant to the found PCA basis and term these network architectures Principal Component Networks. On CIFAR-10 and ImageNet, we show that PCNs train faster and use less energy than overparameterized models, without accuracy loss. We find that our transformation leads to networks with up to 23.8x fewer parameters, with equal or higher end-model accuracy; in some cases we observe improvements of up to 3%. We also show that ResNet-20 PCNs outperform deep ResNet-110 networks while training faster.
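The claim that activations occupy a subspace smaller than the layer width can be checked on toy data by asking how much variance the leading principal direction explains. The sketch below uses plain power iteration on the covariance matrix; it is an illustrative simplification (a single component, a fixed starting vector that must not be orthogonal to the leading direction, and non-constant inputs), not the procedure used in the paper, and the function names are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix of rows (one row per example, one column per unit)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvalue(cov, iters=200):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed not orthogonal to the top direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v with unit-norm v.
    return sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))


def variance_explained_by_top(rows):
    """Fraction of total variance captured by the first principal direction."""
    cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return top_eigenvalue(cov) / total
```

For three hidden units whose activations are all multiples of one underlying signal, the ratio is 1.0: the layer is three units wide but its activations span a single direction, so a basis of one component would represent its inputs without loss.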
A General Class of Transfer Learning Regression without Implementation Cost
Minami, Shunya, Liu, Song, Wu, Stephen, Fukumizu, Kenji, Yoshida, Ryo
We propose a novel framework that unifies and extends existing methods of transfer learning (TL) for regression. To bridge a pretrained source model to the model on a target task, we introduce a density-ratio reweighting function, which is estimated through the Bayesian framework with a specific prior distribution. By changing two intrinsic hyperparameters and the choice of the density-ratio model, the proposed method can integrate three popular methods of TL: TL based on cross-domain similarity regularization, a probabilistic TL using the density-ratio estimation, and fine-tuning of pretrained neural networks. Moreover, the proposed method can benefit from its simple implementation without any additional cost; the model can be fully trained using off-the-shelf libraries for supervised learning in which the original output variable is simply transformed to a new output. We demonstrate its simplicity, generality, and applicability using various real data applications.
Convolutional-network models to predict wall-bounded turbulence from wall quantities
Guastoni, L., Güemes, A., Ianiro, A., Discetti, S., Schlatter, P., Azizpour, H., Vinuesa, R.
Two models based on convolutional neural networks are trained to predict the two-dimensional velocity-fluctuation fields at different wall-normal locations in a turbulent open channel flow, using the wall-shear-stress components and the wall pressure as inputs. The first model is a fully-convolutional neural network (FCN) which directly predicts the fluctuations, while the second one reconstructs the flow fields using a linear combination of orthonormal basis functions, obtained through proper orthogonal decomposition (POD), hence named FCN-POD. Both models are trained using data from two direct numerical simulations (DNS) at friction Reynolds numbers $Re_{\tau} = 180$ and $550$. Thanks to their ability to predict the nonlinear interactions in the flow, both models show a better prediction performance than the extended proper orthogonal decomposition (EPOD), which establishes a linear relation between input and output fields. The performance of the various models is compared based on predictions of the instantaneous fluctuation fields, turbulence statistics and power-spectral densities. The FCN exhibits the best predictions closer to the wall, whereas the FCN-POD model provides better predictions at larger wall-normal distances. We also assessed the feasibility of performing transfer learning for the FCN model, using the weights from $Re_{\tau}=180$ to initialize those of the $Re_{\tau}=550$ case. Our results indicate that it is possible to obtain a performance similar to that of the reference model up to $y^{+}=50$, with $50\%$ and $25\%$ of the original training data. These non-intrusive sensing models will play an important role in applications related to closed-loop control of wall-bounded turbulence.