Is California's supervolcano set to blow? Scientists identify more than 2,000 quakes at the Long Valley Caldera that they say 'are precursors for an eruption'

Daily Mail - Science & tech

California's supervolcano, which has the power to bury Los Angeles in more than 3,000 feet of ash, is showing signs of activity. Scientists at the California Institute of Technology (Caltech) identified over 2,000 earthquakes rumbling throughout the Long Valley Caldera in recent years. The team conducted a new investigation to determine whether the seismic activity was a sign of impending doom or an indication that the risk of a massive eruption was decreasing. Caltech researchers created detailed underground images of the caldera, finding that the recent seismic activity results from fluids and gases released as the area cools off and settles down. Study author Zhongwen Zhan said: 'We don't think the region is gearing up for another supervolcanic eruption, but the cooling process may release enough gas and liquid to cause earthquakes and small eruptions. For example, in May 1980, there were four magnitude 6 earthquakes in the region alone.'


AI could prove energy hog that uses more electricity per year than some small countries: study

FOX News

A new study warned that artificial intelligence technology could cause a significant surge in electricity consumption. The paper, published in the journal Joule, details the potential future energy consumption of AI systems, noting that generative AI relies on powerful servers and that increased use could drive a spike in demand for energy. The authors point to tech giant Google as one example, noting that AI accounted for only 10-15% of the company's total electricity consumption in 2021. But as AI technology continues to expand, Google's energy consumption could grow to the scale of a small country's. "The worst-case scenario suggests Google's AI alone could consume as much electricity as a country such as Ireland (29.3 TWh per year), which is a significant increase compared to its historical AI-related energy consumption," the authors wrote.
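
The scale of the worst-case figure can be checked with a back-of-the-envelope calculation. The sketch below uses illustrative inputs (server count, per-server power draw), not necessarily the paper's exact assumptions:

```python
# Back-of-the-envelope: annual electricity for a hypothetical AI server fleet.
# Illustrative inputs only; the Joule paper's exact assumptions may differ.
servers = 500_000          # hypothetical number of deployed AI servers
kw_per_server = 6.5        # e.g., a fully loaded 8-GPU server
hours_per_year = 24 * 365

twh = servers * kw_per_server * hours_per_year / 1e9   # kWh -> TWh
print(f"{twh:.1f} TWh/year")  # ~28.5 TWh, roughly the scale cited for Ireland
```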


Eigen-Distortions of Hierarchical Representations. Alexander Berardino (Center for Neural Science, New York University), Johannes Ballé (Center for Neural Science, New York University), Valero Laparra (Image Processing Laboratory)

Neural Information Processing Systems

We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected when added to the image. We use this method to test the ability of a variety of representations to mimic human perceptual sensitivity.
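
To make the core computation concrete, here is a minimal sketch under a common simplifying assumption: for a deterministic model with additive Gaussian output noise, the Fisher information matrix reduces to J^T J, where J is the model's Jacobian at the image. The toy model and all names below are hypothetical stand-ins, not the paper's networks:

```python
import numpy as np

def fisher_matrix(model, x, eps=1e-4):
    """Approximate F = J^T J via a finite-difference Jacobian of model at x."""
    y0 = model(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (model(x + dx) - y0) / eps
    return J.T @ J

# toy stand-in for a hierarchical representation: filter bank + nonlinearity
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))
model = lambda x: np.tanh(W @ x)

x = rng.normal(size=64)                  # a "vectorized image"
F = fisher_matrix(model, x)
evals, evecs = np.linalg.eigh(F)         # eigenvalues in ascending order
least_noticeable = evecs[:, 0]           # model-predicted hardest to detect
most_noticeable = evecs[:, -1]           # model-predicted easiest to detect
```

In the paper's protocol, these extremal eigenvectors are then added to the image at varying amplitudes, and human detection thresholds are measured for each.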


Reviews: Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning

Neural Information Processing Systems

This paper proposes Tensorized LSTMs for efficient sequence learning. It represents hidden layers as tensors and employs cross-layer memory cell convolution for efficiency and effectiveness. The model is clearly formulated, and the experimental results show the utility of the proposed method. Although the paper is well written, I still have some questions and points of confusion.
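
For readers unfamiliar with the paper, a minimal sketch of the tensorization idea may help: the hidden state becomes a P x M tensor, and all gates are produced by a single convolution along the tensor dimension. This simplified sketch omits parts of the proposed model (e.g., the depth-in-time computation), and every name below is illustrative:

```python
import numpy as np

def conv1d_same(H, K):
    """'Same' 1-D convolution of H (P, M_in) with kernel K (k, M_in, M_out)."""
    k, _, m_out = K.shape
    P = H.shape[0]
    pad = k // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)))
    out = np.zeros((P, m_out))
    for p in range(P):
        out[p] = np.einsum('km,kmn->n', Hp[p:p + k], K)
    return out

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tlstm_step(x, H, C, Wx, K):
    """One step: input x (M,), tensorized hidden H and cell C, each (P, M)."""
    Hx = H.copy()
    Hx[0] = Hx[0] + Wx @ x              # inject the input at one tensor slice
    gates = conv1d_same(Hx, K)          # (P, 4M): all gates from one conv
    i, f, o, g = np.split(gates, 4, axis=1)
    C = sigmoid(f) * C + sigmoid(i) * np.tanh(g)
    H = sigmoid(o) * np.tanh(C)
    return H, C

rng = np.random.default_rng(0)
P, M, k = 4, 8, 3
Wx = rng.normal(scale=0.1, size=(M, M))
K = rng.normal(scale=0.1, size=(k, M, 4 * M))
H = np.zeros((P, M)); C = np.zeros((P, M))
for x in rng.normal(size=(10, M)):      # a length-10 input sequence
    H, C = tlstm_step(x, H, C, Wx, K)
```

Widening the network then means growing P, which adds no new weight matrices: the same convolution kernel is shared across the tensor dimension.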


Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Neural Information Processing Systems

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limitations, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem -- streaming PCA. This allows us to make progress toward understanding Async-MSGD and gaining new insights into more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
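
The object of study is easy to simulate. The sketch below runs Oja-style momentum SGD for streaming PCA and models asynchrony by evaluating each stochastic gradient at a stale iterate; the names and constants are illustrative, not the paper's experimental protocol:

```python
import numpy as np

def streaming_pca_msgd(data, eta=0.05, momentum=0.9, delay=2, seed=0):
    """Oja-style streaming PCA with momentum and simulated staleness.

    Asynchrony is modeled by computing each stochastic gradient at a
    parameter iterate that is `delay` steps old.
    """
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    w = rng.normal(size=d); w /= np.linalg.norm(w)
    v = np.zeros(d)                          # momentum buffer
    history = [w.copy()]
    for x in data:
        w_stale = history[max(0, len(history) - 1 - delay)]
        g = (x @ w_stale) * x                # stochastic gradient of (w^T x)^2 / 2
        v = momentum * v + eta * g
        w = w + v
        w /= np.linalg.norm(w)               # retraction back to the sphere
        history.append(w.copy())
    return w

# synthetic stream with one dominant principal direction u
rng = np.random.default_rng(1)
u = np.array([1.0, 0, 0, 0])
X = rng.normal(size=(5000, 4)) + 3.0 * rng.normal(size=(5000, 1)) * u
w = streaming_pca_msgd(X)
print(abs(w @ u))   # should approach 1 when the method converges
```

Increasing `delay` while holding `momentum` fixed tends to destabilize the iterates, which mirrors the paper's finding that tolerating more asynchrony requires reducing the momentum parameter.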


Found Graph Data and Planted Vertex Covers. Jon Kleinberg, Cornell University

Neural Information Processing Systems

A typical way in which network data is recorded is to measure all interactions involving a specified set of core nodes, which produces a graph containing this core together with a potentially larger set of fringe nodes that link to the core. Interactions between nodes in the fringe, however, are not present in the resulting graph data. For example, a phone service provider may only record calls in which at least one of the participants is a customer; this can include calls between a customer and a non-customer, but not between pairs of non-customers. Knowledge of which nodes belong to the core is crucial for interpreting the dataset, but this metadata is unavailable in many cases, either because it has been lost due to difficulties in data provenance, or because the network consists of "found data" obtained in settings such as counter-surveillance. This leads to an algorithmic problem of recovering the core set. Since the core is a vertex cover, we essentially have a planted vertex cover problem, but with an arbitrary underlying graph. We develop a framework for analyzing this planted vertex cover problem, based on the theory of fixed-parameter tractability, together with algorithms for recovering the core. Our algorithms are fast, simple to implement, and outperform several baselines based on core-periphery structure on various real-world datasets.
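
The structural fact driving the problem is that the fringe is an independent set (no fringe-fringe edges are recorded), so the core is a vertex cover of the observed graph. The sketch below checks that property and shows a simple recovery baseline (not the paper's algorithm): grow a maximal independent set as a candidate fringe and return its complement as the candidate core:

```python
def is_vertex_cover(edges, core):
    """The defining property: every recorded edge touches the core."""
    return all(u in core or v in core for u, v in edges)

def candidate_core(nodes, edges):
    """Baseline recovery: grow a maximal independent set (candidate fringe);
    its complement is then a vertex cover (candidate core)."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fringe = set()
    # prefer low-degree nodes: fringe nodes tend to have fewer recorded edges
    for u in sorted(nodes, key=lambda n: len(adj[n])):
        if adj[u].isdisjoint(fringe):
            fringe.add(u)
    return set(nodes) - fringe

# toy example: core {a, b} observed along with fringe callers x, y, z
nodes = {'a', 'b', 'x', 'y', 'z'}
edges = [('a', 'x'), ('a', 'y'), ('b', 'y'), ('b', 'z'), ('a', 'b')]
core = candidate_core(nodes, edges)
assert is_vertex_cover(edges, core)   # recovers {'a', 'b'} here
```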


Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

Neural Information Processing Systems

The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets, explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and examine performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward.
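
As a concrete reference point, feedback alignment, one of the evaluated algorithm families, replaces the transpose of the forward weights in the backward pass with a fixed random matrix, removing the weight-transport requirement. A minimal two-layer sketch with illustrative sizes and a toy task:

```python
import numpy as np

def feedback_alignment_step(x, y, W1, W2, B, lr=0.1):
    """One update of a 2-layer net where the backward pass uses a fixed
    random matrix B in place of W2.T (feedback alignment)."""
    h = np.tanh(W1 @ x)
    y_hat = W2 @ h
    e = y_hat - y                      # output error
    dh = (B @ e) * (1 - h ** 2)        # B replaces W2.T: no weight transport
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
    return 0.5 * float(e @ e)

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(scale=0.3, size=(n_hid, n_in))
W2 = rng.normal(scale=0.3, size=(n_out, n_hid))
B = rng.normal(scale=0.3, size=(n_hid, n_out))   # fixed random feedback
T = rng.normal(size=(n_out, n_in))               # random linear target task
for step in range(2000):
    x = rng.normal(size=n_in)
    loss = feedback_alignment_step(x, T @ x, W1, W2, B)
print(loss)   # decreases even though B is random and never trained
```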


The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies. David Jacobs

Neural Information Processing Systems

We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term in the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
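
The headline prediction is easy to probe empirically. The sketch below trains the same shallow ReLU network (with bias) on a low- and a high-frequency sine target and compares the loss after a fixed budget; this is an illustrative 1-D experiment, not the paper's hypersphere setup:

```python
import numpy as np

def train_speed(freq, n_hidden=256, steps=3000, lr=0.05, seed=0):
    """Fit y = sin(freq * x) on [-pi, pi] with a shallow ReLU net (with bias)
    and return the loss curve."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-np.pi, np.pi, 128)[:, None]
    y = np.sin(freq * x[:, 0])
    W1 = rng.normal(size=(n_hidden, 1))
    b1 = rng.normal(size=n_hidden)
    w2 = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=n_hidden)
    losses = []
    for _ in range(steps):
        z = x @ W1.T + b1                   # (128, n_hidden) pre-activations
        h = np.maximum(z, 0.0)
        y_hat = h @ w2
        e = y_hat - y
        losses.append(float(np.mean(e ** 2)))
        # full-batch gradient step (constant factor absorbed into lr)
        dw2 = h.T @ e / len(x)
        dh = np.outer(e, w2) * (z > 0)
        dW1 = dh.T @ x / len(x)
        db1 = dh.mean(axis=0)
        w2 -= lr * dw2; W1 -= lr * dW1; b1 -= lr * db1
    return losses

slow = train_speed(freq=8)
fast = train_speed(freq=1)
print(fast[-1], slow[-1])   # the low-frequency target is fit far sooner
```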


MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand the range of problems where RL can be applied, and would improve its data efficiency and experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches.
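
Beyond this excerpt, MOReL's key construction is a pessimistic MDP: an ensemble of dynamics models is learned from the offline dataset, and transitions where the ensemble disagrees strongly are redirected to an absorbing, heavily penalized HALT state, keeping the learned policy inside the data's support. A minimal sketch of that construction, with all names and constants hypothetical:

```python
import numpy as np

class PessimisticMDP:
    """Sketch of a MOReL-style pessimistic MDP: an ensemble of learned
    dynamics models, with transitions into 'unknown' regions (high ensemble
    disagreement) redirected to an absorbing HALT state with penalty."""

    def __init__(self, models, reward_fn, threshold, penalty=-100.0):
        self.models = models          # list of (s, a) -> s' predictors
        self.reward_fn = reward_fn
        self.threshold = threshold    # disagreement cutoff for "unknown"
        self.penalty = penalty
        self.halted = False

    def step(self, s, a):
        if self.halted:
            return s, self.penalty    # absorbing HALT state
        preds = np.stack([m(s, a) for m in self.models])
        disagreement = np.max(np.linalg.norm(preds - preds.mean(0), axis=1))
        if disagreement > self.threshold:
            self.halted = True        # left the known region: halt + penalize
            return s, self.penalty
        s_next = preds[0]             # e.g., one ensemble member's prediction
        return s_next, self.reward_fn(s, a, s_next)

# toy usage with two hypothetical linear models "fit" offline
A1, A2 = np.eye(2) * 0.9, np.eye(2) * 0.95
models = [lambda s, a, A=A: A @ s + a for A in (A1, A2)]
mdp = PessimisticMDP(models, lambda s, a, sn: -float(sn @ sn), threshold=0.5)
s, r = mdp.step(np.array([1.0, 0.0]), np.zeros(2))
```

A planner or policy optimizer trained inside this wrapped model is discouraged from exploiting model error, since any excursion outside the dataset's support ends in the penalized HALT state.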

