
 McWilliams, Brian


Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks

arXiv.org Machine Learning

We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds---e.g. the characteristic silver lining and the "whiteness" of the inner body---challenging for methods based solely on Monte Carlo integration or diffusion theory. We approach the problem differently. Instead of simulating all light transport during rendering, we pre-learn the spatial and directional distribution of radiant flux from tens of cloud exemplars. To render a new scene, we sample visible points of the cloud and, for each, extract a hierarchical 3D descriptor of the cloud geometry with respect to the shading location and the light source. The descriptor is input to a deep neural network that predicts the radiance function for each shading configuration. We make the key observation that progressively feeding the hierarchical descriptor into the network enhances its ability to learn faster and predict with high accuracy while using few coefficients. We also employ a block design with residual connections to further improve performance. A GPU implementation of our method interactively synthesizes images of clouds that are nearly indistinguishable from the reference solution within seconds. Our method thus represents a viable solution for applications such as cloud design and, thanks to its temporal stability, also for high-quality production of animated content.
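As a rough illustration of the progressive, residual design described above, the following NumPy sketch shows a forward pass in which each residual block ingests one level of a hierarchical descriptor before producing a scalar radiance estimate. The layer widths, weight layout, and scalar readout are hypothetical and are not the paper's architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def progressive_residual_mlp(descriptor_levels, weights):
    """Toy forward pass: inject one descriptor level per residual block,
    return a scalar radiance estimate.

    descriptor_levels: list of 1-D arrays, coarse-to-fine levels of the descriptor.
    weights: one dict per block with keys 'W_in' (hidden x level_dim),
             'W1' and 'W2' (hidden x hidden).
    """
    hidden = weights[0]["W1"].shape[0]
    h = np.zeros(hidden)
    for level, w in zip(descriptor_levels, weights):
        h = h + w["W_in"] @ level                  # progressively feed the next level
        h = h + w["W2"] @ relu(w["W1"] @ relu(h))  # residual block with skip connection
    return float(h.sum())                          # placeholder scalar readout

# Hypothetical shapes, just to show the call.
rng = np.random.default_rng(0)
hidden, level_dims = 32, [8, 16, 32]
weights = [{"W_in": 0.1 * rng.normal(size=(hidden, d)),
            "W1": 0.1 * rng.normal(size=(hidden, hidden)),
            "W2": 0.1 * rng.normal(size=(hidden, hidden))} for d in level_dims]
levels = [rng.normal(size=d) for d in level_dims]
print(progressive_residual_mlp(levels, weights))
```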


Preserving Differential Privacy Between Features in Distributed Estimation

arXiv.org Machine Learning

Privacy is crucial in many applications of machine learning. Legal, ethical and societal issues restrict the sharing of sensitive data, making it difficult to learn from datasets that are partitioned between many parties. One important instance of such a distributed setting arises when information about each record in the dataset is held by different data owners (the design matrix is "vertically partitioned"). In this setting, few approaches exist for private data sharing for the purposes of statistical estimation, and the classical setup of differential privacy with a "trusted curator" preparing the data does not apply. We work with the notion of $(\epsilon,\delta)$-distributed differential privacy, which extends single-party differential privacy to the distributed, vertically-partitioned case. We propose PriDE, a scalable framework for distributed estimation in which each party communicates perturbed random projections of its locally held features, ensuring that $(\epsilon,\delta)$-distributed differential privacy is preserved. For $\ell_2$-penalized supervised learning problems, PriDE has bounded estimation error compared with the optimal estimates obtained without privacy constraints in the non-distributed setting. We confirm this empirically on real-world and synthetic datasets.
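A minimal sketch of the kind of message a single party might share is given below, assuming a plain Gaussian random projection and the textbook Gaussian-mechanism noise calibration with a row-norm clip as the sensitivity bound. The function name, the clipping step, and the noise calibration are illustrative assumptions, not PriDE's exact construction.

```python
import numpy as np

def perturbed_projection(X_local, k, epsilon, delta, clip=1.0, seed=0):
    """One party projects its locally held feature block X_local (n x p_local)
    down to k dimensions with a Gaussian random projection, then adds Gaussian
    noise before sharing the result with the other parties."""
    rng = np.random.default_rng(seed)
    n, p = X_local.shape
    # Clip row norms so a single record's influence is bounded by `clip`.
    norms = np.linalg.norm(X_local, axis=1, keepdims=True)
    Xc = X_local * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Gaussian random projection of the feature block.
    Pi = rng.normal(size=(p, k)) / np.sqrt(k)
    Z = Xc @ Pi
    # Gaussian-mechanism noise (illustrative calibration, not the paper's).
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return Z + rng.normal(scale=sigma, size=Z.shape)
```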


The Shattered Gradients Problem: If resnets are the answer, then what is the question?

arXiv.org Machine Learning

A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. The problem has largely been overcome through the introduction of carefully constructed initializations and batch normalization. Nevertheless, architectures incorporating skip-connections such as resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying only sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering. Preliminary experiments show that the new initialization allows very deep networks to be trained without the addition of skip-connections.
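The "looks linear" idea can be sketched as follows: with a concatenated-ReLU nonlinearity, initializing each layer as a mirrored block [W, -W] makes the whole network an exact linear map at initialization, so gradients cannot shatter; the nonlinearity only appears once training moves the mirrored halves apart. The weight scaling and layer sizes in this sketch are assumptions for illustration.

```python
import numpy as np

def crelu(z):
    """Concatenated ReLU: [relu(z), relu(-z)]."""
    return np.concatenate([np.maximum(z, 0.0), np.maximum(-z, 0.0)])

def ll_block(d_out, d_in, rng):
    """'Looks linear' weights: a mirrored block [W, -W] acting on CReLU
    features, so that [W, -W] @ crelu(z) == W @ z at initialization."""
    W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # assumed scaling
    return np.concatenate([W, -W], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
W1 = rng.normal(size=(32, 16)) / np.sqrt(16)   # ordinary first layer
W2, W3 = ll_block(32, 32, rng), ll_block(4, 32, rng)

z1 = W1 @ x
z2 = W2 @ crelu(z1)   # equals W2[:, :32] @ z1 exactly at initialization
z3 = W3 @ crelu(z2)
# At init the network computes a purely linear map of x, so the gradient
# signal passes through undistorted.
assert np.allclose(z2, W2[:, :32] @ z1)
```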


Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

arXiv.org Machine Learning

Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex. Standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets. The guarantee matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation -- a straightforward application of Taylor expansions to neural networks -- and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets: that gradients are shattered. We investigate the hypothesis that, by exploring the space of activation configurations more thoroughly, adaptive optimizers such as RMSProp and Adam are able to converge to better solutions.
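A minimal sketch of the neural Taylor approximation for a single rectifier layer with squared loss is shown below: the network output is linearized in the parameters around the current iterate, and the convex loss of that linearization is the Taylor loss. The tiny one-layer network and its fixed all-ones readout are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(W, x):
    """Tiny one-hidden-layer rectifier net with a fixed all-ones readout (assumed)."""
    return relu(W @ x).sum()

def taylor_net(W, W_t, x):
    """First-order Taylor approximation of net(., x) around the current
    parameters W_t.  For a rectifier layer the gradient is the active-unit
    mask times the input, so the linearization is exact on the activation
    pattern selected at W_t."""
    mask = (W_t @ x > 0).astype(float)   # active units at W_t
    grad = np.outer(mask, x)             # d net / d W at W_t
    return net(W_t, x) + np.sum(grad * (W - W_t))

def taylor_loss(W, W_t, x, y):
    """Convex loss of the Taylor approximation: convex in W even though the
    true loss of the rectifier net is not."""
    return 0.5 * (taylor_net(W, W_t, x) - y) ** 2

# The gap between the true loss and the Taylor loss along the optimizer's
# trajectory is what a regret-style analysis of this kind controls.
rng = np.random.default_rng(0)
x, y = rng.normal(size=8), 1.0
W_t = rng.normal(size=(4, 8))
print(taylor_loss(W_t + 0.01 * rng.normal(size=W_t.shape), W_t, x, y))
```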


Scalable Adaptive Stochastic Optimization Using Random Projections

Neural Information Processing Systems

Adaptive stochastic gradient methods such as ADAGRAD have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past gradients, which are used to tune the step size adaptively. In certain situations the full-matrix variant of ADAGRAD is expected to attain better performance; however, in high dimensions it is computationally impractical. We present ADA-LR and RADAGRAD, two computationally efficient approximations to full-matrix ADAGRAD based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve similar performance to full-matrix ADAGRAD, but at a much smaller computational cost. We show that the regret of ADA-LR is close to the regret of full-matrix ADAGRAD, which can have an up to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that ADA-LR and RADAGRAD perform similarly to full-matrix ADAGRAD. On the task of training convolutional neural networks as well as recurrent neural networks, RADAGRAD achieves faster convergence than diagonal ADAGRAD.
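The flavor of the randomized approximation can be sketched as follows: rather than forming and inverting the d x d matrix of accumulated gradient outer products, the d x t gradient history is sketched down to a few dimensions and its top singular directions are used as a preconditioner. The sketch type, the crude fallback term outside the captured subspace, and the hyperparameters below are simplifications for illustration, not the exact ADA-LR/RADAGRAD updates.

```python
import numpy as np

def low_rank_adagrad_step(w, grads, g_new, eta=0.1, k=20, delta=1e-3, seed=0):
    """Illustrative low-rank approximation to a full-matrix AdaGrad step.

    grads: list of past gradients, each of dimension d.  The d x t gradient
    history is sketched to at most k columns; the singular directions of the
    sketch precondition the new gradient within the captured subspace, with a
    simple scalar fallback for the remainder.
    """
    rng = np.random.default_rng(seed)
    grads = grads + [g_new]
    G = np.stack(grads, axis=1)                            # d x t gradient history
    t = G.shape[1]
    m = min(k, t)
    Pi = rng.normal(size=(t, m)) / np.sqrt(m)              # random sketch of the history
    U, s, _ = np.linalg.svd(G @ Pi, full_matrices=False)   # top directions of the sketch
    coeff = U.T @ g_new
    precond = U @ (coeff / (s + delta))                    # precondition inside the subspace
    precond += (g_new - U @ coeff) / np.sqrt(t + delta)    # crude fallback outside it
    return w - eta * precond, grads
```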


Scalable Adaptive Stochastic Optimization Using Random Projections

arXiv.org Machine Learning

Adaptive stochastic gradient methods such as AdaGrad have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past gradients, which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain better performance; however, in high dimensions it is computationally impractical. We present Ada-LR and RadaGrad, two computationally efficient approximations to full-matrix AdaGrad based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve similar performance to full-matrix AdaGrad, but at a much smaller computational cost. We show that the regret of Ada-LR is close to the regret of full-matrix AdaGrad, which can have an up to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task of training convolutional neural networks as well as recurrent neural networks, RadaGrad achieves faster convergence than diagonal AdaGrad.


Variance Reduced Stochastic Gradient Descent with Neighbors

arXiv.org Machine Learning

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per-data-point corrections in memory. Speed-ups relative to SGD may therefore require a minimum number of epochs to materialize. This paper investigates algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a side product, we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
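An illustrative sketch of the neighborhood idea for ridge regression: a SAGA-style update in which the freshly computed stochastic gradient is stored not only for the sampled point but also for its precomputed neighbors, trading a controlled approximation for faster early progress. The step size, the ridge objective, and the way the neighborhood list is built are assumptions for illustration, not the paper's exact algorithm or analysis.

```python
import numpy as np

def saga_with_neighbors(w0, X, y, neighbors, lr=0.01, epochs=5, lam=0.1, seed=0):
    """SAGA-style iteration for ridge regression where the stored gradient of a
    sampled point is also reused as the correction for its neighbors.

    neighbors: list of index lists, neighbors[i] = precomputed neighbors of point i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = w0.copy()
    memory = np.zeros((n, d))        # alpha_i: last stored gradient per data point
    avg = memory.mean(axis=0)        # running average of the stored gradients
    for _ in range(epochs * n):
        i = rng.integers(n)
        g_i = (X[i] @ w - y[i]) * X[i] + lam * w      # fresh stochastic gradient
        # SAGA step: variance-reduced gradient estimate using the stored correction.
        w -= lr * (g_i - memory[i] + avg)
        # Share the fresh gradient with i and its neighbors (the approximation).
        for j in [i] + list(neighbors[i]):
            avg += (g_i - memory[j]) / n
            memory[j] = g_i
    return w
```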


DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

arXiv.org Machine Learning

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication, in which low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.
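To make the single communication round concrete, here is a sketch of one worker's computation: it keeps its own raw feature block, receives only low-dimensional random projections of the other workers' blocks, and solves a local ridge problem on the concatenation. The paper's estimator works in the dual with structured projections; this plain-Gaussian, primal version is a simplification for illustration.

```python
import numpy as np

def random_projection(X_block, k, seed=0):
    """The low-dimensional summary each worker computes once and broadcasts
    (a plain Gaussian sketch here; structured projections are used in practice
    for speed)."""
    rng = np.random.default_rng(seed)
    p = X_block.shape[1]
    return X_block @ (rng.normal(size=(p, k)) / np.sqrt(k))

def local_ridge_worker(X_raw, X_other_proj, y, lam=1.0):
    """One worker's local solve: ridge regression on [own raw features,
    concatenated projections received from the other workers], keeping only
    the coefficients of its own raw features."""
    Xw = np.hstack([X_raw, X_other_proj])
    d = Xw.shape[1]
    beta = np.linalg.solve(Xw.T @ Xw + lam * np.eye(d), Xw.T @ y)
    return beta[: X_raw.shape[1]]
```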


Variance Reduced Stochastic Gradient Descent with Neighbors

Neural Information Processing Systems

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet it is also known to be slow relative to steepest descent. Recently, variance reduction techniques such as SVRG and SAGA have been proposed to overcome this weakness. With asymptotically vanishing variance, a constant step size can be maintained, resulting in geometric convergence rates. However, these methods are either based on occasional computations of full gradients at pivot points (SVRG), or on keeping per-data-point corrections in memory (SAGA). This has the disadvantage that one cannot employ these methods in a streaming setting and that speed-ups relative to SGD may need a certain number of epochs in order to materialize. This paper investigates a new class of algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points. While these algorithms are not meant to offer advantages in the asymptotic setting, they provide significant benefits in the transient optimization phase, in particular in a streaming or single-epoch setting. We investigate this family of algorithms in a thorough analysis and show supporting experimental results. As a side product, we provide a simple and unified proof technique for a broad class of variance reduction algorithms.


LOCO: Distributing Ridge Regression with Random Projections

arXiv.org Machine Learning

We propose LOCO, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections, which are cheap to compute and need to be communicated only once. We show that LOCO obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We verify this experimentally in a simulation study as well as in an application to climate prediction. Furthermore, we show that LOCO achieves significant speedups compared with a state-of-the-art distributed algorithm on a large-scale regression problem.
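For concreteness, the sketch below implements a generic subsampled randomized Hadamard transform, the kind of structured, cheap-to-compute random projection that keeps the communication small. The padding, scaling, and subsampling choices follow a textbook SRHT rather than LOCO's exact construction.

```python
import numpy as np

def srht(X, k, seed=0):
    """Subsampled randomized Hadamard transform of an n x p block X:
    random sign flips (D), a fast Walsh-Hadamard transform (H) applied to
    each row in O(n P log P) after zero-padding p to a power of two P,
    and uniform subsampling of k columns (S)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    P = 1 << (p - 1).bit_length()                        # next power of two
    Z = np.zeros((n, P))
    Z[:, :p] = X * rng.choice([-1.0, 1.0], size=p)       # random sign flips D
    h = 1
    while h < P:                                         # fast Walsh-Hadamard transform
        B = Z.reshape(n, -1, 2, h)
        Z = np.concatenate([B[:, :, 0] + B[:, :, 1],
                            B[:, :, 0] - B[:, :, 1]], axis=2).reshape(n, P)
        h *= 2
    cols = rng.choice(P, size=k, replace=False)          # uniform column subsampling S
    return Z[:, cols] * np.sqrt(P / k) / np.sqrt(P)

# Example: sketch a 5 x 10 block down to 4 columns.
print(srht(np.random.default_rng(1).normal(size=(5, 10)), k=4).shape)
```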