Overview
Embracing the age of artificial intelligence in the latest ISOfocus
Artificial intelligence (AI) is a game-changing technology that is affecting all our lives and shaping our future. In the latest ISOfocus issue, we debunk the AI myths, explore the opportunities and explain why globally relevant standards are key. Are killer robots about to take over the world? Mention artificial intelligence to the average person today and this is one of the many scary scenarios that spring to mind. Perhaps this is no surprise when you consider that AI is the technology that enables computers to think and act like human beings.
The frontier of simulation-based inference
Cranmer, Kyle, Brehmer, Johann, Louppe, Gilles
Many domains of science have developed complex simulations to describe phenomena of interest. While these simulations provide high-fidelity models, they are poorly suited for inference and lead to challenging inverse problems. We review the rapidly developing field of simulation-based inference and identify the forces giving new momentum to the field. Finally, we describe how the frontier is expanding so that a broad audience can appreciate the profound change these developments may have on science.
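Classic likelihood-free methods such as rejection approximate Bayesian computation (ABC) are the baseline that newer simulation-based inference techniques improve on. As a minimal sketch of that baseline (not one of the newer methods the review covers), the Python below infers the mean of a toy Gaussian simulator by keeping prior draws whose simulated summary statistic lands within a tolerance `eps` of the observed one. The simulator, the uniform prior, the summary statistic and the tolerance are all illustrative assumptions.

```python
import random
import statistics

def gaussian_simulator(mu, n, rng):
    """Toy forward model (assumption): n noisy observations centred on the unknown parameter mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def rejection_abc(observed, sample_prior, simulate, summary, eps, num_draws, rng):
    """Rejection ABC: keep prior draws whose simulated summary lies within eps of the observed one.

    Only the simulator is called; the likelihood is never evaluated.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(num_draws):
        theta = sample_prior(rng)
        if abs(summary(simulate(theta, rng)) - s_obs) < eps:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = gaussian_simulator(2.0, 50, rng)  # pretend these are real measurements
posterior = rejection_abc(
    observed,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),  # assumed flat prior on mu
    simulate=lambda theta, r: gaussian_simulator(theta, 50, r),
    summary=statistics.fmean,                     # assumed summary statistic: sample mean
    eps=0.1,
    num_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

Shrinking `eps` makes the accepted draws a better approximation of the posterior but accepts fewer of them, one of the inefficiencies that motivate the newer approaches.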
Neural networks for option pricing and hedging: a literature review
This work provides a review of this literature. The motivation for this summary arose from our companion paper Ruf and Wang [2019]. There we continue the discussion of this note, in particular of potentially problematic data leakage when training ANNs on historical financial data. This paper is organised in the following way. Section 2 features Table 1, a summary of the literature that concerns the use of ANNs for nonparametric pricing (and hedging) of options. Section 3 provides a list of recommended papers from Table 1. Section 4 provides an overview of related work where ANNs are applied in the context of option pricing and hedging, but not necessarily as nonparametric estimation tools. Section 5 briefly discusses various regularisation techniques used in the reviewed literature.
Recent Advances in Algorithmic High-Dimensional Robust Statistics
Diakonikolas, Ilias, Kane, Daniel M.
Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.
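The key observation behind many of these estimators is that outliers can only drag the empirical mean far from the truth by inflating the variance in some direction. The Python below is a simplified, heuristic sketch in that spirit, assuming inliers with identity covariance: while the top eigenvalue of the empirical covariance exceeds a threshold, it discards the points with the largest projections onto the top eigenvector. The threshold, drop fraction, power-iteration details and function names are illustrative choices, and this version carries none of the guarantees of the algorithms in the survey.

```python
import random

def mean_vector(points):
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]

def covariance(points, mu):
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov

def top_eigenpair(mat, iters=300, seed=0):
    """Power iteration: approximate largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v, lam = [x / norm for x in w], norm
    return lam, v

def filtered_mean(points, threshold=1.5, drop_frac=0.05, max_rounds=50):
    """Heuristic filter (assumes inliers have identity covariance)."""
    pts = list(points)
    for _ in range(max_rounds):
        mu = mean_vector(pts)
        lam, v = top_eigenpair(covariance(pts, mu))
        if lam <= threshold:
            break  # no direction has suspiciously large variance
        # Score: squared projection of each centred point onto the top direction.
        scores = [sum(vj * (pj - mj) for vj, pj, mj in zip(v, p, mu)) ** 2 for p in pts]
        keep = max(1, int(len(pts) * (1 - drop_frac)))
        order = sorted(range(len(pts)), key=lambda i: scores[i])
        pts = [pts[i] for i in order[:keep]]  # drop the highest-scoring points
    return mean_vector(pts)

def norm(m):
    return sum(x * x for x in m) ** 0.5

rng = random.Random(1)
d = 10
inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(450)]
outliers = [[3.0] * d for _ in range(50)]  # 10% adversarial points, all in one spot
data = inliers + outliers
naive, robust = mean_vector(data), filtered_mean(data)
print(norm(naive), norm(robust))  # distance of each estimate from the true mean (0)
```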
Convergence to minima for the continuous version of Backtracking Gradient Descent
The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function such that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ such that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ such that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if it converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ such that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Several other results are also proven.
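Property (i) is the Armijo sufficient-decrease condition used by discrete Backtracking Gradient Descent. The sketch below shows that standard discrete scheme, not the paper's smooth step-size function $h$: at each iterate the step size starts at `delta0` (playing the role of $\delta_0$) and is multiplied by `beta` until $f(x-h\nabla f(x))-f(x)\leq -\alpha h||\nabla f(x)||^2$ holds. The shrink factor `beta`, the iteration caps, the stopping tolerance and the quadratic test function are assumptions for illustration.

```python
def backtracking_gd(f, grad, x0, delta0=1.0, alpha=0.5, beta=0.5, max_iter=10_000, tol=1e-12):
    """Gradient descent with an Armijo backtracking line search (standard discrete version)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm2 = sum(gi * gi for gi in g)
        if g_norm2 < tol:
            break  # approximately a critical point
        fx, h = f(x), delta0
        for _ in range(60):  # cap the number of step-size reductions
            candidate = [xi - h * gi for xi, gi in zip(x, g)]
            # Armijo condition: f(x - h g) - f(x) <= -alpha * h * ||g||^2
            if f(candidate) - fx <= -alpha * h * g_norm2:
                break
            h *= beta
        else:
            break  # no acceptable step found within numerical limits; stop
        x = candidate
    return x

def f(p):
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2

def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]

print(backtracking_gd(f, grad_f, [5.0, 5.0]))  # approaches the minimiser (1, -2)
```

Each accepted step decreases $f$ by at least $\alpha h||\nabla f(x)||^2$, so the iterates can only stall where the gradient vanishes; the paper's contribution is to make $h$ a smooth function of $x$ so that results like (iii) and (iv) about avoiding saddle points can be proven.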
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
Bhatia, Siddharth, Hooi, Bryan, Yoon, Minji, Shin, Kijung, Faloutsos, Christos
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 108-505 times faster than state-of-the-art approaches; (c) it provides 46%-52% higher accuracy (in terms of AUC) than state-of-the-art approaches.
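MIDAS keeps two count-min sketches (CMS): one counts each edge $(u,v)$ within the current time tick ($a_{uv}$) and one counts it over all ticks so far ($s_{uv}$). Each arriving edge is scored with the chi-squared-style statistic $\left(a_{uv}-\frac{s_{uv}}{t}\right)^2\frac{t^2}{s_{uv}(t-1)}$. The Python below is a minimal sketch of that idea; the number of rows and buckets, the salted use of Python's built-in `hash`, and the class names are illustrative assumptions, and the paper's own hashing scheme and extensions are omitted.

```python
import random

class CountMinSketch:
    """Count-min sketch: `rows` salted hash rows of `buckets` counters each."""

    def __init__(self, rows, buckets, seed=0):
        rng = random.Random(seed)
        self.rows, self.buckets = rows, buckets
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.clear()

    def clear(self):
        self.table = [[0.0] * self.buckets for _ in range(self.rows)]

    def _cells(self, key):
        # One bucket per row, chosen by hashing the key together with that row's salt.
        return [(r, hash((self.salts[r], key)) % self.buckets) for r in range(self.rows)]

    def add(self, key, value=1.0):
        for r, b in self._cells(key):
            self.table[r][b] += value

    def query(self, key):
        # The minimum over rows limits the overestimate caused by hash collisions.
        return min(self.table[r][b] for r, b in self._cells(key))


class MidasDetector:
    """Scores each edge (u, v) arriving at integer time tick t (t must not decrease)."""

    def __init__(self, rows=2, buckets=1024, seed=0):
        self.current = CountMinSketch(rows, buckets, seed)  # a_uv: counts in the current tick
        self.total = CountMinSketch(rows, buckets, seed)    # s_uv: counts over all ticks
        self.tick = 1

    def score(self, u, v, t):
        if t > self.tick:
            self.current.clear()  # a new tick starts: reset the per-tick counts
            self.tick = t
        edge = (u, v)
        self.current.add(edge)
        self.total.add(edge)
        if t == 1:
            return 0.0  # no history yet (the formula divides by t - 1)
        a, s = self.current.query(edge), self.total.query(edge)
        return (a - s / t) ** 2 * t * t / (s * (t - 1))


detector = MidasDetector()
normal_scores = [detector.score(i, i + 1, t) for t in range(1, 11) for i in range(5)]
burst_scores = [detector.score(100, 200, 10) for _ in range(30)]  # sudden lockstep burst
print(max(normal_scores), burst_scores[-1])
```

Every edge touches a fixed number of sketch cells, so per-edge time and memory do not grow with the stream; an edge that recurs at its usual rate scores 0, while a sudden burst of identical edges scores high.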
Iteratively Training Look-Up Tables for Network Quantization
Cardinaux, Fabien, Uhlich, Stefan, Yoshiyama, Kazuki, Garcia, Javier Alonso, Mauch, Lukas, Tiedemann, Stephen, Kemp, Thomas, Nakamura, Akira
Abstract--Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word length of the network parameters or remove weights from the network if they are not needed. In this article we discuss a general framework for network reduction which we call Look-Up Table Quantization (LUT-Q). For each layer, we learn a value dictionary and an assignment matrix to represent the network weights. We propose a special solver which combines gradient descent and a one-step k-means update to learn both the value dictionaries and assignment matrices iteratively. This method is very flexible: by constraining the value dictionary, many different reduction problems such as nonuniform network quantization, training of multiplierless networks, network pruning or simultaneous quantization and pruning can be implemented without changing the solver. This flexibility of the LUT-Q method allows us to use the same method to train networks for different hardware capabilities. Deep neural networks (DNNs) are currently used in many machine learning and signal processing applications with great success, as their performance often beats the previous state-of-the-art approaches by a large margin; see, e.g., [2] for an overview of deep learning. DNN approaches have become standard practice in computer vision and automatic speech recognition, and to some extent in natural language processing. They are also being extensively investigated to support other domains like medicine, robotics and financial forecasting. Recently, there has been a lot of interest in the research community in reducing the memory/computational footprint of neural networks.
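The one-step k-means update can be illustrated on its own. The Python below is a minimal sketch, assuming scalar weights in a flat list and an index list standing in for the assignment matrix: it assigns each weight to its nearest dictionary value, moves each dictionary entry to the mean of the weights assigned to it, and can freeze selected entries (here a 0.0 entry) to mimic how constraining the dictionary turns quantization into simultaneous quantization and pruning. The gradient-descent half of the LUT-Q solver and the per-layer bookkeeping are omitted, and all function names are hypothetical.

```python
def assign(weights, dictionary):
    """Assignment step: index of the nearest dictionary value for every weight
    (a compact stand-in for a one-hot assignment matrix)."""
    return [min(range(len(dictionary)), key=lambda k: abs(w - dictionary[k])) for w in weights]

def kmeans_update(weights, assignment, dictionary, frozen=()):
    """One k-means step: move each non-frozen entry to the mean of its assigned weights."""
    new_dictionary = list(dictionary)
    for k in range(len(dictionary)):
        if k in frozen:
            continue  # constrained entry, e.g. a fixed 0.0 that implements pruning
        members = [w for w, a in zip(weights, assignment) if a == k]
        if members:
            new_dictionary[k] = sum(members) / len(members)
    return new_dictionary

def lut_quantize(weights, dictionary, frozen=(), steps=1):
    """Alternate assignment and k-means updates, then return the look-up-table weights."""
    for _ in range(steps):
        dictionary = kmeans_update(weights, assign(weights, dictionary), dictionary, frozen)
    assignment = assign(weights, dictionary)
    quantized = [dictionary[a] for a in assignment]  # weight = dictionary[assignment]
    return dictionary, assignment, quantized

weights = [-1.1, -0.9, 0.05, 0.1, 1.0, 1.2]
dictionary, assignment, quantized = lut_quantize(weights, [-0.5, 0.0, 0.5], frozen=(1,))
print(dictionary, assignment, quantized)
```

Only the dictionary values and the small integer assignments need to be stored, and freezing entry 1 at 0.0 forces the two smallest weights to become exact zeros even though their mean is not zero.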
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
Awan, Ammar Ahmad, Jain, Arpan, Anthony, Quentin, Subramoni, Hari, Panda, Dhabaleswar K.
The enormous amount of data and computation required to train DNNs has led to the rise of various parallelization strategies. Broadly, there are two strategies: 1) Data-Parallelism -- replicating the DNN on multiple processes and training on different training samples, and 2) Model-Parallelism -- dividing elements of the DNN itself into partitions across different processes. While data-parallelism has been extensively studied and developed, model-parallelism has received less attention as it is non-trivial to split the model across processes. In this paper, we propose HyPar-Flow: a framework for scalable and user-transparent parallel training of very large DNNs (up to 5,000 layers). We exploit TensorFlow's Eager Execution features and Keras APIs for model definition and distribution. HyPar-Flow exposes a simple API to offer data, model, and hybrid (model + data) parallel training for models defined using the Keras API. Under the hood, we introduce MPI communication primitives like send and recv on layer boundaries for data exchange between model-partitions and allreduce for gradient exchange across model-replicas. Our proposed designs in HyPar-Flow offer up to 3.1x speedup over sequential training for ResNet-110 and up to 1.6x speedup over Horovod-based data-parallel training for ResNet-1001, a model that has 1,001 layers and 30 million parameters. We provide an in-depth performance characterization of the HyPar-Flow framework on multiple HPC systems with diverse CPU architectures, including Intel Xeon and AMD EPYC. HyPar-Flow provides a 110x speedup on 128 nodes of the Stampede2 cluster at TACC for hybrid-parallel training of ResNet-1001.
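The data-parallel half of this design (allreduce for gradient exchange across model-replicas) can be illustrated without MPI. The Python below is a minimal in-process simulation, assuming a one-parameter least-squares model: each replica holds a full copy of the weight, computes a gradient on its own shard, and an allreduce, simulated here by averaging a Python list where a real deployment would use an MPI allreduce, hands every replica the same averaged gradient. Model partitioning with send/recv is not shown, and the function names are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 over one replica's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an MPI allreduce: every rank receives the average of all values."""
    average = sum(values) / len(values)
    return [average] * len(values)

def data_parallel_gd(data, num_replicas, lr, steps, w0=0.0):
    shards = [data[r::num_replicas] for r in range(num_replicas)]  # round-robin split
    weights = [w0] * num_replicas  # each replica keeps a full copy of the (tiny) model
    for _ in range(steps):
        grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
        averaged = allreduce_mean(grads)  # gradient exchange across replicas
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights

data = [(x, 3.0 * x) for x in (0.1 * i for i in range(1, 13))]  # y = 3x, 12 samples
parallel = data_parallel_gd(data, num_replicas=4, lr=0.5, steps=200)
serial = data_parallel_gd(data, num_replicas=1, lr=0.5, steps=200)
print(parallel, serial)
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so all replicas stay identical and follow the same trajectory as single-process training; the speedup comes from computing the shard gradients concurrently.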
OpenAI forms exclusive computing partnership with Microsoft to build new Azure AI supercomputing technologies
Through this partnership, the companies will accelerate breakthroughs in AI and power OpenAI's efforts to create artificial general intelligence (AGI). The resulting enhancements to the Azure platform will also help developers build the next generation of AI applications. The companies will focus on building a computational platform in Azure of unprecedented scale, which will train and run increasingly advanced AI models, include hardware technologies that build on Microsoft's supercomputing technology, and adhere to the two companies' shared principles on ethics and trust. This will create the foundation for advancements in AI to be implemented in a safe, secure and trustworthy way and is a critical reason the companies chose to partner together. Over the past decade, innovative applications of deep neural networks coupled with increasing computational power have led to continuous AI breakthroughs in areas such as vision, speech, language processing, translation, robotic control and even gaming.
Location Attention for Extrapolation to Longer Sequences
Dubois, Yann, Dagan, Gautier, Hupkes, Dieuwke, Bruni, Elia
Neural networks are surprisingly good at interpolating and perform remarkably well when the training set examples resemble those in the test set. However, they are often unable to extrapolate patterns beyond the seen data, even when the abstractions required for such patterns are simple. In this paper, we first review the notion of extrapolation, why it is important and how one could hope to tackle it. We then focus on a specific type of extrapolation which is especially useful for natural language processing: generalization to sequences that are longer than the training ones. We hypothesize that models with separate content- and location-based attention are more likely to extrapolate than those with common attention mechanisms. We empirically support our claim for recurrent seq2seq models with our proposed attention on variants of the Lookup Table task. This sheds light on some striking failures of neural models for sequences and on possible methods for approaching such issues.