Energy
Bayesian deep learning for mapping via auxiliary information: a new era for geostatistics?
Kirkwood, Charlie, Economou, Theo, Pugeault, Nicolas
For geospatial modelling and mapping tasks, variants of kriging - the spatial interpolation technique developed by South African mining engineer Danie Krige - have long been regarded as the established geostatistical methods. However, kriging and its variants (such as regression kriging, in which auxiliary variables or derivatives of these are included as covariates) are relatively restrictive models and lack capabilities that have been afforded to us in the last decade by deep neural networks. Principal among these is feature learning - the ability to learn filters to recognise task-specific patterns in gridded data such as images. Here we demonstrate the power of feature learning in a geostatistical context, by showing how deep neural networks can automatically learn the complex relationships between point-sampled target variables and gridded auxiliary variables (such as those provided by remote sensing), and in doing so produce detailed maps of chosen target variables. At the same time, in order to cater for the needs of decision makers who require well-calibrated probabilities, we obtain uncertainty estimates via a Bayesian approximation known as Monte Carlo dropout. In our example, we produce a national-scale probabilistic geochemical map from point-sampled assay data, with auxiliary information provided by a terrain elevation grid. Unlike traditional geostatistical approaches, auxiliary variable grids are fed into our deep neural network raw. There is no need to provide terrain derivatives (e.g. slope angles, roughness, etc) because the deep neural network is capable of learning these and arbitrarily more complex derivatives as necessary to maximise predictive performance. We hope our results will raise awareness of the suitability of Bayesian deep learning - and its feature learning capabilities - for large-scale geostatistical applications where uncertainty matters.
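The Monte Carlo dropout idea mentioned above can be sketched in a few lines: dropout is kept active at prediction time, the same input is passed through the network many times, and the spread of the outputs is read as an uncertainty estimate. The tiny one-hidden-layer network, its hand-picked weights, and the function names below are hypothetical and only illustrate the mechanism; they are not the architecture used in the paper.

```python
import random
import statistics

def stochastic_forward(x, weights, drop_rate, rng):
    """One forward pass of a tiny ReLU network with dropout left ON."""
    w_in, b_in, w_out, b_out = weights
    keep = 1.0 - drop_rate
    y = b_out
    for wi, bi, wo in zip(w_in, b_in, w_out):
        h = max(0.0, wi * x + bi)        # ReLU hidden unit
        if rng.random() < keep:          # unit survives with probability `keep`
            y += wo * h / keep           # inverted-dropout rescaling
    return y

def mc_dropout_predict(x, weights, drop_rate=0.2, n_samples=200, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, drop_rate, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical, untrained weights (assumption: chosen only for illustration).
H = 8
weights = (
    [1.0] * H,                                       # input -> hidden weights
    [0.1 * i for i in range(H)],                     # hidden biases
    [0.5 if i % 2 == 0 else -0.25 for i in range(H)],  # hidden -> output weights
    0.0,                                             # output bias
)

mean, std = mc_dropout_predict(0.5, weights)
print(f"predictive mean={mean:.3f}, predictive std={std:.3f}")
```

In a mapping setting the same procedure would be repeated at every grid cell, giving a predictive distribution per location rather than a single value.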
Neural Time-Dependent Partial Differential Equation
Hu, Yihao, Zhao, Tong, Xu, Zhiliang, Lin, Lizhen
Partial differential equations (PDEs) play a crucial role in studying a vast number of problems in science and engineering. Numerically solving nonlinear and/or high-dimensional PDEs is often a challenging task. Inspired by the traditional finite difference and finite element methods and emerging advancements in machine learning, we propose a sequence deep learning framework called Neural-PDE, which automatically learns the governing rules of any time-dependent PDE system from existing data by using a bidirectional LSTM encoder, and predicts the data for the next n time steps. One critical feature of our proposed framework is that the Neural-PDE is able to simultaneously learn and simulate the multiscale variables. We test the Neural-PDE on a range of examples from one-dimensional PDEs to a high-dimensional and nonlinear complex fluids model. The results show that the Neural-PDE is capable of learning the initial conditions, boundary conditions and differential operators without knowledge of the specific form of a PDE system. In our experiments the Neural-PDE can efficiently extract the dynamics within 20 epochs of training and produces accurate predictions. Furthermore, unlike traditional machine learning approaches to learning PDEs, such as CNNs and MLPs, which require a vast number of parameters for model precision, Neural-PDE shares parameters across all time steps, which considerably reduces the computational complexity and leads to a fast learning algorithm.
Unsupervised Change Detection in Satellite Images with Generative Adversarial Network
Ren, Caijun, Wang, Xiangyu, Gao, Jian, Chen, Huanhuan
Detecting changed regions in paired satellite images plays a key role in many remote sensing applications. Recent techniques can provide satellite images with very high spatial resolution (VHR), which makes it challenging to apply image coregistration, whose accuracy is the basis of many change detection methods. Owing to its advantage in deep feature representation, deep learning has been introduced to detect changes in unregistered images. However, the absence of ground truth makes the performance of deep learning models in unsupervised tasks hard to evaluate or guarantee. To alleviate the effect of unregistered pairs and make better use of deep learning structures, we propose a novel change detection procedure based on a special neural network architecture, the Generative Adversarial Network (GAN). A GAN generates realistic images rather than hypervectors that contain visual features, so it is easy to evaluate the GAN model by judging the generated images. In this paper, we show that a GAN model can be trained on a pair of images by using the proposed expanding strategy to create a training set and optimising designed objective functions. The optimised GAN model produces many coregistered images in which changes can be easily spotted, and the change map can then be presented through a comparison strategy that uses these generated images explicitly. Compared to other deep learning-based methods, our method is less sensitive to the problem of unregistered images and makes the most of the deep learning structure. Experimental results on synthetic images and real data with many different scenes demonstrate the effectiveness of the proposed approach.
Implicit Feedback Deep Collaborative Filtering Product Recommendation System
Bhaskar, Karthik Raja Kalaiselvi, Kundur, Deepa, Lawryshyn, Yuri
In this paper, several Collaborative Filtering (CF) approaches with latent variable methods were studied using user-item interactions to capture important hidden variations of the sparse customer purchasing behaviors. The latent factors are used to generalize the purchasing pattern of the customers and to provide product recommendations. CF with Neural Collaborative Filtering (NCF) was shown to produce the highest Normalized Discounted Cumulative Gain (NDCG) performance on the real-world proprietary dataset provided by a large parts supply company. Different hyperparameters were tested using Bayesian Optimization (BO) for applicability in the CF framework. External data sources like click-data and metrics like Clickthrough Rate (CTR) were reviewed for potential extensions to the work presented. The work shown in this paper provides techniques the Company can use to provide product recommendations to enhance revenues, attract new customers, and gain advantages over competitors. With today's ever-increasing ease of access to the internet and information, we have reached a point of information [...] more advertisements, attract new clients, and retain existing clients [6].
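For readers unfamiliar with the NDCG metric named above, the following is a minimal sketch of the standard definition: the discounted cumulative gain of a ranked list divided by that of the ideal ordering. The function names are hypothetical, and the ideal ranking is taken from the relevances in the supplied list, which assumes that list covers all candidate items; the paper's exact evaluation protocol is not described here.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each relevance is divided by log2(rank + 1)."""
    # enumerate() is 0-based, so the 1-based rank is i + 1 and the discount is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / idcg if idcg > 0 else 0.0

# Implicit feedback: 1 = the customer purchased the item, 0 = no interaction.
print(ndcg_at_k([1, 0, 0], k=3))  # purchased item ranked first
print(ndcg_at_k([0, 0, 1], k=3))  # purchased item ranked last
```

A ranking that places the purchased item first scores 1.0, and the score falls as the item moves down the list.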
Hyperparameter Optimization via Sequential Uniform Designs
Hyperparameter tuning or optimization plays a central role in the automated machine learning (AutoML) pipeline. It is a challenging task as the response surfaces of hyperparameters are generally unknown, and the evaluation of each experiment is expensive. In this paper, we reformulate hyperparameter optimization as a kind of computer experiment and propose a novel sequential uniform design (SeqUD) for hyperparameter optimization. It is advantageous as a) it adaptively explores the hyperparameter space with evenly spread design points, which is free of the expensive meta-modeling and acquisition optimization procedures in Bayesian optimization; b) sequential design points are generated in batch, which can be easily parallelized; and c) a real-time augmented uniform design (AugUD) algorithm is developed for the efficient generation of new design points. Experiments are conducted on both global optimization tasks and hyperparameter optimization applications. The results show that SeqUD outperforms related hyperparameter optimization methods and is a promising and competitive alternative to existing tools.
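The general shape of a sequential, space-filling search can be illustrated with a deliberately simplified sketch. It is not the authors' SeqUD or AugUD algorithm: as an assumption made for brevity, evenly spread points are taken as the cell midpoints of a regular grid instead of a true uniform design, previously evaluated points are not reused, and the function name and toy objective are hypothetical.

```python
from itertools import product

def sequential_search(objective, bounds, points_per_dim=5, n_stages=4):
    """Minimise `objective` by repeatedly zooming in on the best point found."""
    best_x, best_y = None, float("inf")
    for _ in range(n_stages):
        # Evenly spread points: cell midpoints of a regular grid over the current box.
        axes = [
            [lo + (hi - lo) * (i + 0.5) / points_per_dim for i in range(points_per_dim)]
            for lo, hi in bounds
        ]
        # Each stage is one batch; its evaluations are independent of each other.
        for x in product(*axes):
            y = objective(x)
            if y < best_y:
                best_x, best_y = x, y
        # Halve each side of the box and recentre it on the best point so far,
        # shifting the new box if needed so it stays inside the current one.
        new_bounds = []
        for (lo, hi), centre in zip(bounds, best_x):
            half = (hi - lo) / 4
            new_lo = min(max(centre - half, lo), hi - 2 * half)
            new_bounds.append((new_lo, new_lo + 2 * half))
        bounds = new_bounds
    return best_x, best_y

# Toy objective standing in for a validation loss (assumption).
def toy_loss(x):
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2

best_x, best_y = sequential_search(toy_loss, [(-1.0, 1.0), (-1.0, 1.0)])
print(best_x, best_y)
```

No surrogate model or acquisition function is fitted between stages, which is the property the abstract contrasts with Bayesian optimization.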
Finite Versus Infinite Neural Networks: an Empirical Study
Lee, Jaehoon, Schoenholz, Samuel S., Pennington, Jeffrey, Adlam, Ben, Xiao, Lechao, Novak, Roman, Sohl-Dickstein, Jascha
We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite-width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite-width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.
Britain's first robot ship prepares to set sail
"We are at the mercy of the weather," smiles Don Scott, standing on a slender steel hull. "If the sea is like glass it could take two weeks, but the north Atlantic weather is going to have a lot to say about that." Here on a hillside industrial park above Plymouth, Britain's boldest foray into robot shipping is taking shape. On Sept 16, the Mayflower Autonomous Ship, a 15-metre-long steel trimaran, will slip into Plymouth Sound to prepare for a pioneering voyage: the first unmanned transatlantic crossing powered by artificial intelligence and solar energy. Barring an accident, no human beings will be involved in the 3,220-mile trip, which will mark the 400th anniversary of the original Mayflower crossing in 1620.
Opening the Black Box with Explainable AI [Hands-on]
Artificial Intelligence is often said to be a "black box" -- an opaque, almost mystical thing that we don't really understand. Throw data into the black box, and out comes a prediction, or so they say. However, much of AI is not opaque; it's just a complex system that "reasons" differently than we (think we) do. For example, kids learn to write by first experimenting with letters, and finding patterns in words. GPT-3 learned to write by training a generative text algorithm on the entire Internet, yielding a model with 175 billion parameters that, essentially, predicts how "the Internet" would complete a prompt.
LACO: A Latency-Driven Network Slicing Orchestration in Beyond-5G Networks
Zanzi, Lanfranco, Sciancalepore, Vincenzo, Garcia-Saavedra, Andres, Schotten, Hans D., Costa-Perez, Xavier
Network Slicing is expected to become a game changer in the upcoming 5G networks and beyond, enlarging the telecom business ecosystem through still-unexplored vertical industry profits. This implies that heterogeneous service level agreements (SLAs) must be guaranteed per slice given the multitude of predefined requirements. In this paper, we pioneer a novel radio slicing orchestration solution that simultaneously provides latency and throughput guarantees in a multi-tenancy environment. Leveraging a solid mathematical framework, we exploit the exploration-vs-exploitation paradigm by means of a multi-armed-bandit-based (MAB) orchestrator, LACO, that makes adaptive resource slicing decisions with no prior knowledge of the traffic demand or channel quality statistics. As opposed to traditional MAB methods that are blind to the underlying system, LACO relies on system structure information to expedite decisions. After a preliminary simulation campaign empirically proving the validity of our solution, we provide a robust implementation of LACO using off-the-shelf equipment to fully emulate realistic network conditions: near-optimal results within affordable computational time are measured when LACO is in place. L. Zanzi, V. Sciancalepore, A. Garcia-Saavedra and X. Costa-Pérez are with NEC Laboratories Europe GmbH, 69115 Heidelberg, Germany. The quest for new sources of revenue to revitalize the mobile industry has spawned unprecedented hype around the fifth generation of mobile networks (5G) and, in particular, the network slicing concept. A high-level view of the system considered in this paper is shown in Figure 1. The figure represents a series of sliceable base stations as a pool of radio resources (coloured cubes in the figure).
The resource allocation process is considered hierarchical: while bundles of radio resources are assigned to different tenants (namely radio slices), each tenant autonomously schedules its bundle of radio resources to each individual user following classic radio scheduling policies. The difference between such operations is subtle but of paramount importance: a slice controller operates at a larger timescale and thus over a coarser granularity [2], [3]. While most prior work on network slicing focuses on average bit-rate guarantees [3], [4], latency considerations have received little attention.
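The exploration-vs-exploitation paradigm referred to in the abstract can be illustrated with a generic UCB1 bandit. This is a sketch of the kind of structure-blind MAB method the abstract contrasts LACO with, not LACO itself; the three arms, their Bernoulli reward probabilities, and the reading of an arm as a candidate slicing configuration are assumptions made for illustration.

```python
import math
import random

def ucb1(reward_fns, n_rounds, seed=0):
    """Run UCB1 and return per-arm pull counts and empirical mean rewards."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # Exploit the empirical mean, plus an exploration bonus that
            # shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means

# Hypothetical arms: each could stand for one candidate resource split, with
# reward 1.0 when the slice SLAs are met in a decision interval (assumption).
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts, means = ucb1(arms, n_rounds=5000)
print(counts, [round(m, 2) for m in means])
```

The learner needs no prior knowledge of the reward statistics, but it must sample every arm repeatedly, which is the cost that exploiting system structure is meant to reduce.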
Improving Language Generation with Sentence Coherence Objective
Sun, Ruixiao, Yang, Jie, Yousefzadeh, Mehrdad
Conditional story generation and contextual text continuation have become increasingly popular topics in the NLP community. Existing models are often prone to output paragraphs of text that gradually diverge from the given prompt. Although the generated text may have reasonable perplexity and diversity, it could easily be identified by humans as gibberish. The goal of our project is to improve the coherence and consistency across sentences in a language-generation model. We aim to solve this issue by first training a sentence-pair coherence classifier with a pretrained GPT-2 model, and then co-training the GPT-2 language model with this new coherence objective using a method analogous to the REINFORCE algorithm. This fine-tuned language model is able to generate lengthy paragraphs conditioned on a given topic without diverging too much. The simplicity of this model allows it to be applied to a variety of underlying language model architectures, since it only modifies the final layer of the pre-trained model.