The Design and Implementation of a Scalable DL Benchmarking Platform

arXiv.org Machine Learning

The current Deep Learning (DL) landscape is fast-paced and rife with non-uniform models and hardware/software (HW/SW) stacks, but lacks a DL benchmarking platform to facilitate evaluation and comparison of DL innovations, be they models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone, stifling the adoption of the innovations. In this work, we first identify 10 design features which are desirable within a DL benchmarking platform. These features include: performing the evaluation in a consistent, reproducible, and scalable manner, being framework and hardware agnostic, supporting real-world benchmarking workloads, providing in-depth model execution inspection across the HW/SW stack levels, etc. We then propose MLModelScope, a DL benchmarking platform design that realizes the 10 objectives. MLModelScope proposes a specification to define DL model evaluations and techniques to provision the evaluation workflow using the user-specified HW/SW stack. MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios. We implement MLModelScope as an open-source project with support for all major frameworks and hardware architectures. Through MLModelScope's evaluation and automated analysis workflows, we performed case-study analyses of 37 models across 4 systems and show how model, hardware, and framework selection affects model accuracy and performance under different benchmarking scenarios. We further demonstrated how MLModelScope's tracing capability gives a holistic view of model execution and helps pinpoint bottlenecks.


Modelling pressure-Hessian from local velocity gradients information in an incompressible turbulent flow field using deep neural networks

arXiv.org Machine Learning

Understanding the dynamics of the velocity gradients in turbulent flows is critical to understanding various non-linear turbulent processes. The pressure-Hessian and the viscous-Laplacian govern the evolution of the velocity gradients and are known to be non-local in nature. Over the years, several simplified dynamical models have been proposed that model the viscous-Laplacian and the pressure-Hessian primarily in terms of local velocity gradient information. These models can also serve as closure models for the Lagrangian PDF methods. The recent fluid deformation closure model (RFDM) has been shown to retrieve excellent one-time statistics of the viscous process. However, the pressure-Hessian modelled by the RFDM has various physical limitations. In this work, we first demonstrate the limitations of the RFDM in estimating the pressure-Hessian. Further, we employ a tensor basis neural network (TBNN) to model the pressure-Hessian from the velocity gradient tensor itself. The neural network is trained on high-resolution data obtained from direct numerical simulation (DNS) of isotropic turbulence at a Reynolds number of 433 (JHU turbulence database, JHTD). The predictions made by the TBNN are tested against two different isotropic turbulence datasets at Reynolds numbers of 433 (JHTD) and 315 (UP Madrid turbulence database, UPMTD) and a channel flow dataset at a Reynolds number of 1000 (UT Texas and JHTD). The evaluation of the neural network output is made in terms of the alignment statistics of the predicted pressure-Hessian eigenvectors with the strain-rate eigenvectors for turbulent isotropic flow as well as channel flow. Our analysis of the predicted solution leads to the discovery of ten unique coefficients of the tensor basis of strain-rate and rotation-rate tensors, the linear combination over which accurately captures key alignment statistics of the pressure-Hessian tensor.


Benchmarking time series classification -- Functional data vs machine learning approaches

arXiv.org Machine Learning

Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common example being classification or regression problems with functional covariates. We study methods from functional data analysis, such as functional generalized additive models, as well as functionality to concatenate (functional-) feature extraction or basis representations with traditional machine learning algorithms like support vector machines or classification trees. In order to assess the methods and implementations, we run a benchmark on a wide variety of representative (time series) data sets, with in-depth analysis of empirical results, and strive to provide a reference ranking for which method(s) to use for non-expert practitioners. Additionally, we provide a software framework in R for functional data analysis for supervised learning, including machine learning and more linear approaches from statistics. This allows convenient access, and in connection with the machine-learning toolbox mlr, those methods can now also be tuned and benchmarked.


Basic Principles of Clustering Methods

arXiv.org Machine Learning

As an example, consider clustering pixels in an image (or video) if they belong to the same object. Different clustering methods are obtained by using different notions of similarity and different representations of data points.


Towards Making Deep Transfer Learning Never Hurt

arXiv.org Machine Learning

Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization and as a source of regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networks hurts the training procedure and may lead to even lower accuracy. In this paper, we consider deep transfer learning as minimizing a linear combination of the empirical loss and a regularizer based on pre-trained weights, where the regularizer can restrict the training procedure from lowering the empirical loss when the two terms have conflicting descent directions (e.g., derivatives). Following this view, we propose a novel strategy for making regularization-based Deep Transfer learning Never Hurt (DTNH) that, for each iteration of the training procedure, computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects from the pre-trained weights. Extensive experiments have been done using common transfer learning regularizers, such as L2-SP and knowledge distillation, on top of a wide range of deep transfer learning benchmarks including Caltech, MIT indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent direction estimation strategy DTNH can always improve the performance of deep transfer learning tasks based on all of the above regularizers, even when transferring pre-trained weights from inappropriate networks. All in all, the DTNH strategy improves on state-of-the-art regularizers with 0.1%--7% higher accuracy in all experiments.
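
The abstract says that the two derivatives are computed separately and then combined into a direction that does not hurt the empirical loss, but it does not say how that direction is re-estimated. The sketch below shows one plausible rule under a stated assumption: when the regularizer gradient opposes the loss gradient, its conflicting component is projected out. The function names and the projection rule are illustrative and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def combine_gradients(loss_grad, reg_grad):
    """Merge the loss and regularizer gradients into one update direction.

    Assumption (not from the paper): if the regularizer gradient points
    against the loss gradient, remove its component along the loss gradient
    so the combined direction never opposes the empirical loss.
    """
    overlap = dot(reg_grad, loss_grad)
    if overlap < 0:
        # Negative overlap implies loss_grad is non-zero, so this is safe.
        scale = overlap / dot(loss_grad, loss_grad)
        reg_grad = [r - scale * g for r, g in zip(reg_grad, loss_grad)]
    return [g + r for g, r in zip(loss_grad, reg_grad)]


# Conflicting case: the regularizer pulls backwards along the first axis.
direction = combine_gradients([1.0, 0.0], [-2.0, 1.0])
print(direction)
```

With this rule the combined direction always has a non-negative inner product with the loss gradient, which is one way to read "does not hurt the empirical loss minimization".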


Steady-State Control and Machine Learning of Large-Scale Deformable Mirror Models

arXiv.org Machine Learning

We use Machine Learning (ML) and system identification validation approaches to estimate neural network models of large-scale Deformable Mirrors (DMs) used in Adaptive Optics (AO) systems. To obtain the training, validation, and test data sets, we simulate a realistic large-scale Finite Element (FE) model of a faceplate DM. The estimated models reproduce the input-output behavior of Vector AutoRegressive with eXogenous (VARX) input models and can be used for the design of high-performance AO systems. We address the model order selection and overfitting problems. We also provide an FE-based approach for computing steady-state control signals that produce the desired wavefront shape. This approach can be used to predict the steady-state DM correction performance for different actuator spacings and configurations. The presented methods are tested on models with thousands of state variables and hundreds of actuators. The numerical simulations are performed on low-cost high-performance graphics processing units and implemented using the TensorFlow machine learning framework. The code used is available online. The approaches presented in this paper are useful for the design and optimization of high-performance DMs and AO systems.


Implicit Generative Modeling for Efficient Exploration

arXiv.org Machine Learning

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. A commonly used approach for exploring such environments is to introduce some "intrinsic" reward. In this work, we focus on model uncertainty estimation as an intrinsic reward for efficient exploration. In particular, we introduce an implicit generative modeling approach to estimate a Bayesian uncertainty of the agent's belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamic function, hence multiple draws would approximate the posterior, and the variance in the future prediction based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on the amortized Stein Variational Gradient Descent. In experiments, we compare our implementation with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamic models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches regarding data efficiency in exploration.
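
The idea of using prediction variance across sampled dynamics models as an intrinsic reward can be illustrated with a minimal sketch. It assumes each draw from the generative model is available as a callable mapping (state, action) to a predicted next-state vector. The function name `intrinsic_reward` and the averaging over state dimensions are illustrative choices, not details from the paper.

```python
from statistics import pvariance


def intrinsic_reward(models, state, action):
    """Mean per-dimension variance of next-state predictions across models.

    `models` is a list of hypothetical callables, each standing in for one
    dynamics network drawn from the generative model.
    """
    predictions = [model(state, action) for model in models]
    per_dimension = list(zip(*predictions))  # group predictions by dimension
    return sum(pvariance(values) for values in per_dimension) / len(per_dimension)


# Two toy "draws" that disagree on the first state dimension only.
models = [
    lambda s, a: [s[0] + a, s[1]],
    lambda s, a: [s[0] + a + 2.0, s[1]],
]
print(intrinsic_reward(models, [0.0, 0.0], 0.0))
```

Where the sampled models agree, the reward is zero, so the agent is pushed toward transitions on which the posterior over dynamics is still uncertain.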


An explanation method for Siamese neural networks

arXiv.org Machine Learning

A new method for explaining the Siamese neural network is proposed. It uses the following main ideas. First, the explained feature vector is compared with the prototype of the corresponding class computed at the embedding level (the Siamese neural network output). The important features at this level are determined as features which are close to the same features of the prototype. Second, an autoencoder is trained in a special way in order to take into account the embedding level of the Siamese network, and its decoder part is used for reconstructing input data with the corresponding changes. Numerical experiments with the well-known dataset MNIST illustrate the proposed method.
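
The first idea, comparing an embedding with its class prototype, can be sketched as follows. The abstract does not define the prototype or the closeness criterion, so this sketch assumes the prototype is the mean embedding of the class and that the important features are the k components with the smallest absolute gap. Both choices and all names are hypothetical.

```python
def class_prototype(embeddings):
    """Assumed prototype: the component-wise mean of a class's embeddings."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def important_features(embedding, prototype, k=2):
    """Indices of the k embedding components closest to the prototype."""
    gaps = [abs(e - p) for e, p in zip(embedding, prototype)]
    return sorted(range(len(gaps)), key=lambda i: gaps[i])[:k]


prototype = class_prototype([[1.0, 0.0, 4.0], [3.0, 2.0, 4.0]])
print(prototype)
print(important_features([2.1, 5.0, 3.5], prototype))
```

The decoder-based reconstruction step described in the abstract is not covered by this sketch.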


Improving Universal Sound Separation Using Sound Classification

arXiv.org Machine Learning

Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class. In this paper, we utilize the semantic information learned by sound classifier networks trained on a vast amount of diverse sounds to improve universal sound separation. In particular, we show that semantic embeddings extracted from a sound classifier can be used to condition a separation network, providing it with useful additional information. This approach is especially useful in an iterative setup, where source estimates from an initial separation stage and their corresponding classifier-derived embeddings are fed to a second separation network. By performing a thorough hyperparameter search consisting of over a thousand experiments, we find that classifier embeddings from clean sources provide nearly one dB of SNR gain, and our best iterative models achieve a significant fraction of this oracle performance, establishing a new state-of-the-art for universal sound separation.


General Matrix-Matrix Multiplication Using SIMD features of the PIII

arXiv.org Machine Learning

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the floating point Intel Pentium SIMD (Single Instruction Multiple Data) architecture. A description of the issues and our solution is presented, paying attention to all levels of the memory hierarchy. Our results demonstrate an average performance of 2.09 times faster than the leading public domain matrix-matrix multiply routines.
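
Attention to the memory hierarchy in matrix multiplication is commonly realized by blocking (tiling), so that small sub-matrices are reused while they are still in cache. The sketch below shows only that loop structure in plain Python. It does not use SIMD instructions, the block size is an arbitrary assumption, and it is not the authors' routine.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of a and c
        for p0 in range(0, k, block):      # tile the shared dimension
            for j0 in range(0, m, block):  # tile columns of b and c
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        # Innermost loop walks contiguous rows of b and c,
                        # the access pattern that vector units favor.
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


print(blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

In a compiled implementation the innermost loop would be the natural place for SIMD instructions, with the block size tuned to the cache level being targeted.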