Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted $\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization Analysis

arXiv.org Artificial Intelligence

Deep unfolding methods, for example the learned iterative shrinkage thresholding algorithm (LISTA), design deep neural networks as learned variations of optimization methods. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper develops a novel deep recurrent neural network (coined reweighted-RNN) by unfolding a reweighted $\ell_1$-$\ell_1$ minimization algorithm and applies it to the task of sequential signal reconstruction. To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization. Due to the underlying reweighted minimization model, our RNN has a different soft-thresholding function (that is, a different activation function) for each hidden unit in each layer. Furthermore, it has higher network expressivity than existing deep unfolding RNN models because of its over-parameterized weights. Importantly, we establish theoretical generalization error bounds for the proposed reweighted-RNN model by means of Rademacher complexity. The bounds reveal that the parameterization of the proposed reweighted-RNN ensures good generalization. We apply the proposed reweighted-RNN to the problem of video frame reconstruction from low-dimensional measurements, that is, sequential frame reconstruction. The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.
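To make the idea of a per-unit soft-thresholding activation concrete, here is a minimal sketch of one generic ISTA-style unfolded layer in plain Python. It is not the reweighted $\ell_1$-$\ell_1$ architecture of the paper: the names `soft_threshold` and `unfolded_layer`, the matrix `A`, the step size, and the threshold values are all illustrative assumptions.

```python
def soft_threshold(values, thresholds):
    """Shrink each entry toward zero by its own threshold (one threshold per unit)."""
    shrunk = []
    for v, t in zip(values, thresholds):
        if v > t:
            shrunk.append(v - t)
        elif v < -t:
            shrunk.append(v + t)
        else:
            shrunk.append(0.0)  # entries inside [-t, t] are set to zero
    return shrunk


def unfolded_layer(h, x, A, step, thresholds):
    """One ISTA-style layer: a gradient step on 0.5 * ||x - A h||^2, then shrinkage.

    In a learned (unfolded) network, A, step and thresholds would be trainable;
    here they are fixed, hypothetical values.
    """
    rows, cols = len(A), len(A[0])
    residual = [x[i] - sum(A[i][j] * h[j] for j in range(cols)) for i in range(rows)]
    pre_activation = [
        h[j] + step * sum(A[i][j] * residual[i] for i in range(rows))
        for j in range(cols)
    ]
    return soft_threshold(pre_activation, thresholds)


# Toy example: identity measurement matrix, two hidden units.
A = [[1.0, 0.0], [0.0, 1.0]]
h1 = unfolded_layer([0.0, 0.0], [2.0, -0.25], A, step=1.0, thresholds=[0.5, 0.5])
print(h1)
```

Giving every hidden unit its own entry in `thresholds` is what lets each unit have a different activation function; stacking several such layers gives the unfolded network.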


Towards Cognitive Routing based on Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Routing is one of the key functions for the stable operation of network infrastructure. Nowadays, the rapid growth of network traffic volume and changing service requirements call for more intelligent routing methods than before. Towards this end, we propose a definition of cognitive routing and an implementation approach based on Deep Reinforcement Learning (DRL). To facilitate research on DRL-based cognitive routing, we introduce a simulator named RL4Net for DRL-based routing algorithm development and simulation. Then, we design and implement a DDPG-based routing algorithm. The simulation results on an example network topology show that the DDPG-based routing algorithm achieves better performance than OSPF and random weight algorithms. This demonstrates the preliminary feasibility and potential advantage of cognitive routing for future networks.


Towards Detection of Sheep Onboard a UAV

arXiv.org Machine Learning

In this work we consider the task of detecting sheep onboard an unmanned aerial vehicle (UAV) flying at an altitude of 80 m. At this height, the sheep are relatively small, only about 15 pixels across. Although deep learning strategies have gained enormous popularity in the last decade and are now extensively used for object detection in many fields, state-of-the-art detectors perform poorly in the case of smaller objects. We develop a novel dataset of UAV imagery of sheep and consider a variety of object detectors to determine which is the most suitable for our task in terms of both accuracy and speed. Our findings indicate that a UNet detector using the weighted Hausdorff distance as a loss function during training is an excellent option for detection of sheep onboard a UAV.


Data Science in Economics

arXiv.org Machine Learning

School of the Built Environment, Oxford Brookes University, Oxford, OX3 0BP, UK.

Abstract: This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods, advances in data science are investigated. The data science advances are investigated in three individual classes: deep learning models, ensemble models, and hybrid models. Application domains include the stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal that the trends are toward the advancement of hybrid models, as more than 51% of the reviewed articles applied hybrid models. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. Meanwhile, the trends are expected to move toward the advancement of deep learning models.

Abbreviations: LSDL, Large-Scale Deep Learning; LSTM, Long Short-Term Memory; LWDNN, List-Wise Deep Neural Network; MACN, Multi-Agent Collaborated Network; MB-LSTM, Multivariate Bidirectional LSTM; MDNN, Multilayer Deep Neural Network; MFNN, Multi-Filters Neural Network; MLP, Multiple Layer Perceptron; MLP, Multi-Layer Perceptron; NNRE, Neural Network Regression Ensemble; O-LSRM, Optimal Long Short-Term Memory; PCA, Principal Component Analysis; pSVM, Proportion Support Vector Machines; RBFNN, Radial Basis Function Neural Network; RBM, Restricted Boltzmann Machine; REP, Reduced Error Pruning; RF, Random Forest; RFR, Random Forest Regression; RNN, Recurrent Neural Network; SAE, Stacked Autoencoders; SLR, Stepwise Linear Regressions; SN-CFM, Similarity, Neighborhood-Based Collaborative Filtering Model; STI, Stock Technical Indicators; SVM, Support Vector Machine; SVR, Support Vector Regression; SVRE, Support Vector Regression Ensemble; TDFA, Time-Driven Feature-Aware; TS-GRU, Two-Stream GRU; WA, Wavelet Analysis; WT, Wavelet Transforms.

1. Introduction

The application of data science in different disciplines is increasing exponentially, because data science has made tremendous progress in the analysis and use of data. Like other disciplines, economics has benefited from the advancements of data science. Advancements of data science in economics have been progressive and have recorded promising results in the literature.


A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG

arXiv.org Machine Learning

The recent advances in the field of deep learning have not been fully utilised for decoding imagined speech, primarily because of the unavailability of sufficient training samples to train a deep network. In this paper, we present a novel architecture that employs a deep neural network (DNN) for classifying the words "in" and "cooperate" from the corresponding EEG signals in the ASU imagined speech dataset. Nine EEG channels, which best capture the underlying cortical activity, are chosen using common spatial pattern (CSP) and are treated as independent data vectors. Discrete wavelet transform (DWT) is used for feature extraction. To the best of our knowledge, a DNN has not so far been employed as a classifier in decoding imagined speech. Treating the selected EEG channels corresponding to each imagined word as independent data vectors helps in providing a sufficient number of samples to train a DNN. For each test trial, the final class label is obtained by applying majority voting to the classification results of the individual channels considered in the trial. We have achieved accuracies comparable to the state-of-the-art results. The results can be further improved by using a higher-density EEG acquisition system in conjunction with other deep learning techniques such as long short-term memory.
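The per-trial majority vote over channel-level predictions can be sketched as follows. This is a minimal illustration only: the function name `majority_vote` and the example predictions are hypothetical, and the channel classifier itself is not shown.

```python
from collections import Counter


def majority_vote(channel_predictions):
    """Return the label predicted by the largest number of channels in one trial."""
    counts = Counter(channel_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical outputs of a classifier applied to nine channels of one trial.
trial = ["in", "cooperate", "in", "in", "cooperate", "in", "cooperate", "in", "in"]
print(majority_vote(trial))
```

With nine channels and two classes a tie cannot occur, so no tie-breaking rule is needed in this setting.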


Improving Adversarial Robustness Through Progressive Hardening

arXiv.org Machine Learning

Adversarial training (AT) has become a popular choice for training robust networks. However, by virtue of its formulation, AT tends to sacrifice clean accuracy heavily in favor of robustness. Furthermore, AT with a large perturbation budget can cause models to get stuck at poor local minima and behave like a constant function, always predicting the same class. To address the above concerns, we propose Adversarial Training with Early Stopping (ATES). The design of ATES is guided by principles from curriculum learning that emphasize starting "easy" and gradually ramping up the "difficulty" of training. We do so by early stopping the adversarial example generation step in AT, progressively increasing the difficulty of the samples the network trains on. This stabilizes network training even for large perturbation budgets and allows the network to operate at a better clean accuracy versus robustness trade-off curve compared to AT. Functionally, this leads to a significant improvement in both clean accuracy and robustness for ATES models.
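One way to picture early stopping of the adversarial example generation step is the toy sketch below. It is an assumption-laden illustration, not the ATES procedure itself: it uses a linear scorer, an $\ell_\infty$ budget `eps`, and a hypothetical stopping rule (`stop_margin`) that halts the attack once the classification margin is small enough. Lowering `stop_margin` over the course of training would be one possible way to ramp up difficulty.

```python
def sign(value):
    """Return -1.0, 0.0 or 1.0 according to the sign of value."""
    return (value > 0) - (value < 0) + 0.0


def margin(x, y, w, b):
    """Signed margin y * (w . x + b) of a linear scorer; y is +1 or -1."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def early_stopped_attack(x, y, w, b, eps, step, max_steps, stop_margin):
    """Sign-gradient attack that stops as soon as the margin drops to stop_margin.

    stop_margin is a hypothetical difficulty knob: a high value yields "easy"
    examples, a very low value recovers the full attack.
    """
    x_adv = list(x)
    for _ in range(max_steps):
        if margin(x_adv, y, w, b) <= stop_margin:
            break  # early stop: the example is already hard enough
        # The gradient of the margin w.r.t. x is y * w; move against it.
        x_adv = [xa - step * y * sign(wi) for xa, wi in zip(x_adv, w)]
        # Project back into the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


x, y, w, b = [1.0, 1.0], 1, [1.0, 1.0], 0.0
easy = early_stopped_attack(x, y, w, b, eps=1.0, step=0.25, max_steps=10, stop_margin=1.0)
hard = early_stopped_attack(x, y, w, b, eps=1.0, step=0.25, max_steps=10, stop_margin=-10.0)
print(margin(easy, y, w, b), margin(hard, y, w, b))
```

The early-stopped example keeps a larger margin than the fully attacked one, which is the sense in which it is an "easier" training sample.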


NeCPD: An Online Tensor Decomposition with Optimal Stochastic Gradient Descent

arXiv.org Machine Learning

Multi-way data analysis has become an essential tool for capturing underlying structures in higher-order datasets stored in a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$. CANDECOMP/PARAFAC (CP) decomposition has been extensively studied and applied to approximate $\mathcal{X}$ by $N$ loading matrices $A^{(1)}, \dots, A^{(N)}$, where $N$ represents the order of the tensor. We propose a new efficient CP decomposition solver named NeCPD for non-convex problems in multi-way online data, based on the stochastic gradient descent (SGD) algorithm. SGD is very useful in the online setting since it allows us to update $\mathcal{X}^{(t+1)}$ in one single step. In terms of global convergence, it is well known that SGD gets stuck at many saddle points when it deals with non-convex problems. We study the Hessian matrix to identify these saddle points, and then try to escape them using the perturbation approach, which adds a little noise to the gradient update step. We further apply Nesterov's Accelerated Gradient (NAG) method in the SGD algorithm to optimally accelerate the convergence rate and compensate for the Hessian computational delay time per epoch. Experimental evaluation in the field of structural health monitoring using laboratory-based and real-life structural datasets shows that our method provides more accurate results compared with existing online tensor analysis methods.
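The effect of adding a little noise to the gradient update step can be seen on a toy saddle. The sketch below is not the NeCPD solver and does not use the CP objective, the Hessian test, or NAG: it applies gradient steps to the hypothetical function $f(x, y) = x^2 - y^2$, whose only stationary point is a saddle at the origin, with and without an assumed Gaussian perturbation.

```python
import random


def gradient(x, y):
    """Gradient of the toy saddle function f(x, y) = x**2 - y**2."""
    return 2.0 * x, -2.0 * y


def descend(start, steps, lr, noise_scale, rng):
    """Gradient descent with optional Gaussian noise added to each gradient."""
    x, y = start
    for _ in range(steps):
        gx, gy = gradient(x, y)
        # Perturbation step: noise_scale = 0.0 gives plain gradient descent.
        gx += noise_scale * rng.gauss(0.0, 1.0)
        gy += noise_scale * rng.gauss(0.0, 1.0)
        x -= lr * gx
        y -= lr * gy
    return x, y


rng = random.Random(0)  # fixed seed so the run is repeatable
stuck = descend((0.0, 0.0), steps=50, lr=0.1, noise_scale=0.0, rng=rng)
escaped = descend((0.0, 0.0), steps=50, lr=0.1, noise_scale=0.01, rng=rng)
print(stuck, escaped)
```

Started exactly at the saddle, the unperturbed run never moves because the gradient is zero there, whereas the perturbed run is pushed off the saddle and then drifts away along the descending `y` direction.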


Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

arXiv.org Machine Learning

Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck in the rapid development of scalable and specialized clustering methods. To solve this problem, we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large datasets in any domain that utilizes pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Our runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.


Predicting Performance of Asynchronous Differentially-Private Learning

arXiv.org Machine Learning

We consider training machine learning models using training data located on multiple private and geographically scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication, without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records and privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budget is higher.


Self-Supervised Contextual Bandits in Computer Vision

arXiv.org Machine Learning

Contextual bandits are a common problem faced by machine learning practitioners in domains ranging from hypothesis testing to product recommendations. Many approaches have exploited rich data representations for contextual bandit problems, with varying degrees of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g., clustering, classification, embedding generation, etc.) and the secondary task is defined by the self-supervision objective (e.g., rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels due to the lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self-supervision objective. By augmenting contextual bandit learning with self-supervision, we get a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn't perform optimally and give alternative methods for better learning in these cases.