Quantifying the effect of representations on task complexity

arXiv.org Machine Learning

We examine the influence of input data representations on learning complexity. For learning, we posit that each model implicitly uses a candidate model distribution for unexplained variations in the data, its noise model. If the model distribution is not well aligned to the true distribution, then even relevant variations will be treated as noise. Crucially however, the alignment of model and true distribution can be changed, albeit implicitly, by changing data representations. "Better" representations can better align the model to the true distribution, making it easier to approximate the input-output relationship in the data without discarding useful data variations. To quantify this alignment effect of data representations on the difficulty of a learning task, we make use of an existing task complexity score and show its connection to the representation-dependent information coding length of the input. Empirically we extract the necessary statistics from a linear regression approximation and show that these are sufficient to predict relative learning performance outcomes of different data representations and neural network types obtained when utilizing an extensive neural network architecture search. We conclude that to ensure better learning outcomes, representations may need to be tailored to both task and model to align with the implicit distribution of model and task.


Neural Networks-based Regularization of Large-Scale Inverse Problems in Medical Imaging

arXiv.org Machine Learning

In this paper we present a generalized Deep Learning-based approach to solve ill-posed large-scale inverse problems occurring in medical imaging. Recently, Deep Learning methods using iterative neural networks and cascaded neural networks have been reported to achieve excellent image quality for the task of image reconstruction in different imaging modalities. However, the fact that these approaches employ the forward and adjoint operators repeatedly in the network architecture requires the network to process the whole images or volumes at once, which for some applications is computationally infeasible. In this work, we follow a different reconstruction strategy by decoupling the regularization of the solution from ensuring consistency with the measured data. The regularization is given in the form of an image prior obtained by the output of a previously trained neural network which is used in a Tikhonov regularization framework. By doing so, more complex and sophisticated network architectures can be used for the removal of the artefacts or noise than is usually the case in iterative networks. Due to the large scale of the considered problems and the resulting computational complexity of the employed networks, the priors are obtained by processing the images or volumes as patches or slices. We evaluated the method for the cases of 3D cone-beam low dose CT and undersampled 2D radial cine MRI and compared it to a total variation-minimization-based reconstruction algorithm as well as to a method with regularization based on learned overcomplete dictionaries. The proposed method outperformed all the reported methods with respect to all chosen quantitative measures and further accelerates the regularization step in the reconstruction by several orders of magnitude.

M. Haltmeier is with the Department of Mathematics, University of Innsbruck, Innsbruck, Austria (email: markus.haltmeier@uibk.ac.at). T. Schaeffter is with the Physikalisch-Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany, King's College London, London, UK and the Department of Medical Engineering, Technical University of Berlin, Berlin, Germany (email: tobias.schaeffter@ptb.de). M. Dewey is with the Department of Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany and the Berlin Institute of Health, Berlin, Germany (email: marc.dewey@charite.de). C. Kolbitsch is with the Physikalisch-Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany and King's College London, London, UK (email: christoph.kolbitsch@ptb.de).

In inverse problems, the goal is to recover an object of interest from a set of indirect and possibly incomplete observations. The reconstruction from the measured data can be an ill-posed inverse problem for different reasons.
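
As a rough illustration of the decoupled strategy described in this abstract, the Tikhonov step can be read as minimizing ||A x - y||^2 + lam * ||x - x_prior||^2, where x_prior stands for the output of the previously trained network. The sketch below assumes a toy diagonal forward operator (a 0/1 sampling mask), for which the minimizer has an elementwise closed form. The function name, the mask, and the value of lam are illustrative assumptions, not the CT or MRI operators or the settings used in the paper.

```python
def tikhonov_with_prior(mask, y, x_prior, lam):
    """Minimize sum_i (mask[i]*x[i] - y[i])**2 + lam*(x[i] - x_prior[i])**2.

    Toy setting (assumption): the forward operator is diagonal, so each
    x[i] has the closed form (mask*y + lam*prior) / (mask**2 + lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [
        (a * yi + lam * pi) / (a * a + lam)
        for a, yi, pi in zip(mask, y, x_prior)
    ]


# Entries 0 and 2 are measured; entry 1 is not.
mask = [1.0, 0.0, 1.0]
y = [2.0, 0.0, 4.0]        # measured data (zero where nothing was sampled)
x_prior = [1.0, 3.0, 4.0]  # hypothetical stand-in for a network-generated prior
x_hat = tikhonov_with_prior(mask, y, x_prior, lam=1.0)
print(x_hat)
```

In this toy case the unmeasured entry falls back to the prior value, while measured entries are a weighted average of data and prior, with lam controlling the balance.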


Robust Multi-Output Learning with Highly Incomplete Data via Restricted Boltzmann Machines

arXiv.org Machine Learning

In a standard multi-output classification scenario, both features and labels of training data are partially observed. This challenging issue is widely witnessed due to sensor or database failures, crowd-sourcing and noisy communication channels in industrial data analytic services. Classic methods for handling multi-output classification with incomplete supervision information usually decompose the problem into an imputation stage that reconstructs the missing training information, and a learning stage that builds a classifier based on the imputed training set. These methods fail to fully leverage the dependencies between features and labels. In order to take full advantage of these dependencies we consider a purely probabilistic setting in which the feature imputation and multi-label classification problems are jointly solved. Indeed, we show that a simple Restricted Boltzmann Machine can be trained with an adapted algorithm based on mean-field equations to efficiently solve problems of inductive and transductive learning in which both features and labels are missing at random. The effectiveness of the approach is demonstrated empirically on various datasets, with particular focus on a real-world Internet-of-Things security dataset.
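
To make the idea of mean-field imputation concrete, here is a minimal sketch of a naive mean-field fixed-point iteration for a binary Restricted Boltzmann Machine in which observed visible units are clamped and missing ones are filled in. It assumes that features and labels are both represented as visible units. The weights, biases, and function names are hypothetical, and the adapted algorithm in the paper may differ from this plain iteration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_field_impute(v, observed, W, a, b, n_iter=50):
    """Fill in missing visible units of a binary RBM by naive mean-field.

    v        : visible values (entries with observed[i] == False are ignored)
    observed : bools, True where v[i] is known and stays clamped
    W        : weights, W[i][j] couples visible unit i to hidden unit j
    a, b     : visible and hidden biases
    """
    m_v = [vi if obs else 0.5 for vi, obs in zip(v, observed)]
    n_hidden = len(b)
    for _ in range(n_iter):
        # Hidden activations given the current visible estimates.
        m_h = [
            sigmoid(b[j] + sum(W[i][j] * m_v[i] for i in range(len(m_v))))
            for j in range(n_hidden)
        ]
        # Update only the missing visible units; observed ones stay clamped.
        m_v = [
            m_v[i] if observed[i]
            else sigmoid(a[i] + sum(W[i][j] * m_h[j] for j in range(n_hidden)))
            for i in range(len(m_v))
        ]
    return m_v


# Hypothetical model: one hidden unit that positively couples both visibles.
W = [[4.0], [4.0]]
a = [-2.0, -2.0]
b = [-3.0]
filled_on = mean_field_impute([1.0, 0.0], [True, False], W, a, b)
filled_off = mean_field_impute([0.0, 0.0], [True, False], W, a, b)
print(filled_on, filled_off)
```

With these made-up parameters, the missing unit is pulled above 0.5 when the observed unit is on and below 0.5 when it is off, which is the kind of feature-label dependency the abstract refers to.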


Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

arXiv.org Machine Learning

Multi-horizon forecasting problems often contain a complex mix of inputs -- including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed historically -- without any prior information on how they interact with the target. While several deep learning models have been proposed for multi-step prediction, they typically comprise black-box models which do not account for the full range of inputs present in common scenarios. In this paper, we introduce the Temporal Fusion Transformer (TFT) -- a novel attention-based architecture which combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. To learn temporal relationships at different scales, the TFT utilizes recurrent layers for local processing and interpretable self-attention layers for learning long-term dependencies. The TFT also uses specialized components for the judicious selection of relevant features and a series of gating layers to suppress unnecessary components, enabling high performance in a wide range of regimes. On a variety of real-world datasets, we demonstrate significant performance improvements over existing benchmarks, and showcase three practical interpretability use-cases of TFT.


FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

arXiv.org Machine Learning

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at the cost of significantly reduced accuracy. In this paper, we present a novel approach to quantizing convolutional neural networks (CNNs). The resulting networks perform all computations in low precision, without requiring higher-precision batch normalization (BN) and nonlinearities, while still being highly accurate. To achieve this result, we employ a novel quantization technique that learns to optimally quantize the weights and activations of the network during training. Additionally, to enhance training convergence we use a new training technique, called gradual quantization. We leverage the nonlinear and normalizing behavior of our quantization function to effectively remove the higher-precision nonlinearities and BN from the network. The resulting convolutional layers are fully quantized to low precision, from input to output, ideal for neural network accelerators on the edge. We demonstrate the potential of this approach on different datasets and networks, showing that ternary-weight CNNs with low-precision inputs and outputs perform virtually on par with their full-precision equivalents. Finally, we analyze the influence of noise on the weights, activations and convolution outputs (multiply-accumulate, MAC) and propose a strategy to improve network performance under noisy conditions.
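
For readers unfamiliar with ternary weights, the following sketch shows one simple way to map full-precision weights to the three values -1, 0, and +1 with a shared scale. It uses a fixed threshold, which is a common simplification and an assumption here; the paper learns its quantizer during training, so this is not the authors' method. The function name and the example values are hypothetical.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 and return a shared scale.

    Weights with |w| <= threshold become 0; the rest keep only their sign.
    The scale is the mean magnitude of the surviving weights, so that
    scale * code roughly approximates the original weight.
    """
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


codes, scale = ternarize([0.9, -0.05, -0.7, 0.2], threshold=0.3)
print(codes, scale)
```

Storing only the codes and one scale per layer is what makes ternary weights attractive for low-precision hardware; how the threshold and scale are chosen is where methods differ.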


Normalizing flows for deep anomaly detection

arXiv.org Machine Learning

In this work, we consider cases in which certain kinds of anomalies are missing from the training dataset, while significant statistics for the normal class are available. For such scenarios, conventional supervised methods might suffer from class imbalance, while unsupervised methods tend to ignore difficult anomalous examples. We extend the idea of the supervised classification approach for class-imbalanced datasets by exploiting normalizing flows for proper Bayesian inference of the posterior probabilities.

Index Terms: Machine Learning, Neural Nets, Anomaly Detection, Imbalanced Data Set, Generate Potential Outliers, Normalizing Flow

1 Introduction

The anomaly detection problem is one of the important tasks in the analysis of real-world data. Possible applications range from data-quality certification [1] to finding rare specific cases of diseases in medicine [2].


Practical applicability of deep neural networks for overlapping speaker separation

arXiv.org Machine Learning

This paper examines the applicability in realistic scenarios of two deep learning based solutions to the overlapping speaker separation problem. Firstly, we present experiments that show that these methods are applicable for a broad range of languages. Further experimentation indicates limited performance loss for untrained languages, when these have common features with the trained language(s). Secondly, we investigate how the methods deal with realistic background noise and propose some modifications to better cope with these disturbances. The deep learning methods that will be examined are deep clustering and deep attractor networks.


CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization

arXiv.org Machine Learning

In recent years there have been many deep learning approaches towards the multi-speaker source separation problem. Most use Long Short-Term Memory - Recurrent Neural Networks (LSTM-RNN) or Convolutional Neural Networks (CNN) to model the sequential behavior of speech. In this paper we propose a novel network for source separation using an encoder-decoder CNN and LSTM in parallel. Hyper parameters have to be chosen for both parts of the network and they are potentially mutually dependent. Since hyper parameter grid search has a high computational burden, random search is often preferred. However, when sampling a new point in the hyper parameter space, it can potentially be very close to a previously evaluated point and thus give little additional information. Furthermore, random sampling is as likely to sample in a promising area as in an area of the hyper parameter space dominated by poorly performing models. Therefore, we use a Bayesian hyper parameter optimization technique and find that the parallel CNN-LSTM outperforms the LSTM-only and CNN-only models.
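
The remark that a randomly sampled point can land very close to a previously evaluated one can be checked with a small experiment. The sketch below assumes two hyper parameters normalized to the unit square and compares the smallest gap between 16 grid points with the smallest gap between 16 uniformly random points. The seed, the number of points, and the helper name are illustrative choices, and no Bayesian optimization is implemented here.

```python
import math
import random


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
n_side = 4

# Regular 4 x 4 grid over the unit square.
grid = [(i / (n_side - 1), j / (n_side - 1))
        for i in range(n_side) for j in range(n_side)]
# The same number of points drawn uniformly at random.
rand = [(rng.random(), rng.random()) for _ in range(n_side * n_side)]

grid_gap = min_pairwise_distance(grid)
rand_gap = min_pairwise_distance(rand)
print(grid_gap, rand_gap)
```

The random sample's closest pair is nearer together than any two grid points, so at least one random evaluation adds little new information about the search space. This is the inefficiency that motivates model-based approaches such as the Bayesian optimization used in the paper.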


Semi-Supervised Deep Learning Using Improved Unsupervised Discriminant Projection

arXiv.org Machine Learning

Deep learning demands a huge amount of well-labeled data to train the network parameters. How to use the least amount of labeled data to obtain the desired classification accuracy is of great practical significance, because for many real-world applications (such as medical diagnosis), it is difficult to obtain so many labeled samples. In this paper, we modify the unsupervised discriminant projection algorithm from dimension reduction and apply it as a regularization term to propose a new semi-supervised deep learning algorithm, which is able to utilize both the local and nonlocal distribution of abundant unlabeled samples to improve classification performance. Experiments show that given dozens of labeled samples, the proposed algorithm can train a deep network to attain satisfactory classification results.


Meta Decision Trees for Explainable Recommendation Systems

arXiv.org Machine Learning

We tackle the problem of building explainable recommendation systems that are based on a per-user decision tree, with decision rules that are based on single attribute values. We build the trees by applying learned regression functions to obtain the decision rules as well as the values at the leaf nodes. The regression functions receive as input the embedding of the user's training set, as well as the embedding of the samples that arrive at the current node. The embedding and the regressors are learned end-to-end with a loss that encourages the decision rules to be sparse. By applying our method, we obtain a collaborative filtering solution that provides a direct explanation for every rating it provides. With regard to accuracy, it is competitive with other algorithms. However, as expected, explainability comes at a cost and the accuracy is typically slightly lower than the state-of-the-art results reported in the literature.