 filter length




Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation

arXiv.org Machine Learning

Abstract: The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter λ. Although the adaptive ISTA is a successful practical algorithm, few theoretical results exist. In this paper, we present a theoretical analysis of the adaptive ISTA with the thresholding strategy of estimating the noise level by the median absolute deviation. We show properties of the fixed points of the algorithm, including scale equivariance, non-uniqueness, and local stability, prove a local linear convergence guarantee, and show its global convergence behavior. Many sparse approximation problems in machine learning and signal processing can be obtained as the solution to the LASSO problem, which can be solved by ISTA. Despite its popularity, tuning [...] The obtained LASSO solution is optimal in the mean-squared-error (MSE) sense with minimum assumptions, but LARS is not competitive in terms of computation time for large-scale problems [7].
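The idea of setting the soft threshold from a median-absolute-deviation (MAD) noise estimate can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: applying the MAD to the pre-threshold iterate, the constant 0.6745 (the usual Gaussian consistency factor), and the names `adaptive_ista`, `step`, and `scale` are all assumptions made for the example.

```python
import statistics

def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t (entries within t of zero become 0)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def mad_sigma(v):
    """Robust noise-level estimate: median(|v - median(v)|) / 0.6745."""
    med = statistics.median(v)
    return statistics.median([abs(x - med) for x in v]) / 0.6745

def matvec(A, x):
    """Compute A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T @ r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def adaptive_ista(A, y, step, n_iter=100, scale=1.0):
    """ISTA whose threshold is re-estimated each iteration from the MAD.

    Assumption: the threshold is scale * mad_sigma(z), where z is the
    gradient-step iterate before thresholding.
    """
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [yi - pi for yi, pi in zip(y, matvec(A, x))]
        z = [xi + step * gi for xi, gi in zip(x, rmatvec(A, residual))]
        x = soft_threshold(z, scale * mad_sigma(z))
    return x

# Toy usage: identity measurement matrix, one large coefficient plus small noise.
A = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
y = [0.1, -0.1, 0.2, -0.2, 10.0]
x_hat = adaptive_ista(A, y, step=1.0)
```

No regularization parameter λ is passed in; the only tuning knob in this sketch is the hypothetical `scale` multiplier on the noise estimate.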


Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification

arXiv.org Machine Learning

In this paper we propose a novel problem called the ForeClassing problem where the loss of a classification decision is only observed at a future time point after the classification decision has to be made. To solve this problem, we propose an approximately Bayesian deep neural network architecture called ForeClassNet for time series forecasting and classification. This network architecture forces the network to consider possible future realizations of the time series, by forecasting future time points and their likelihood of occurring, before making its final classification decision. To facilitate this, we introduce two novel neural network layers, Welford mean-variance layers and Boltzmann convolutional layers. Welford mean-variance layers allow networks to iteratively update their estimates of the mean and variance for the forecasted time points for each inputted time series to the network through successive forward passes, which the model can then consider in combination with a learned representation of the observed realizations of the time series for its classification decision. Boltzmann convolutional layers are linear combinations of approximately Bayesian convolutional layers with different filter lengths, allowing the model to learn multitemporal resolution representations of the input time series, and which resolutions to focus on within a given Boltzmann convolutional layer through a Boltzmann distribution. Through several simulation scenarios and two real world applications we demonstrate ForeClassNet achieves superior performance compared with current state of the art methods including a near 30% improvement in test set accuracy in our financial example compared to the second best performing model.
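The two building blocks named in this abstract rest on standard tools that can be sketched in a few lines: Welford's online update for a running mean and variance, and Boltzmann (softmax-style) weights for mixing several branches. The sketch below is only an illustration of those tools, not the ForeClassNet layers themselves; the class name `Welford`, the function `boltzmann_weights`, and the `energies` and `temperature` arguments are assumptions introduced here.

```python
import math

class Welford:
    """Online mean and sample variance, updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self):
        """Unbiased sample variance (0.0 until two observations are seen)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

def boltzmann_weights(energies, temperature=1.0):
    """Weights p_i proportional to exp(-E_i / T); lower energy gets more weight."""
    m = min(energies)  # shift for numerical stability; does not change the result
    unnorm = [math.exp(-(e - m) / temperature) for e in energies]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Usage: accumulate statistics over successive forecasts of one time point.
stats = Welford()
for forecast in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(forecast)

# Usage: mix three hypothetical branches (e.g. different filter lengths).
weights = boltzmann_weights([0.5, 1.0, 2.0])
```

In a layer along the lines the abstract describes, the energies would presumably be learned parameters and the weights would multiply the outputs of convolutions with different filter lengths; that wiring is not shown here.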


LightTS: Lightweight Time Series Classification with Adaptive Ensemble Distillation -- Extended Version

arXiv.org Artificial Intelligence

Due to the sweeping digitalization of processes, increasingly vast amounts of time series data are being produced. Accurate classification of such time series facilitates decision making in multiple domains. State-of-the-art classification accuracy is often achieved by ensemble learning, where results are synthesized from multiple base models. This characteristic implies that ensemble learning needs substantial computing resources, preventing its use in resource-limited environments such as edge devices. To extend the applicability of ensemble learning, we propose the LightTS framework that compresses large ensembles into lightweight models while ensuring competitive accuracy. First, we propose adaptive ensemble distillation that assigns adaptive weights to different base models such that their varying classification capabilities contribute purposefully to the training of the lightweight model. Second, we propose means of identifying Pareto optimal settings w.r.t. model accuracy and model size, thus enabling users with a space budget to select the most accurate lightweight model. We report on experiments using 128 real-world time series sets and different types of base models that justify key decisions in the design of LightTS and provide evidence that LightTS is able to outperform competitors.
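Selecting among Pareto optimal settings under a space budget can be illustrated with a small sketch. It is not the LightTS procedure: the candidate tuples, their sizes and accuracies, and the function names `pareto_front` and `best_under_budget` are hypothetical values and helpers made up for the example.

```python
def pareto_front(candidates):
    """Return the candidates not dominated in (smaller size, higher accuracy).

    Each candidate is a (name, size, accuracy) tuple. A candidate is dropped
    if another one is no larger and at least as accurate.
    """
    front = []
    best_acc = float("-inf")
    # Sort by size ascending; within equal size, most accurate first.
    for cand in sorted(candidates, key=lambda c: (c[1], -c[2])):
        if cand[2] > best_acc:  # strictly better than every smaller model
            front.append(cand)
            best_acc = cand[2]
    return front

def best_under_budget(candidates, budget):
    """Most accurate Pareto-optimal candidate whose size fits the budget."""
    feasible = [c for c in pareto_front(candidates) if c[1] <= budget]
    return max(feasible, key=lambda c: c[2]) if feasible else None

# Hypothetical settings: (name, model size, validation accuracy).
settings = [
    ("a", 10, 0.80),
    ("b", 20, 0.85),
    ("c", 20, 0.82),  # dominated by "b" (same size, lower accuracy)
    ("d", 40, 0.84),  # dominated by "b" (larger and less accurate)
    ("e", 5, 0.70),
]
front = pareto_front(settings)
choice = best_under_budget(settings, budget=15)
```

Because accuracy increases strictly along the sorted front, the choice for any budget is simply the largest front member that fits.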


Convolutional Proximal Neural Networks and Plug-and-Play Algorithms

arXiv.org Artificial Intelligence

In this paper, we introduce convolutional proximal neural networks (cPNNs), which are by construction averaged operators. For filters of full length, we propose a stochastic gradient descent algorithm on a submanifold of the Stiefel manifold to train cPNNs. In case of filters with limited length, we design algorithms for minimizing functionals that approximate the orthogonality constraints imposed on the operators by penalizing the least squares distance to the identity operator. Then, we investigate how scaled cPNNs with a prescribed Lipschitz constant can be used for denoising signals and images, where the achieved quality depends on the Lipschitz constant. Finally, we apply cPNN based denoisers within a Plug-and-Play (PnP) framework and provide convergence results for the corresponding PnP forward-backward splitting algorithm based on an oracle construction.
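A penalty of the kind mentioned for limited-length filters can be sketched as the squared Frobenius distance between TᵀT and the identity, where T is the matrix of the convolution. This is an assumed form chosen for illustration; the abstract does not give the exact functional, and the circular boundary handling and the names `circulant` and `orthogonality_penalty` are hypothetical.

```python
def circulant(taps, n):
    """n x n matrix of circular convolution with a short filter `taps`."""
    return [
        [taps[(i - j) % n] if (i - j) % n < len(taps) else 0.0 for j in range(n)]
        for i in range(n)
    ]

def orthogonality_penalty(T):
    """Squared Frobenius norm of T^T T - I; zero when T has orthonormal columns."""
    rows, cols = len(T), len(T[0])
    total = 0.0
    for a in range(cols):
        for b in range(cols):
            gram = sum(T[i][a] * T[i][b] for i in range(rows))
            target = 1.0 if a == b else 0.0
            total += (gram - target) ** 2
    return total

# A pure shift is orthogonal, so its penalty vanishes.
shift_penalty = orthogonality_penalty(circulant([0.0, 1.0], 4))
# A two-tap averaging filter is not orthogonal and is penalized.
average_penalty = orthogonality_penalty(circulant([0.5, 0.5], 4))
```

Adding such a term to a training loss pushes the filter toward an orthogonal operator without enforcing the constraint exactly, which is the trade-off the abstract describes for filters of limited length.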


Deep Learning of the Nonlinear Schrödinger Equation in Fiber-Optic Communications

arXiv.org Machine Learning

An important problem in fiber-optic communications is to invert the nonlinear Schrödinger equation in real time to reverse the deterministic effects of the channel. Interestingly, the popular split-step Fourier method (SSFM) leads to a computation graph that is reminiscent of a deep neural network. This observation allows one to leverage tools from machine learning to reduce complexity. In particular, the main disadvantage of the SSFM is that its complexity using M steps is at least M times larger than a linear equalizer. This is because the linear SSFM operator is a dense matrix. In previous work, truncation methods such as frequency sampling, wavelets, or least-squares have been used to obtain "cheaper" operators that can be implemented using filters. However, a large number of filter taps are typically required to limit truncation errors. For example, Ip and Kahn showed that for a 10 Gbaud signal and 2000 km optical link, a truncated SSFM with 25 steps would require 70-tap filters in each step and 100 times more operations than linear equalization. We find that, by jointly optimizing all filters with deep learning, the complexity can be reduced significantly for similar accuracy. Using optimized 5-tap and 3-tap filters in an alternating fashion, one requires only around 2-6 times the complexity of linear equalization, depending on the implementation.
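The resemblance between a split-step scheme and a deep network can be made concrete with a minimal sketch: each step is a short linear filter followed by a pointwise nonlinearity, here a power-dependent phase rotation. This is an illustration under assumptions, not the authors' model; the circular boundary handling, the parameter `gamma`, the example tap values, and the function names are all hypothetical.

```python
import cmath

def fir_circular(x, taps):
    """Circular convolution of signal x with a short, centered FIR filter."""
    n, half = len(x), len(taps) // 2
    return [
        sum(t * x[(i - (k - half)) % n] for k, t in enumerate(taps))
        for i in range(n)
    ]

def kerr_phase(x, gamma):
    """Pointwise nonlinearity: rotate each sample's phase by gamma * |x|^2."""
    return [v * cmath.exp(1j * gamma * abs(v) ** 2) for v in x]

def split_step(x, filters, gamma):
    """Alternate a linear filter and the phase nonlinearity, one pair per step."""
    for taps in filters:
        x = fir_circular(x, taps)
        x = kerr_phase(x, gamma)
    return x

def total_taps(filters):
    """Number of filter coefficients, a rough proxy for multiplications per sample."""
    return sum(len(taps) for taps in filters)

# Usage: four steps with alternating 5-tap and 3-tap filters (made-up tap values).
signal = [1 + 0j, 0j, 2j, 0j, -1 + 1j, 0j]
filters = [[0.0, 0.1, 0.8, 0.1, 0.0], [0.1, 0.8, 0.1]] * 2
output = split_step(signal, filters, gamma=0.05)
```

In a learned variant the tap values in `filters` would be the trainable parameters, optimized jointly across all steps rather than derived by truncating each step's exact operator.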


Bayesian Modelling of fMRI Time Series

Neural Information Processing Systems

We present a Hidden Markov Model (HMM) for inferring the hidden psychological state (or neural activity) during single trial fMRI activation experiments with blocked task paradigms. Inference is based on Bayesian methodology, using a combination of analytical and a variety of Markov Chain Monte Carlo (MCMC) sampling techniques. The advantage of this method is that detection of short time learning effects between repeated trials is possible since inference is based only on single trial experiments.

