Kereta, Zeljko
Plug-and-Play Half-Quadratic Splitting for Ptychography
Denker, Alexander, Hertrich, Johannes, Kereta, Zeljko, Cipiccia, Silvia, Erin, Ecem, Arridge, Simon
Ptychography is a coherent diffraction imaging method that uses phase retrieval techniques to reconstruct complex-valued images. It achieves this by sequentially illuminating overlapping regions of a sample with a coherent beam and recording the resulting diffraction patterns. Although this approach sidesteps some limitations of traditional imaging systems, reconstruction is computationally intensive and highly sensitive to noise, especially when the illumination overlap is reduced. Data-driven regularisation techniques have been applied to phase retrieval to improve reconstruction quality. In particular, plug-and-play (PnP) methods offer flexibility by integrating data-driven denoisers as implicit priors. In this work, we propose a half-quadratic splitting framework for employing PnP denoisers and other data-driven priors in ptychography. We evaluate our method on both natural images and real test objects to validate its effectiveness for ptychographic image reconstruction.
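As a rough illustration of the plug-and-play half-quadratic splitting idea, the sketch below alternates a gradient step on an amplitude data fidelity with a denoising step acting as an implicit prior. It is a minimal toy, not the authors' implementation: the probe handling, step size, and the placeholder Gaussian denoiser (standing in for a learned denoiser) are illustrative assumptions.

# Minimal PnP-HQS sketch for a ptychography-style amplitude fit (illustrative only;
# `probe`, `positions`, `denoise` are hypothetical stand-ins, not the authors' code).
import numpy as np
from scipy.ndimage import gaussian_filter

def forward(obj, probe, pos):
    """Exit wave -> far-field for one scan position (square probe patch)."""
    p = probe.shape[0]
    patch = obj[pos[0]:pos[0] + p, pos[1]:pos[1] + p]
    return np.fft.fft2(probe * patch)

def data_step(obj, probe, positions, amplitudes, step=0.5):
    """One gradient step on the amplitude data fidelity (approximate prox of the data term)."""
    grad = np.zeros_like(obj)
    p = probe.shape[0]
    for pos, amp in zip(positions, amplitudes):
        wave = forward(obj, probe, pos)
        # keep the measured amplitude, current phase; back-propagate the residual
        residual = np.fft.ifft2(wave - amp * np.exp(1j * np.angle(wave)))
        grad[pos[0]:pos[0] + p, pos[1]:pos[1] + p] += np.conj(probe) * residual
    return obj - step * grad

def denoise(z):
    """Placeholder denoiser; a learned denoiser would be plugged in here."""
    return gaussian_filter(z.real, 1.0) + 1j * gaussian_filter(z.imag, 1.0)

def pnp_hqs(amplitudes, probe, positions, shape, iters=50):
    x = np.ones(shape, dtype=complex)                    # object estimate
    for _ in range(iters):
        x = data_step(x, probe, positions, amplitudes)   # data-consistency half-step
        x = denoise(x)                                   # prior half-step (PnP denoiser)
    return x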
StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm
Oslandsbotn, Andreas, Kereta, Zeljko, Naumova, Valeriya, Freund, Yoav, Cloninger, Alexander
Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, existing implementations of KRR require that all the data be stored in main memory, which severely limits the use of KRR in settings where the data size far exceeds the memory size. Such applications are increasingly common in data mining, bioinformatics, and control. A powerful paradigm for computing on data sets that are too large for memory is the streaming model of computation, where one data sample is processed at a time and discarded before moving on to the next. In this paper, we propose StreaMRAK, a streaming version of KRR. StreaMRAK improves on existing KRR schemes by dividing the problem into several levels of resolution, which allows continual refinement of the predictions. The algorithm reduces the memory requirement by continuously and efficiently integrating new samples into the training model. With a novel sub-sampling scheme, StreaMRAK reduces the memory and computational complexity by creating a sketch of the original data, where the sub-sampling density is adapted to the bandwidth of the kernel and the local dimensionality of the data. We present a showcase study on two synthetic problems and on the prediction of the trajectory of a double pendulum. The results show that the proposed algorithm is fast and accurate.
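For intuition, the following sketch shows a single-resolution streaming kernel ridge regression on a fixed set of landmark points: each incoming sample updates small m-by-m normal equations and is then discarded, so memory is independent of the number of samples. The adaptive sub-sampling and multi-resolution refinement that distinguish StreaMRAK are not reproduced; landmark choice, bandwidth, and regularisation are placeholder assumptions.

# Illustrative streaming Nystrom-KRR sketch (single resolution level only).
import numpy as np

class StreamingKRR:
    def __init__(self, landmarks, bandwidth=1.0, reg=1e-3):
        self.Z = np.atleast_2d(landmarks)      # m fixed landmark points (the "sketch")
        self.h = bandwidth
        self.reg = reg
        m = self.Z.shape[0]
        self.A = np.zeros((m, m))              # accumulates K_xz^T K_xz
        self.b = np.zeros(m)                   # accumulates K_xz^T y
        self.Kzz = self._kernel(self.Z, self.Z)

    def _kernel(self, X, Y):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.h ** 2))

    def partial_fit(self, x, y):
        """Consume one sample (or mini-batch) and discard it: O(m^2) memory."""
        k = self._kernel(np.atleast_2d(x), self.Z)        # shape (n, m)
        self.A += k.T @ k
        self.b += k.T @ np.atleast_1d(y)

    def predict(self, X):
        alpha = np.linalg.solve(self.A + self.reg * self.Kzz, self.b)
        return self._kernel(np.atleast_2d(X), self.Z) @ alpha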
Construction and Monte Carlo estimation of wavelet frames generated by a reproducing kernel
De Vito, Ernesto, Kereta, Zeljko, Naumova, Valeriya, Rosasco, Lorenzo, Vigogna, Stefano
Nonlinear approximation over redundant families of localized waveforms has enabled the construction of efficient sparse representations, becoming common practice in signal processing, source coding, noise reduction, and beyond. Sparse dictionaries are also an important tool in machine learning, where the extraction of a few relevant features can significantly enhance a variety of learning tasks, making them scale to enormous quantities of data. However, the role of wavelets in machine learning is still unclear, and the impact they have had in signal processing has so far not been matched. One objective obstacle to a direct application of classical wavelet techniques to modern data science is of a geometric nature: real data are typically high-dimensional and inherently structured, often featuring or hiding non-Euclidean topologies. On the other hand, a representation built on empirical samples poses an additional problem of stability, accounted for by how well it generalizes to future data. In this paper, expanding upon the ideas outlined in [35], we introduce a data-driven construction of wavelet frames on non-Euclidean domains, and provide stability results that hold with high probability. Starting from Haar's seminal work [31] and since the founding contributions of Grossmann and Morlet [30], a general theory of wavelet transforms and a wealth of specific families of wavelets have rapidly arisen [10, 14, 23, 38, 40], first and foremost on R
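A rough sketch of the empirical (Monte Carlo) side of such a construction is given below: the reproducing kernel evaluated on the samples defines an empirical integral operator, and band-pass functions of its spectrum yield wavelet-like frame vectors centred at the data points. The particular filters, normalisations, and stability analysis of the paper are not reproduced; the Gaussian kernel and dyadic filters are illustrative choices.

# Empirical spectral construction of wavelet-like frame vectors (illustrative sketch).
import numpy as np

def empirical_wavelets(X, bandwidth=0.5, n_scales=4):
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2)) / n        # empirical integral operator
    lam, U = np.linalg.eigh(K)                        # spectral decomposition
    lam = np.clip(lam, 0.0, None)                     # eigenvalues lie in [0, 1]

    frames = []
    for j in range(n_scales):
        # dyadic band-pass filter applied to the spectrum via functional calculus
        g = np.sqrt(np.maximum(lam ** (2 ** j) - lam ** (2 ** (j + 1)), 0.0))
        Psi = U @ np.diag(g) @ U.T                    # rows: wavelets centred at samples
        frames.append(Psi)
    return frames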
Estimating covariance and precision matrices along subspaces
Kereta, Zeljko, Klock, Timo
We study the accuracy of estimating the covariance and the precision matrix of a $D$-variate sub-Gaussian distribution along a prescribed subspace or direction, using the finite sample covariance with $N \geq D$ samples. Our results show that the estimation accuracy depends almost exclusively on the components of the distribution that correspond to the desired subspaces or directions. This is relevant for problems where the behavior of the data along a lower-dimensional subspace is of specific interest, such as dimension reduction or structured regression problems. As a by-product of the analysis, we reduce the effect that the matrix condition number has on the estimation of precision matrices. Two applications are presented: direction-sensitive eigenspace perturbation bounds, and estimation of the single-index model. For the latter, we propose a new estimator, derived from our analysis, with strong theoretical guarantees and superior numerical performance.
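The toy experiment below illustrates the phenomenon numerically (it does not reproduce the paper's bounds): the sample covariance and precision matrix restricted to a prescribed low-dimensional subspace V can be markedly more accurate than their full D-by-D counterparts.

# Illustrative comparison of full vs. subspace-restricted estimation error.
import numpy as np

rng = np.random.default_rng(0)
D, N, d = 50, 200, 2

# ground-truth covariance with strong anisotropy
eigvals = np.linspace(10.0, 0.1, D)
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
Sigma = Q @ np.diag(eigvals) @ Q.T

X = rng.multivariate_normal(np.zeros(D), Sigma, size=N)
Sigma_hat = X.T @ X / N                               # sample covariance (N >= D)

V, _ = np.linalg.qr(rng.standard_normal((D, d)))      # prescribed d-dimensional subspace

err_full = np.linalg.norm(Sigma_hat - Sigma, 2)
err_sub = np.linalg.norm(V.T @ (Sigma_hat - Sigma) @ V, 2)
err_prec = np.linalg.norm(V.T @ (np.linalg.inv(Sigma_hat) - np.linalg.inv(Sigma)) @ V, 2)
print(f"full operator-norm error:        {err_full:.3f}")
print(f"covariance error restricted to V: {err_sub:.3f}")
print(f"precision error restricted to V:  {err_prec:.3f}")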
Nonlinear generalization of the single index model
Kereta, Zeljko, Klock, Timo, Naumova, Valeriya
The single index model is a simple yet powerful model, widely used in statistics, machine learning, and other scientific fields. It models the regression function as $g(\langle a, x \rangle)$, where $a$ is an unknown index vector and $x$ are the features. This paper deals with a nonlinear generalization of this framework that allows for a regressor using multiple index vectors, adapting to local changes in the responses. To do so, we exploit the conditional distribution over function-driven partitions and use linear regression to locally estimate index vectors. We then regress by applying a $k$NN-type estimator that uses a localized proxy of the geodesic metric. We present theoretical guarantees for the estimation of local index vectors and for out-of-sample prediction, and demonstrate the performance of our method in experiments on synthetic and real-world data sets, comparing it with state-of-the-art methods.
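The following toy sketch conveys the two-stage structure: split the samples by response level, fit one index vector per part by local linear regression, and predict with a nearest-neighbour rule in the locally projected coordinate. The localized geodesic proxy and the tuning prescribed by the theory are not reproduced; the partition rule and neighbourhood size are illustrative assumptions.

# Toy multi-index regression sketch (function-driven partitions + local index vectors).
import numpy as np

def fit_local_indices(X, y, n_parts=5):
    """Split samples by response level and fit one index vector per part."""
    edges = np.quantile(y, np.linspace(0, 1, n_parts + 1))
    parts, indices = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y >= lo) & (y <= hi)
        Xc, yc = X[mask], y[mask]
        a, *_ = np.linalg.lstsq(Xc - Xc.mean(0), yc - yc.mean(), rcond=None)
        parts.append((Xc, yc))
        indices.append(a / np.linalg.norm(a))
    return parts, indices

def predict(x, parts, indices, k=5):
    """kNN in the locally projected coordinate of the closest partition."""
    dists = [np.min(((Xc - x) ** 2).sum(1)) for Xc, _ in parts]   # crude localisation
    j = int(np.argmin(dists))
    Xc, yc = parts[j]
    t = (Xc - x) @ indices[j]                 # 1-d local coordinate along the index
    nn = np.argsort(np.abs(t))[:k]
    return yc[nn].mean()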
A Learning Theory Approach to a Computationally Efficient Parameter Selection for the Elastic Net
De Vito, Ernesto, Kereta, Zeljko, Naumova, Valeriya
Despite recent advances in regularisation theory, the issue of parameter selection remains a challenge for most applications. In a recent work, the framework of statistical learning was used to approximate the optimal Tikhonov regularisation parameter from noisy data. In this work, we improve on those results and extend the analysis to elastic net regularisation, providing explicit error bounds on the accuracy of the approximated parameter and of the corresponding regularised solution in a simplified case. Furthermore, in the general case we design a data-driven, automated algorithm for the computation of an approximate regularisation parameter. Our analysis combines statistical learning theory with insights from regularisation theory. We compare our approach with state-of-the-art parameter selection criteria and illustrate its superiority in terms of accuracy and computational time on simulated and real data sets.
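For reference, the snippet below shows a standard grid-plus-hold-out baseline for choosing the elastic net regularisation parameter; it is the kind of criterion such data-driven approaches are compared against, not the learning-theory-based algorithm proposed in the paper.

# Baseline elastic net parameter selection over a grid with a hold-out criterion.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

def select_parameter(X, y, alphas, l1_ratio=0.5):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
    errors = []
    for alpha in alphas:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10_000)
        model.fit(X_tr, y_tr)
        errors.append(np.mean((model.predict(X_val) - y_val) ** 2))
    return alphas[int(np.argmin(errors))]     # parameter with smallest hold-out error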