Goto

Collaborating Authors

 Country


Nonconvex Matrix Completion with Linearly Parameterized Factors

arXiv.org Machine Learning

Techniques of matrix completion aim to impute a large portion of missing entries in a data matrix through a small portion of observed ones, with broad machine learning applications including collaborative filtering, pairwise ranking, etc. In practice, additional structures are usually employed in order to improve the accuracy of matrix completion. Examples include subspace constraints formed by side information in collaborative filtering, and skew symmetry in pairwise ranking. This paper performs a unified analysis of nonconvex matrix completion with linearly parameterized factorization, which covers the aforementioned examples as special cases. Importantly, uniform upper bounds for estimation errors are established for all local minima, provided that the sampling rate satisfies certain conditions determined by the rank, condition number, and incoherence parameter of the ground-truth low rank matrix. Empirical efficiency of the proposed method is further illustrated by numerical simulations.


Topological Data Analysis in Text Classification: Extracting Features with Additive Information

arXiv.org Machine Learning

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.


Unsupervised Deep Learning for MR Angiography with Flexible Temporal Resolution

arXiv.org Machine Learning

Time-resolved MR angiography (tMRA) has been widely used for dynamic contrast enhanced MRI (DCE-MRI) due to its highly accelerated acquisition. In tMRA, the periphery of the k-space data are sparsely sampled so that neighbouring frames can be merged to construct one temporal frame. However, this view-sharing scheme fundamentally limits the temporal resolution, and it is not possible to change the view-sharing number to achieve different spatio-temporal resolution trade-off. Although many deep learning approaches have been recently proposed for MR reconstruction from sparse samples, the existing approaches usually require matched fully sampled k-space reference data for supervised training, which is not suitable for tMRA. This is because high spatio-temporal resolution ground-truth images are not available for tMRA. To address this problem, here we propose a novel unsupervised deep learning using optimal transport driven cycle-consistent generative adversarial network (cycleGAN). In contrast to the conventional cycleGAN with two pairs of generator and discriminator, the new architecture requires just a single pair of generator and discriminator, which makes the training much simpler and improves the performance. Reconstruction results using in vivo tMRA data set confirm that the proposed method can immediately generate high quality reconstruction results at various choices of view-sharing numbers, allowing us to exploit better trade-off between spatial and temporal resolution in time-resolved MR angiography.


Are Direct Links Necessary in RVFL NNs for Regression?

arXiv.org Machine Learning

A random vector functional link network (RVFL) is widely used as a universal approximator for classification and regression problems. The big advantage of RVFL is fast training without backpropagation. This is because the weights and biases of hidden nodes are selected randomly and stay untrained. Recently, alternative architectures with randomized learning are developed which differ from RVFL in that they have no direct links and a bias term in the output layer. In this study, we investigate the effect of direct links and output node bias on the regression performance of RVFL. For generating random parameters of hidden nodes we use the classical method and two new methods recently proposed in the literature. We test the RVFL performance on several function approximation problems with target functions of different nature: nonlinear, nonlinear with strong fluctuations, nonlinear with linear component and linear. Surprisingly, we found that the direct links and output node bias do not play an important role in improving RVFL accuracy for typical nonlinear regression problems. Keywords: Random vector functional link network · Neural networks with random hidden nodes · Randomized learning algorithms.


A Novel Method of Extracting Topological Features from Word Embeddings

arXiv.org Machine Learning

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.


Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability

arXiv.org Machine Learning

In networks of independent entities that face similar predictive tasks, transfer machine learning enables to re-use and improve neural nets using distributed data sets without the exposure of raw data. As the number of data sets in business networks grows and not every neural net transfer is successful, indicators are needed for its impact on the target performance-its transferability. We perform an empirical study on a unique real-world use case comprised of sales data from six different restaurants. We train and transfer neural nets across these restaurant sales data and measure their transferability. Moreover, we calculate potential indicators for transferability based on divergences of data, data projections and a novel metric for neural net similarity. We obtain significant negative correlations between the transferability and the tested indicators. Our findings allow to choose the transfer path based on these indicators, which improves model performance whilst simultaneously requiring fewer model transfers.


High-dimensional Neural Feature using Rectified Linear Unit and Random Matrix Instance

arXiv.org Machine Learning

We design a ReLU-based multilayer neural network to generate a rich high-dimensional feature vector. The feature guarantees a monotonically decreasing training cost as the number of layers increases. We design the weight matrix in each layer to extend the feature vectors to a higher dimensional space while providing a richer representation in the sense of training cost. Linear projection to the target in the higher dimensional space leads to a lower training cost if a convex cost is minimized. An $\ell_2$-norm convex constraint is used in the minimization to improve the generalization error and avoid overfitting. The regularization hyperparameters of the network are derived analytically to guarantee a monotonic decrement of the training cost and therefore, it eliminates the need for cross-validation to find the regularization hyperparameter in each layer.


SuperNet -- An efficient method of neural networks ensembling

arXiv.org Machine Learning

The main flaw of neural network ensembling is that it is exceptionally demanding computationally, especially, if the individual sub-models are large neural networks, which must be trained separately. Having in mind that modern DNNs can be very accurate, they are already the huge ensembles of simple classifiers, and that one can construct more thrifty compressed neural net of a similar performance for any ensemble, the idea of designing the expensive SuperNets can be questionable. The widespread belief that ensembling increases the prediction time, makes it not attractive and can be the reason that the main stream of ML research is directed towards developing better loss functions and learning strategies for more advanced and efficient neural networks. On the other hand, all these factors make the architectures more complex what may lead to overfitting and high computational complexity, that is, to the same flaws for which the highly parametrized SuperNets ensembles are blamed. The goal of the master thesis is to speed up the execution time required for ensemble generation. Instead of training K inaccurate sub-models, each of them can represent various phases of training (representing various local minima of the loss function) of a single DNN [Huang et al., 2017; Gripov et al., 2018]. Thus, the computational performance of the SuperNet can be comparable to the maximum CPU time spent on training its single sub-model, plus usually much shorter CPU time required for training the SuperNet coupling factors.


Learning and Testing Variable Partitions

arXiv.org Machine Learning

$ $Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$ partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that $1.$ Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly} (1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard. $2.$ When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. $3.$ When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context.


A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems

arXiv.org Machine Learning

Image prior modeling is the key issue in image recovery, computational imaging, compresses sensing, and other inverse problems. Recent algorithms combining multiple effective priors such as the sparse or low-rank models, have demonstrated superior performance in various applications. However, the relationships among the popular image models are unclear, and no theory in general is available to demonstrate their connections. In this paper, we present a theoretical analysis on the image models, to bridge the gap between applications and image prior understanding, including sparsity, group-wise sparsity, joint sparsity, and low-rankness, etc. We systematically study how effective each image model is for image restoration. Furthermore, we relate the denoising performance improvement by combining multiple models, to the image model relationships. Extensive experiments are conducted to compare the denoising results which are consistent with our analysis. On top of the model-based methods, we quantitatively demonstrate the image properties that are inexplicitly exploited by deep learning method, of which can further boost the denoising performance by combining with its complementary image models.