Country
Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds
Ge, Rong, Lee, Holden, Lu, Jianfeng
Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics. In this work, we consider the problem of estimating the normalizing constant $Z=\int_{\mathbb{R}^d} e^{-f(x)}\,\mathrm{d}x$ to within a multiplication factor of $1 \pm \varepsilon$ for a $\mu$-strongly convex and $L$-smooth function $f$, given query access to $f(x)$ and $\nabla f(x)$. We give both algorithms and lowerbounds for this problem. Using an annealing algorithm combined with a multilevel Monte Carlo method based on underdamped Langevin dynamics, we show that $\widetilde{\mathcal{O}}\Bigl(\frac{d^{4/3}\kappa + d^{7/6}\kappa^{7/6}}{\varepsilon^2}\Bigr)$ queries to $\nabla f$ are sufficient, where $\kappa= L / \mu$ is the condition number. Moreover, we provide an information theoretic lowerbound, showing that at least $\frac{d^{1-o(1)}}{\varepsilon^{2-o(1)}}$ queries are necessary. This provides a first nontrivial lowerbound for the problem.
Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space
Yang, Shuo, Shen, Yanyao, Sanghavi, Sujay
Quadratic regression involves modeling the response as a (generalized) linear function of not only the features $x^{j_1}$ but also of quadratic terms $x^{j_1}x^{j_2}$. The inclusion of such higher-order "interaction terms" in regression often provides an easy way to increase accuracy in already-high-dimensional problems. However, this explodes the problem dimension from linear $O(p)$ to quadratic $O(p^2)$, and it is common to look for sparse interactions (typically via heuristics). In this paper, we provide a new algorithm - Interaction Hard Thresholding (IntHT) which is the first one to provably accurately solve this problem in sub-quadratic time and space. It is a variant of Iterative Hard Thresholding; one that uses the special quadratic structure to devise a new way to (approx.) extract the top elements of a $p^2$ size gradient in sub-$p^2$ time and space. Our main result is to theoretically prove that, in spite of the many speedup-related approximations, IntHT linearly converges to a consistent estimate under standard high-dimensional sparse recovery assumptions. We also demonstrate its value via synthetic experiments. Moreover, we numerically show that IntHT can be extended to higher-order regression problems, and also theoretically analyze an SVRG variant of IntHT.
Adaptive Kernel Value Caching for SVM Training
Li, Qinbin, Wen, Zeyi, He, Bingsheng
Support Vector Machines (SVMs) can solve structured multi-output learning problems such as multi-label classification, multiclass classification and vector regression. SVM training is expensive especially for large and high dimensional datasets. The bottleneck of the SVM training often lies in the kernel value computation. In many real-world problems, the same kernel values are used in many iterations during the training, which makes the caching of kernel values potentially useful. The majority of the existing studies simply adopt the LRU (least recently used) replacement strategy for caching kernel values. However, as we analyze in this paper, the LRU strategy generally achieves high hit ratio near the final stage of the training, but does not work well in the whole training process. Therefore, we propose a new caching strategy called EFU (less frequently used) which replaces the less frequently used kernel values that enhances LFU (least frequently used). Our experimental results show that EFU often has 20\% higher hit ratio than LRU in the training with the Gaussian kernel. To further optimize the strategy, we propose a caching strategy called HCST (hybrid caching for the SVM training), which has a novel mechanism to automatically adapt the better caching strategy in the different stages of the training. We have integrated the caching strategy into ThunderSVM, a recent SVM library on many-core processors. Our experiments show that HCST adaptively achieves high hit ratios with little runtime overhead among different problems including multi-label classification, multiclass classification and regression problems. Compared with other existing caching strategies, HCST achieves 20\% more reduction in training time on average.
Collapse Resistant Deep Convolutional GAN for Multi-Object Image Generation
Bolluyt, Elijah D., Comaniciu, Cristina
This work introduces a novel system for the generation of images that contain multiple classes of objects. Recent work in Generative Adversarial Networks have produced high quality images, but many focus on generating images of a single object or set of objects. Our system addresses the task of image generation conditioned on a list of desired classes to be included in a single image. This enables our system to generate images with any given combination of objects, all composed into a visually realistic natural image. The system learns the interrelationships of all classes represented in a dataset, and can generate diverse samples including a set of these classes. It displays the ability to arrange these objects together, accounting for occlusions and inter-object spatial relations that characterize complex natural images. To accomplish this, we introduce a novel architecture based on Conditional Deep Convolutional GANs that is stabilized against collapse relative to both mode and condition. The system learns to rectify mode collapse during training, self-correcting to avoid suboptimal generation modes.
A multiple testing framework for diagnostic accuracy studies with co-primary endpoints
Westphal, Max, Zapf, Antonia, Brannath, Werner
This is indicated, among others, by several review and overview publications (Ching et al., 2018; Jiang et al., 2017; Litjens et al., 2017; Miotto, Wang, Wang, Jiang, & Dudley, 2017). In particular, the capabilities of end-to-end deep learning approaches on such supervised learning tasks are highly promising. For instance, vast advances have been reported in the literature regarding cancer diagnosis with deep neural networks (Hu et al., 2018). End-to-end deep learning refers to a trend involving deep (neural network) model architectures which are able to learn highly complex relationships between predictors and the target variable while having less parameters than traditional (more shallow) models with comparable performance (Goodfellow, Bengio, & Courville, 2016). In the training process, highly complex features are derived automatically by the learning algorithm (LeCun, Bengio, & Hinton, 2015). This framework contrasts the traditional pipeline of domain specific data preprocessing and handcrafted features in combination with simpler prediction models. Despite all the recent success of machine learning, there are still challenges regarding over-optimistic conclusions drawn from finite datasets which may to a large extend be attributed to the following two (broad) categories: 1. Study design and reporting: The most popular recommendation to split data for training, selection and evaluation is frequently employed in practice (Friedman, Hastie, & Tibshirani, 2009; Géron, 2017; Goodfellow et al., 2016; Japkowicz & Shah, 2011; Kuhn & Johnson, 2013; Zheng, 2015). In the ML community, the according datasets are commonly denoted as training, validation and test set.
Improved Visual Localization via Graph Smoothing
Lassance, Carlos, Latif, Yasir, Garg, Ravi, Gripon, Vincent, Reid, Ian
Vision based localization is the problem of inferring the pose of the camera given a single image. One solution to this problem is to learn a deep neural network to infer the pose of a query image after learning on a dataset of images with known poses. Another more commonly used approach rely on image retrieval where the query image is compared against the database of images and its pose is inferred with the help of the retrieved images. The latter approach assumes that images taken from the same places consists of the same landmarks and, thus would have similar feature representations. These representation can be learned using full supervision to be robust to different variations in capture conditions like time of the day and weather. In this work, we introduce a framework to enhance the performance of these retrieval based localization methods by taking into account the additional information including GPS coordinates and temporal neighbourhood of the images provided by the acquisition process in addition to the descriptor similarity of pairs of images in the reference or query database which is used traditionally for localization. Our method constructs a graph based on this additional information and use it for robust retrieval by smoothing the feature representation of reference and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large scale datasets over the baselines.
Change your singer: a transfer learning generative adversarial framework for song to song conversion
Daher, Rema, Zein, Mohammad Kassem, Zini, Julia El, Awad, Mariette, Asmar, Daniel
Have you ever wondered how a song might sound if performed by a different artist? In this work, we propose SCM-GAN, an end-to-end non-parallel song conversion system powered by generative adversarial and transfer learning that allows users to listen to a selected target singer singing any song. SCM-GAN first separates songs into vocals and instrumental music using a U-Net network, then converts the vocal segments to the target singer using advanced CycleGAN-VC, before merging the converted vocals with their corresponding background music. SCM-GAN is first initialized with feature representations learned from a state-of-the-art voice-to-voice conversion and then trained on a dataset of non-parallel songs. Furthermore, SCM-GAN is evaluated against a set of metrics including global variance GV and modulation spectra MS on the 24 Mel-cepstral coefficients (MCEPs). Transfer learning improves the GV by 35% and the MS by 13% on average. A subjective comparison is conducted to test the user satisfaction with the quality and the naturalness of the conversion. Results show above par similarity between SCM-GAN's output and the target (70\% on average) as well as great naturalness of the converted songs.
Graph Convolutional Networks Meet with High Dimensionality Reduction
Recently, Graph Convolutional Networks (GCNs) and their variants have been receiving many research interests for learning graph-related tasks. While the GCNs have been successfully applied to this problem, some caveats inherited from classical deep learning still remain as open research topics in the context of the node classification problem. One such inherited caveat is that GCNs only consider the nodes that are a few propagations away from the labeled nodes to classify them. However, taking only a few propagation steps away nodes into account defeats the purpose of using the graph topological information in the GCNs. To remedy this problem, the-state-of-the-art methods leverage the network diffusion approaches, namely personalized page rank and its variants, to fully account for the graph topology, {\em after} they use the Neural Networks in the GCNs. However, these approaches overlook the fact that the network diffusion methods favour high degree nodes in the graph, resulting in the propagation of labels to unlabeled centralized, hub, nodes. To address this biasing hub nodes problem, in this paper, we propose to utilize a dimensionality reduction technique conjugate with personalized page rank so that we can both take advantage from graph topology and resolve the hub node favouring problem for GCNs. Here, our approach opens a new holistic road for message passing phase of GCNs by suggesting the usage of other proximity matrices instead of well-known Laplacian. Testing on two real-world networks that are commonly used in benchmarking GCNs' performance for the node classification context, we systematically evaluate the performance of the proposed methodology and show that our approach outperforms existing methods for wide ranges of parameter values with very limited deep learning training {\em epochs}.
How implicit regularization of Neural Networks affects the learned function -- Part I
Heiss, Jakob, Teichmann, Josef, Wutte, Hanna
Today, various forms of neural networks are trained to perform approximation tasks in many fields. However, the solutions obtained are not wholly understood. Empirical results suggest that the training favors regularized solutions. These observations motivate us to analyze properties of the solutions found by the gradient descent algorithm frequently employed to perform the training task. As a starting point, we consider one dimensional (shallow) neural networks in which weights are chosen randomly and only the terminal layer is trained. We show, that the resulting solution converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. This might give valuable insight on the properties of the solutions obtained using gradient descent methods in general settings.
Graph Domain Adaptation with Localized Graph Signal Representations
Pilavci, Yusuf Yigit, Guneyi, Eylem Tugce, Cengiz, Cemil, Vural, Elif
Graph Domain Adaptation with Localized Graph Signal Representations Yusuf Yi git Pilavcı, Eylem Tu g ce G uneyi, Cemil Cengiz and Elif Vural Abstract In this paper we propose a domain adaptation algorithm designed for graph domains. Given a source graph with many labeled nodes and a target graph with few or no labeled nodes, we aim to estimate the target labels by making use of the similarity between the characteristics of the variation of the label functions on the two graphs. Our assumption about the source and the target domains is that the local behaviour of the label function, such as its spread and speed of variation on the graph, bears resemblance between the two graphs. We estimate the unknown target labels by solving an optimization problem where the label information is transferred from the source graph to the target graph based on the prior that the projections of the label functions onto localized graph bases be similar between the source and the target graphs. In order to efficiently capture the local variation of the label functions on the graphs, spectral graph wavelets are used as the graph bases. Experimentation on various data sets shows that the proposed method yields quite satisfactory classification accuracy compared to reference domain adaptation methods. Keywords: Domain adaptation, spectral graph theory, graph signal processing, spectral graph wavelets, graph Laplacian 1 Introduction A common assumption in machine learning is that the training and the test data are sampled from the same distribution. Domain adaptation methods aim to provide solutions to machine learning problems by dealing with this distribution discrepancy. In domain adaptation, a source domain and a target domain are considered where the label information is mostly available for the data samples in the source domain, and few or none of the class labels are known in the target domain. The purpose is then to improve the learning performance in the target domain by making use Y. Y. Pilavcı is with the GIPSA Lab at Universit e Grenoble Alpes, Grenoble. C. Cengiz is with the Dept. of Computer Science and Engineering at Ko c University, Istanbul. Most part of this work was performed while the authors were at METU. 1 arXiv:1911.02883v1 A variety of approaches have been proposed so far for the domain adaptation problem. Some methods are based on reweighing the samples for removing the sample selection bias [1, 2]. Another common solution is to align the source and the target domains through feature space mappings.