
Statistical Learning


Saliency Detection within a Deep Convolutional Architecture

AAAI Conferences

To tackle the problem of saliency detection in images, we propose to learn adaptive mid-level features to represent local image information, and present an efficient way to calculate multi-scale and multi-level saliency maps. Using the simple k-means algorithm, we learn adaptive low-level filters and convolve them with the image to produce response maps as the low-level features, which intrinsically capture texture and color information simultaneously. We apply additional thresholding and pooling to generate mid-level features that make the local image representation more robust. Then, we define a set of hand-crafted filters, at multiple scales and multiple levels, to calculate local contrasts, resulting in several intermediate saliency maps, which are finally fused into the resultant saliency map with a vision prior. Benefiting from these filters, the resultant saliency map not only captures subtle textures within the object but also discovers the overall salient object in the image. Since both feature learning and saliency map calculation involve convolution, we unify the two stages into one framework within a deep architecture. Through experiments on challenging benchmarks, we demonstrate the effectiveness of the proposed method.
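A minimal NumPy sketch of the first stages described above, for a single-channel image: k-means centroids of mean-centered patches act as learned filters, the response maps are rectified (a stand-in for the threshold step), and center-surround contrasts at several window radii are normalized and averaged into one map. The function names, the patch size, the box-window contrast, and the plain averaging used for fusion are illustrative assumptions, not the authors' implementation; color channels, pooling, and the vision prior are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(img, size):
    """Return all size x size patches as flattened rows, plus the (rows, cols) grid shape."""
    win = sliding_window_view(img, (size, size))  # (H-size+1, W-size+1, size, size)
    return win.reshape(-1, size * size), win.shape[:2]


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the centroids are used as learned low-level filters."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)
    return C


def response_maps(img, filters, size):
    """Correlate the image with each filter over the valid region (one map per filter)."""
    P, (h, w) = extract_patches(img, size)
    P = P - P.mean(1, keepdims=True)  # remove the local mean of each patch
    return (P @ filters.T).T.reshape(len(filters), h, w)


def local_contrast(maps, radius):
    """Distance between each feature vector and the mean over a (2r+1)^2 window around it."""
    w = 2 * radius + 1
    padded = np.pad(maps, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    local_mean = sliding_window_view(padded, (w, w), axis=(1, 2)).mean(axis=(-2, -1))
    return np.sqrt(((maps - local_mean) ** 2).sum(0))


def saliency(img, k=8, size=5, radii=(2, 4, 8)):
    """Learn filters, rectify responses, and average normalized multi-scale contrasts."""
    P, _ = extract_patches(img, size)
    filters = kmeans(P - P.mean(1, keepdims=True), k)
    maps = np.maximum(response_maps(img, filters, size), 0.0)  # simple threshold step
    sal = np.zeros(maps.shape[1:])
    for r in radii:
        c = local_contrast(maps, r)
        sal += c / (c.max() + 1e-12)
    return sal / len(radii)
```

Small radii respond to fine texture and large radii to whole-object contrast, which mirrors the abstract's claim that multi-scale filters capture both subtle textures and the overall salient object.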


Resolution-limit-free and local Non-negative Matrix Factorization quality functions for graph clustering

arXiv.org Machine Learning

Many graph clustering quality functions suffer from a resolution limit, the inability to find small clusters in large graphs. So-called resolution-limit-free quality functions do not have this limit. This property was previously introduced for hard clustering, that is, graph partitioning. We investigate the resolution-limit-free property in the context of Non-negative Matrix Factorization (NMF) for hard and soft graph clustering. To use NMF in the hard clustering setting, a common approach is to assign each node to its highest-membership cluster. We show that in this case symmetric NMF is not resolution-limit-free, but that it becomes so when hardness constraints are used as part of the optimization. The resulting function is strongly linked to the Constant Potts Model. In soft clustering, nodes can belong to more than one cluster, with varying degrees of membership. In this setting, resolution-limit-freeness turns out to be too strong a property. Therefore we introduce locality, which roughly states that changing one part of the graph does not affect the clustering of other parts of the graph. We argue that this is a desirable property, provide conditions under which NMF quality functions are local, and propose a novel class of local probabilistic NMF quality functions for soft graph clustering.
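The abstract links hard-constrained symmetric NMF to the Constant Potts Model (CPM). As a minimal sketch, assuming the common CPM form $Q = \sum_{i \ne j} (A_{ij} - \gamma)\,\delta(\sigma_i, \sigma_j)$ over ordered node pairs, the code below scores a hard partition and shows the highest-membership rule that turns NMF memberships into hard labels. The helper names and the value $\gamma = 0.5$ are hypothetical; this is not the paper's NMF objective.

```python
import numpy as np


def cpm_quality(A, labels, gamma):
    """CPM quality: sum of (A_ij - gamma) over ordered node pairs i != j in the same cluster."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # exclude self-pairs
    return float(((A - gamma) * same).sum())


def hard_assign(H):
    """Assign each node (row of the membership matrix H) to its highest-membership cluster."""
    return np.asarray(H).argmax(axis=1)


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

split = cpm_quality(A, [0, 0, 0, 1, 1, 1], gamma=0.5)
merged = cpm_quality(A, [0] * 6, gamma=0.5)
```

Here `split` is 6.0 and `merged` is -1.0: with a fixed $\gamma$, the penalty for a cluster depends only on its own size, not on the size of the whole graph, so the two small triangles stay apart.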


Predictive support recovery with TV-Elastic Net penalty and logistic regression: an application to structural MRI

arXiv.org Machine Learning

The use of machine learning in neuroimaging offers new perspectives on the early diagnosis and prognosis of brain diseases. Although such multivariate methods can capture complex relationships in the data, traditional approaches provide irregular ($\ell_2$ penalty) or scattered ($\ell_1$ penalty) predictive patterns with very limited relevance. A penalty like Total Variation (TV) that exploits the natural 3D structure of the images can increase the spatial coherence of the weight map. However, TV penalization leads to non-smooth optimization problems that are hard to minimize. We propose an optimization framework that minimizes any combination of $\ell_1$, $\ell_2$, and TV penalties while preserving the exact $\ell_1$ penalty. This algorithm uses Nesterov's smoothing technique to approximate the TV penalty with a smooth function, such that the loss and the penalties are minimized with an exact accelerated proximal gradient algorithm. We propose an original continuation algorithm that uses successively smaller values of the smoothing parameter to reach a prescribed precision while achieving the best possible convergence rate. This algorithm can be used with other losses or penalties. We apply the algorithm to a classification problem on the ADNI dataset. We observe that the TV penalty does not necessarily improve prediction but provides a major breakthrough in terms of recovering the support of the predictive brain regions.
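The following is a minimal sketch of the optimization idea for a 1-D weight vector (the paper works with 3D images): the TV term $\|Dw\|_1$ is replaced by its Nesterov-smoothed version, whose gradient is $D^\top \mathrm{clip}(Dw/\mu, -1, 1)$; the smooth terms take a gradient step; and the $\ell_1$ term is handled exactly by soft-thresholding inside FISTA. The fixed list of decreasing $\mu$ values with warm starts is a crude stand-in for the paper's continuation scheme, and all names and parameter values are assumptions.

```python
import numpy as np


def grad_tv_smoothed(w, mu):
    """Gradient of Nesterov-smoothed 1-D TV: D^T clip(D w / mu, -1, 1), D = forward differences."""
    alpha = np.clip(np.diff(w) / mu, -1.0, 1.0)
    g = np.zeros_like(w)
    g[:-1] -= alpha  # contribution of (w[i+1] - w[i]) to w[i]
    g[1:] += alpha   # contribution of (w[i+1] - w[i]) to w[i+1]
    return g


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (exact l1 handling)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit(X, y, l1, l2, ltv, mus=(1.0, 0.1, 0.01), iters=300):
    """FISTA on mean logistic loss + (l2/2)||w||^2 + ltv * TV_mu(w), plus the exact l1 prox.

    y must be in {0, 1}. Each mu in `mus` is solved in turn, warm-started from the last.
    """
    n, p = X.shape
    w = np.zeros(p)
    lip_loss = np.linalg.norm(X, 2) ** 2 / (4 * n)  # Lipschitz constant of the logistic gradient
    for mu in mus:
        L = lip_loss + l2 + ltv * 4.0 / mu  # ||D||_2^2 <= 4 for 1-D differences
        z, w_prev, t = w.copy(), w.copy(), 1.0
        for _ in range(iters):
            prob = 1.0 / (1.0 + np.exp(-np.clip(X @ z, -30, 30)))
            grad = X.T @ (prob - y) / n + l2 * z + ltv * grad_tv_smoothed(z, mu)
            w = soft_threshold(z - grad / L, l1 / L)
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = w + ((t - 1.0) / t_next) * (w - w_prev)
            w_prev, t = w, t_next
    return w
```

Smaller $\mu$ approximates TV more closely but raises the Lipschitz constant by $4/\mu$, which is why warm-starting from a larger $\mu$ helps.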


Impact of regularization on Spectral Clustering

arXiv.org Machine Learning

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. (2012). Here, we attempt to quantify this improvement through theoretical analysis. Under the stochastic block model (SBM) and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for good performance. By examining the scenario where the regularization parameter $\tau$ is large, we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree, rather than the minimum degree, to be large (growing faster than $\log n$). More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a bias-variance-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigengap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique, \textit{DKest} (standing for estimated Davis-Kahan bounds), for choosing the regularization parameter. This technique is shown to work well in simulations and on a real data set.
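A minimal sketch of regularized spectral clustering for two blocks. It assumes the Amini-style regularization that adds $\tau/n$ to every entry of $A$ before normalizing, $L_\tau = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}$; the paper's exact regularized Laplacian may differ, and splitting by the sign of the second eigenvector replaces k-means only because there are two blocks. The function name is hypothetical, and DKest is not implemented.

```python
import numpy as np


def regularized_spectral_two_blocks(A, tau):
    """Two-way spectral split using L_tau = D_tau^{-1/2} A_tau D_tau^{-1/2}, A_tau = A + (tau/n) 11^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tau = A + tau / n                             # add tau/n to every entry
    d_inv_sqrt = 1.0 / np.sqrt(A_tau.sum(axis=1))   # regularized degrees are >= tau > 0
    L_tau = d_inv_sqrt[:, None] * A_tau * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_tau)                 # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)            # sign of the second-largest eigenvector
```

Because every regularized degree is at least $\tau$, low-degree or even isolated nodes no longer make $D_\tau^{-1/2}$ blow up, which gives one intuition for why the minimum-degree assumption can be relaxed.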


Completing Any Low-rank Matrix, Provably

arXiv.org Machine Learning

Matrix completion, i.e., the exact and provable recovery of a low-rank matrix from a small subset of its elements, is currently only known to be possible if the matrix satisfies a restrictive structural constraint, known as {\em incoherence}, on its row and column spaces. In these cases, the subset of elements is sampled uniformly at random. In this paper, we show that {\em any} rank-$r$ $n$-by-$n$ matrix can be exactly recovered from as few as $O(nr \log^2 n)$ randomly chosen elements, provided this random choice is made according to a {\em specific biased distribution}: the probability of any element being sampled should be proportional to the sum of the leverage scores of the corresponding row and column. Perhaps equally important, we show that this specific form of sampling is nearly necessary, in a natural, precise sense; this implies that other, perhaps more intuitive, sampling schemes fail. We further establish three ways to use the above result when leverage scores are not known \textit{a priori}: (a) a sampling strategy for the case when only one of the row or column spaces is incoherent, (b) a two-phase sampling procedure for general matrices that first samples to estimate leverage scores and then samples for exact recovery, and (c) an analysis showing the advantages of weighted nuclear/trace-norm minimization over the vanilla unweighted formulation for the case of non-uniform sampling.
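A minimal sketch of the biased sampling distribution, assuming the standard leverage-score definitions $\mu_i = \frac{n}{r}\|U^\top e_i\|^2$ and $\nu_j = \frac{n}{r}\|V^\top e_j\|^2$ (scaled so each averages to 1) and sampling probabilities $p_{ij} = \min\{c_0 (\mu_i + \nu_j)\, r \log^2 n / n,\ 1\}$. The constant $c_0$, the exact log factor, and the helper names are assumptions; the recovery step (weighted nuclear-norm minimization) is not shown.

```python
import numpy as np


def leverage_scores(M, r):
    """Row and column leverage scores of the rank-r SVD, scaled so each averages to 1."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    n1, n2 = M.shape
    mu = (n1 / r) * (U[:, :r] ** 2).sum(axis=1)
    nu = (n2 / r) * (Vt[:r, :] ** 2).sum(axis=0)
    return mu, nu


def biased_sample_mask(M, r, c0=1.0, rng=None):
    """Keep entry (i, j) independently with prob. min(c0 * (mu_i + nu_j) * r * log(n)^2 / n, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = max(M.shape)
    mu, nu = leverage_scores(M, r)
    P = np.minimum(c0 * (mu[:, None] + nu[None, :]) * r * np.log(n) ** 2 / n, 1.0)
    return rng.random(M.shape) < P, P
```

For an $n$-by-$n$ matrix, $\mu_i + \nu_j$ summed over all entries is $2n^2$, so before clipping the expected number of samples is $2 c_0\, n r \log^2 n$, in line with the $O(nr \log^2 n)$ rate; a row with a spiky (coherent) singular vector simply gets sampled more.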


Bayesian Nonparametric Crowdsourcing

arXiv.org Machine Learning

Crowdsourcing has proven to be an effective and efficient tool for annotating large datasets. User annotations are often noisy, so methods that combine the annotations to produce reliable estimates of the ground truth are necessary. We claim that considering the existence of clusters of users in this combination step can improve performance. This is especially important in the early stages of crowdsourcing implementations, when the number of annotations is low. At this stage there is not enough information to accurately estimate the bias introduced by each annotator separately, so we have to resort to models that consider the statistical links among them. In addition, finding these clusters is interesting in itself, as knowing the behavior of the pool of annotators enables efficient active learning strategies. Based on this, in this paper we propose two new fully unsupervised models based on a Chinese Restaurant Process (CRP) prior and a hierarchical structure that allow inferring these groups jointly with the ground truth and the properties of the users. We propose efficient inference algorithms based on Gibbs sampling with auxiliary variables. Finally, we perform experiments on both synthetic and real databases to show the advantages of our models over state-of-the-art algorithms.
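A minimal sketch of drawing annotator clusters from a CRP prior with concentration $\alpha$: annotator $i$ joins an existing cluster with probability proportional to its current size, or opens a new cluster with probability proportional to $\alpha$. Only the prior is shown; the hierarchical annotator model and the Gibbs sampler with auxiliary variables from the paper are not, and the function name is hypothetical.

```python
import numpy as np


def sample_crp(n, alpha, rng=None):
    """Sequentially seat n annotators; returns cluster labels and cluster sizes."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.empty(n, dtype=int)
    counts = []
    for i in range(n):
        # existing cluster k w.p. counts[k] / (i + alpha); new cluster w.p. alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    return labels, counts
```

The number of clusters is not fixed in advance: it grows roughly like $\alpha \log n$, which lets the data decide how many annotator groups to use.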


Sequential Logistic Principal Component Analysis (SLPCA): Dimensional Reduction in Streaming Multivariate Binary-State System

arXiv.org Machine Learning

Sequential or online dimensionality reduction is of interest due to the explosion of streaming-data applications and the need for adaptive statistical modeling in many emerging fields, such as the modeling of energy end-use profiles. Principal Component Analysis (PCA) is the classical approach to dimensionality reduction. However, traditional PCA based on Singular Value Decomposition (SVD) fails to model data that deviate substantially from a Gaussian distribution. The Bregman divergence was recently introduced to achieve a generalized PCA framework. If the random variable under reduction follows a Bernoulli distribution, which occurs in many emerging fields, the generalized PCA is called Logistic PCA (LPCA). In this paper, we extend batch LPCA to a sequential version (SLPCA) based on sequential convex optimization theory. We discuss the convergence of this algorithm in comparison with the batch version of LPCA (BLPCA), as well as its performance in reducing the dimension of multivariate binary-state systems. We also investigate its application to modeling building energy end-use profiles.
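The sketch below illustrates streaming logistic PCA in its simplest form, assuming the natural-parameter model $\theta = Va + b$ with $x_j \sim \mathrm{Bernoulli}(\sigma(\theta_j))$: each arriving binary vector is encoded by a few gradient steps on the convex subproblem in $a$ with $V$ fixed, and then $V$ and $b$ take one stochastic gradient step. This plain online gradient scheme is a swapped-in illustration, not the paper's sequential convex optimization algorithm; the class name, learning rate, and step counts are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OnlineLogisticPCA:
    """Hypothetical streaming logistic PCA with natural parameters theta = V @ a + b."""

    def __init__(self, d, k, lr=0.05, inner_steps=20, seed=0):
        rng = np.random.default_rng(seed)
        self.V = 0.1 * rng.standard_normal((d, k))  # loadings (d x k)
        self.b = np.zeros(d)                         # per-variable offsets
        self.lr, self.inner_steps = lr, inner_steps

    def encode(self, x):
        """Low-dimensional score a for one binary vector x, with V and b held fixed."""
        a = np.zeros(self.V.shape[1])
        for _ in range(self.inner_steps):
            a -= self.lr * self.V.T @ (sigmoid(self.V @ a + self.b) - x)
        return a

    def partial_fit(self, x):
        """Encode x, then take one SGD step on V and b for the Bernoulli negative log-likelihood."""
        a = self.encode(x)
        g = sigmoid(self.V @ a + self.b) - x  # gradient of the loss w.r.t. theta
        self.V -= self.lr * np.outer(g, a)
        self.b -= self.lr * g
        return a

    def loss(self, x):
        """Bernoulli negative log-likelihood of x under its own encoding."""
        p = np.clip(sigmoid(self.V @ self.encode(x) + self.b), 1e-12, 1 - 1e-12)
        return float(-(x * np.log(p) + (1 - x) * np.log(1 - p)).sum())
```

Each observation is touched once and then discarded, so memory stays at $O(dk)$ regardless of stream length, which is the practical appeal of a sequential version over batch LPCA.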


Automatic discovery of cell types and microcircuitry from neural connectomics

arXiv.org Machine Learning

Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we develop a nonparametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists, including connectivity, cell body location, and the spatial distribution of synapses, in a principled and probabilistically coherent manner. We show that the approach recovers known neuron types in the retina and predicts connectivity better than simpler algorithms. It can also reveal interesting structure in the nervous system of C. elegans and automatically discovers the structure of a microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches for automatically deriving anatomical insights from these emerging datasets.


Complex Support Vector Machines for Regression and Quaternary Classification

arXiv.org Machine Learning

The paper presents a new framework for complex Support Vector Regression as well as Support Vector Machines for quaternary classification. The method exploits the notion of widely linear estimation to model the input-output relation for complex-valued data and considers two cases: a) the complex data are split into their real and imaginary parts and a typical real kernel is employed to map the complex data to a complexified feature space, and b) a pure complex kernel is used to directly map the data to the induced complex feature space. The recently developed Wirtinger's calculus on complex reproducing kernel Hilbert spaces (RKHS) is employed to compute the Lagrangian and derive the dual optimization problem. As one of our major results, we prove that any complex SVM/SVR task is equivalent to solving two real SVM/SVR tasks exploiting a specific real kernel, which is generated by the chosen complex kernel. In particular, the case of pure complex kernels leads to the generation of new kernels that have not been considered before. In the classification case, the proposed framework inherently splits the complex space into four parts. This naturally leads to solving the four-class task (quaternary classification), instead of the typical two classes of the real SVM. In turn, this rationale can be used in a multiclass problem as a split-class scenario based on four classes, as opposed to the one-versus-all method; this can lead to significant computational savings. Experiments demonstrate the effectiveness of the proposed framework for regression and classification tasks that involve complex data.
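The sketch below illustrates the quadrant idea with a linear kernel: each quaternary label is a point $\pm 1 \pm i$, the complex inputs are split into real and imaginary parts, and two independent real linear SVMs (trained with Pegasos subgradient descent, bias folded in as a constant feature) predict the real and imaginary signs of a complex decision value. This realified linear stand-in is an assumption for illustration; it does not implement the paper's widely linear kernels or the Wirtinger-calculus dual, and all names are hypothetical.

```python
import numpy as np

# Quaternary labels as points of the complex plane, one per quadrant (I, II, III, IV).
LABELS = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
QUADRANT = {(1, 1): 0, (-1, 1): 1, (-1, -1): 2, (1, -1): 3}


def realify(Z):
    """Stack real parts, imaginary parts, and a constant bias feature."""
    return np.hstack([Z.real, Z.imag, np.ones((Z.shape[0], 1))])


def pegasos(X, y, lam=0.01, epochs=10, seed=0):
    """Linear SVM via Pegasos subgradient steps; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            hinge_active = y[i] * (X[i] @ w) < 1  # margin violated before the update
            w *= 1.0 - eta * lam
            if hinge_active:
                w += eta * y[i] * X[i]
    return w


def fit_quaternary(Z, classes):
    """Two real SVM tasks: one for the real sign and one for the imaginary sign of the label."""
    X, target = realify(Z), LABELS[classes]
    return pegasos(X, target.real), pegasos(X, target.imag)


def predict_quaternary(Z, w_re, w_im):
    """Combine the two real outputs into a complex decision value and read off its quadrant."""
    X = realify(Z)
    f = X @ w_re + 1j * (X @ w_im)
    signs_re = np.where(f.real >= 0, 1, -1)
    signs_im = np.where(f.imag >= 0, 1, -1)
    return np.array([QUADRANT[(int(a), int(b))] for a, b in zip(signs_re, signs_im)])
```

Two binary classifiers cover four classes here, whereas one-versus-all would train four, which is the source of the computational savings the abstract mentions.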


Learning Temporal Dynamics of Behavior Propagation in Social Networks

AAAI Conferences

Social influence has been widely accepted as an explanation for people's cascade behaviors and has been utilized in many related applications. However, few existing works have studied the direct, microscopic, and temporal impact of social influence on people's behaviors in detail. In this paper, we concentrate on behavior modeling and systematically formulate the family of behavior propagation models (BPMs), including the static models (BP and IBP) and their discrete temporal variants (DBP and DIBP). To address the temporal dynamics of behavior propagation over continuous time, we propose a continuous temporal interest-aware behavior propagation model, called CIBP. As a new member of the BPM family, CIBP exploits continuous-temporal functions (CTFs) to model the fully continuous dynamic variance of social influence over time. Experiments on real-world datasets evaluated the family of BPMs and demonstrated the effectiveness of our proposed approach.
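The abstract does not specify CIBP's continuous-temporal functions. Purely as a hypothetical illustration of continuous-time influence, the sketch assumes an exponentially decaying kernel $e^{-\lambda (t - t_u)}$ after neighbor $u$ adopts at time $t_u$, an interest weight in $[0, 1]$, and independent neighbor influences combined as $1 - \prod_u (1 - \text{interest} \cdot p_{uv} \cdot \text{kernel})$. None of these choices is taken from the paper.

```python
import math


def influence(t, t_u, decay):
    """Hypothetical continuous-time kernel: exp(-decay * (t - t_u)) once the neighbor has adopted."""
    return math.exp(-decay * (t - t_u)) if t >= t_u else 0.0


def adoption_probability(t, neighbor_times, strengths, decay, interest=1.0):
    """Probability that v adopts at time t under independent neighbor influences (interest in [0, 1])."""
    survive = 1.0
    for t_u, p_uv in zip(neighbor_times, strengths):
        survive *= 1.0 - interest * p_uv * influence(t, t_u, decay)
    return 1.0 - survive
```

With one neighbor of strength 0.5 who adopted at time 0 and decay 1, the probability is 0.5 immediately and halves to 0.25 at $t = \ln 2$; a static model would instead keep it at 0.5 forever.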