Country
Example Selection For Dictionary Learning
Tsuchida, Tomoki, Cottrell, Garrison W.
In unsupervised learning, an unbiased uniform sampling strategy is typically used, in order that the learned features faithfully encode the statistical structure of the training data. In this work, we explore whether active example selection strategies - algorithms that select which examples to use, based on the current estimate of the features - can accelerate learning. Specifically, we investigate effects of heuristic and saliency-inspired selection algorithms on the dictionary learning task with sparse activations. We show that some selection algorithms do improve the speed of learning, and we speculate on why they might work.
A Parzen-based distance between probability measures as an alternative of summary statistics in Approximate Bayesian Computation
Zuluaga, Carlos D., Valencia, Edgar A., Álvarez, Mauricio A.
Approximate Bayesian Computation (ABC) are likelihood-free Monte Carlo methods. ABC methods use a comparison between simulated data, using different parameters drew from a prior distribution, and observed data. This comparison process is based on computing a distance between the summary statistics from the simulated data and the observed data. For complex models, it is usually difficult to define a methodology for choosing or constructing the summary statistics. Recently, a nonparametric ABC has been proposed, that uses a dissimilarity measure between discrete distributions based on empirical kernel embeddings as an alternative for summary statistics. The nonparametric ABC outperforms other methods including ABC, kernel ABC or synthetic likelihood ABC. However, it assumes that the probability distributions are discrete, and it is not robust when dealing with few observations. In this paper, we propose to apply kernel embeddings using an smoother density estimator or Parzen estimator for comparing the empirical data distributions, and computing the ABC posterior. Synthetic data and real data were used to test the Bayesian inference of our method. We compare our method with respect to state-of-the-art methods, and demonstrate that our method is a robust estimator of the posterior distribution in terms of the number of observations.
Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process
Xuan, Junyu, Lu, Jie, Zhang, Guangquan, Da Xu, Richard Yi, Luo, Xiangfeng
Incorporating the side information of text corpus, i.e., authors, time stamps, and emotional tags, into the traditional text mining models has gained significant interests in the area of information retrieval, statistical natural language processing, and machine learning. One branch of these works is the so-called Author Topic Model (ATM), which incorporates the authors's interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability on fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the multi-author's contribution towards this document. An efficient Gibbs sampling inference algorithm with each conditional distribution being closed-form is developed for the IAT model. Experiments on several real-world datasets show the capabilities of our IAT model to learn the hidden topics, authors' interests on these topics and the number of topics simultaneously.
Regression with Linear Factored Functions
Böhmer, Wendelin, Obermayer, Klaus
Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression-algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow to analytically solve certain integrals and to calculate point-wise products. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme, that learns factored basis functions during training. The novel regression algorithm performs competitively to Gaussian processes on benchmark tasks, and the learned LFF functions are with 4-9 factored basis functions on average very compact.
Nonparametric Relational Topic Models through Dependent Gamma Processes
Xuan, Junyu, Lu, Jie, Zhang, Guangquan, Da Xu, Richard Yi, Luo, Xiangfeng
Traditional Relational Topic Models provide a way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensional reduction, document clustering, link prediction, benefit from this revealed knowledge. However, existing relational topic models are based on an assumption that the number of hidden topics is known in advance, and this is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different gamma process from the global gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and its number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capabilities of learning the hidden topics and, more importantly, the number of topics.
Decentralized learning for wireless communications and networking
Giannakis, Georgios B., Ling, Qing, Mateos, Gonzalo, Schizas, Ioannis D., Zhu, Hao
This chapter deals with decentralized learning algorithms for in-network processing of graph-valued data. A generic learning problem is formulated and recast into a separable form, which is iteratively minimized using the alternating-direction method of multipliers (ADMM) so as to gain the desired degree of parallelization. Without exchanging elements from the distributed training sets and keeping inter-node communications at affordable levels, the local (per-node) learners consent to the desired quantity inferred globally, meaning the one obtained if the entire training data set were centrally available. Impact of the decentralized learning framework to contemporary wireless communications and networking tasks is illustrated through case studies including target tracking using wireless sensor networks, unveiling Internet traffic anomalies, power system state estimation, as well as spectrum cartography for wireless cognitive radio networks.
Orthogonal Matching Pursuit with Noisy and Missing Data: Low and High Dimensional Results
Chen, Yudong, Caramanis, Constantine
Many models for sparse regression typically assume that the covariates are known completely, and without noise. Particularly in high-dimensional applications, this is often not the case. This paper develops efficient OMP-like algorithms to deal with precisely this setting. Our algorithms are as efficient as OMP, and improve on the best-known results for missing and noisy data in regression, both in the high-dimensional setting where we seek to recover a sparse vector from only a few measurements, and in the classical low-dimensional setting where we recover an unstructured regressor. In the high-dimensional setting, our support-recovery algorithm requires no knowledge of even the statistics of the noise. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.
Active Authentication on Mobile Devices via Stylometry, Application Usage, Web Browsing, and GPS Location
Fridman, Lex, Weber, Steven, Greenstadt, Rachel, Kam, Moshe
Active authentication is the problem of continuously verifying the identity of a person based on behavioral aspects of their interaction with a computing device. In this study, we collect and analyze behavioral biometrics data from 200subjects, each using their personal Android mobile device for a period of at least 30 days. This dataset is novel in the context of active authentication due to its size, duration, number of modalities, and absence of restrictions on tracked activity. The geographical colocation of the subjects in the study is representative of a large closed-world environment such as an organization where the unauthorized user of a device is likely to be an insider threat: coming from within the organization. We consider four biometric modalities: (1) text entered via soft keyboard, (2) applications used, (3) websites visited, and (4) physical location of the device as determined from GPS (when outdoors) or WiFi (when indoors). We implement and test a classifier for each modality and organize the classifiers as a parallel binary decision fusion architecture. We are able to characterize the performance of the system with respect to intruder detection time and to quantify the contribution of each modality to the overall performance.
A simple coding for cross-domain matching with dimension reduction via spectral graph embedding
Data vectors are obtained from multiple domains. They are feature vectors of images or vector representations of words. Domains may have different numbers of data vectors with different dimensions. These data vectors from multiple domains are projected to a common space by linear transformations in order to search closely related vectors across domains. We would like to find projection matrices to minimize distances between closely related data vectors. This formulation of cross-domain matching is regarded as an extension of the spectral graph embedding to multi-domain setting, and it includes several multivariate analysis methods of statistics such as multiset canonical correlation analysis, correspondence analysis, and principal component analysis. Similar approaches are very popular recently in pattern recognition and vision. In this paper, instead of proposing a novel method, we will introduce an embarrassingly simple idea of coding the data vectors for explaining all the above mentioned approaches. A data vector is concatenated with zero vectors from all other domains to make an augmented vector. The cross-domain matching is solved by applying the single-domain version of spectral graph embedding to these augmented vectors of all the domains. An interesting connection to the classical associative memory model of neural networks is also discussed by noticing a coding for association. A cross-validation method for choosing the dimension of the common space and a regularization parameter will be discussed in an illustrative numerical example.
Sparse Linear Regression With Missing Data
Ganti, Ravi, Willett, Rebecca M.
This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the regression coefficients, and the proposed algorithm jointly learns the low-dimensional structure of the data and a linear regressor with sparse coefficients. The proposed stochastic optimization method, Sparse Linear Regression with Missing Data (SLRM), performs an alternating minimization procedure and scales well with the problem size. Large deviation inequalities shed light on the impact of the various problem-dependent parameters on the expected squared loss of the learned regressor. Extensive simulations on both synthetic and real datasets show that SLRM performs better than competing algorithms in a variety of contexts.