Accuracy
A Latent Source Model for Nonparametric Time Series Classification
Chen, George H., Nikolov, Stanislav, Shah, Devavrat
For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%."
Conditional Random Fields via Univariate Exponential Families
Yang, Eunho, Ravikumar, Pradeep K., Allen, Genevera I., Liu, Zhandong
Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not however best suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a โnovel subclass of CRFsโ, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying feature'' functions determining the interactions between response variables and covariates. We develop a class of tractable penalized $M$-estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability."
Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
Lu, Yichao, Dhillon, Paramveer, Foster, Dean P., Ungar, Lyle
We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of $O(n^2p)$. Our algorithm (SRHT-DRR) runs in time $O(np\log(n))$ and works by preconditioning the design matrix by a Randomized Walsh-Hadamard Transform with a subsequent subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets.
Inference of Network Summary Statistics Through Network Denoising
Balachandran, Prakash, Airoldi, Edoardo, Kolaczyk, Eric
Consider observing an undirected network that is `noisy' in the sense that there are Type I and Type II errors in the observation of edges. Such errors can arise, for example, in the context of inferring gene regulatory networks in genomics or functional connectivity networks in neuroscience. Given a single observed network then, to what extent are summary statistics for that network representative of their analogues for the true underlying network? Can we infer such statistics more accurately by taking into account the noise in the observed network edges? In this paper, we answer both of these questions. In particular, we develop a spectral-based methodology using the adjacency matrix to `denoise' the observed network data and produce more accurate inference of the summary statistics of the true network. We characterize performance of our methodology through bounds on appropriate notions of risk in the $L^2$ sense, and conclude by illustrating the practical impact of this work on synthetic and real-world data.
Joint segmentation of multivariate time series with hidden process regression for human activity recognition
Chamroukhi, Faicel, Mohammed, Samer, Trabelsi, Dorra, Oukhellou, Latifa, Amirat, Yacine
The problem of human activity recognition is central for understanding and predicting the human behavior, in particular in a prospective of assistive services to humans, such as health monitoring, well being, security, etc. There is therefore a growing need to build accurate models which can take into account the variability of the human activities over time (dynamic models) rather than static ones which can have some limitations in such a dynamic context. In this paper, the problem of activity recognition is analyzed through the segmentation of the multidimensional time series of the acceleration data measured in the 3-d space using body-worn accelerometers. The proposed model for automatic temporal segmentation is a specific statistical latent process model which assumes that the observed acceleration sequence is governed by sequence of hidden (unobserved) activities. More specifically, the proposed approach is based on a specific multiple regression model incorporating a hidden discrete logistic process which governs the switching from one activity to another over time. The model is learned in an unsupervised context by maximizing the observed-data log-likelihood via a dedicated expectation-maximization (EM) algorithm. We applied it on a real-world automatic human activity recognition problem and its performance was assessed by performing comparisons with alternative approaches, including well-known supervised static classifiers and the standard hidden Markov model (HMM). The obtained results are very encouraging and show that the proposed approach is quite competitive even it works in an entirely unsupervised way and does not requires a feature extraction preprocessing step.
Recursive Compressed Sensing
Freris, Nikolaos M., รรงal, Orhan, Vetterli, Martin
We introduce a recursive algorithm for performing compressed sensing on streaming data. The approach consists of a) recursive encoding, where we sample the input stream via overlapping windowing and make use of the previous measurement in obtaining the next one, and b) recursive decoding, where the signal estimate from the previous window is utilized in order to achieve faster convergence in an iterative optimization scheme applied to decode the new one. To remove estimation bias, a two-step estimation procedure is proposed comprising support set detection and signal amplitude estimation. Estimation accuracy is enhanced by a non-linear voting method and averaging estimates over multiple windows. We analyze the computational complexity and estimation error, and show that the normalized error variance asymptotically goes to zero for sublinear sparsity. Our simulation results show speed up of an order of magnitude over traditional CS, while obtaining significantly lower reconstruction error under mild conditions on the signal magnitudes and the noise level.
An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks
Smith, Michael R., Martinez, Tony
Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy. A popular approach for handling outliers and mislabeled instances is to remove any instance that is misclassified by a learning algorithm. However, an examination of which learning algorithms to use for filtering as well as their effects on multiple learning algorithms over a large set of data sets has not been done. Previous work has generally been limited due to the large computational requirements to run such an experiment, and, thus, the examination has generally been limited to learning algorithms that are computationally inexpensive and using a small number of data sets. In this paper, we examine 9 learning algorithms as filtering algorithms as well as examining the effects of filtering in the 9 chosen learning algorithms on a set of 54 data sets. In addition to using each learning algorithm individually as a filter, we also use the set of learning algorithms as an ensemble filter and use an adaptive algorithm that selects a subset of the learning algorithms for filtering for a specific task and learning algorithm. We find that for most cases, using an ensemble of learning algorithms for filtering produces the greatest increase in classification accuracy. We also compare filtering with a majority voting ensemble. The voting ensemble significantly outperforms filtering unless there are high amounts of noise present in the data set. Additionally, we find that a majority voting ensemble is robust to noise as filtering with a voting ensemble does not increase the classification accuracy of the voting ensemble.
A Latent Source Model for Nonparametric Time Series Classification
Chen, George H., Nikolov, Stanislav, Shah, Devavrat
For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Sharpnack, James, Krishnamurthy, Akshay, Singh, Aarti
The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovasz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and epsilon-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
A Component Lasso
Hussami, Nadine, Tibshirani, Robert
We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then solves the subproblems separately, obtaining a coefficient vector for each one. Then, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery.