Genre
Closed-set lattice of regular sets based on a serial and transitive relation through matroids
Rough sets are efficient for data pre-processing in data mining. Matroids are based on linear algebra and graph theory, and have a variety of applications in many fields. Both rough sets and matroids are closely related to lattices. For a serial and transitive relation on a universe, the collection of all the regular sets of the generalized rough set is a lattice. In this paper, we use the lattice to construct a matroid and then study relationships between the lattice and the closed-set lattice of the matroid. First, the collection of all the regular sets based on a serial and transitive relation is proved to be a semimodular lattice. Then, a matroid is constructed through the height function of the semimodular lattice. Finally, we propose an approach to obtain all the closed sets of the matroid from the semimodular lattice. Borrowing from matroids, results show that lattice theory provides an interesting view to investigate rough sets.
An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks
Smith, Michael R., Martinez, Tony
Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy. A popular approach for handling outliers and mislabeled instances is to remove any instance that is misclassified by a learning algorithm. However, an examination of which learning algorithms to use for filtering as well as their effects on multiple learning algorithms over a large set of data sets has not been done. Previous work has generally been limited due to the large computational requirements to run such an experiment, and, thus, the examination has generally been limited to learning algorithms that are computationally inexpensive and using a small number of data sets. In this paper, we examine 9 learning algorithms as filtering algorithms as well as examining the effects of filtering in the 9 chosen learning algorithms on a set of 54 data sets. In addition to using each learning algorithm individually as a filter, we also use the set of learning algorithms as an ensemble filter and use an adaptive algorithm that selects a subset of the learning algorithms for filtering for a specific task and learning algorithm. We find that for most cases, using an ensemble of learning algorithms for filtering produces the greatest increase in classification accuracy. We also compare filtering with a majority voting ensemble. The voting ensemble significantly outperforms filtering unless there are high amounts of noise present in the data set. Additionally, we find that a majority voting ensemble is robust to noise as filtering with a voting ensemble does not increase the classification accuracy of the voting ensemble.
Efficient coordinate-descent for orthogonal matrices through Givens rotations
Optimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. Here we propose a framework for optimizing orthogonal matrices, that is the parallel of coordinate-descent in Euclidean spaces. It is based on {\em Givens-rotations}, a fast-to-compute operation that affects a small number of entries in the learned matrix, and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decomposition that is used in learning mixture models, and an algorithm for sparse-PCA. We study the parameter regime where a Givens rotation approach converges faster and achieves a superior model on a genome-wide brain-wide mRNA expression dataset.
Consistent selection of tuning parameters via variable selection stability
Sun, Wei, Wang, Junhui, Fang, Yixin
Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross-validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on a novel concept of variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. The effectiveness of the proposed criterion is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data.
Oracle Inequalities for Convex Loss Functions with Non-Linear Targets
Caner, Mehmet, Kock, Anders Bredahl
This paper consider penalized empirical loss minimization of convex loss functions with unknown non-linear target functions. Using the elastic net penalty we establish a finite sample oracle inequality which bounds the loss of our estimator from above with high probability. If the unknown target is linear this inequality also provides an upper bound of the estimation error of the estimated parameter vector. These are new results and they generalize the econometrics and statistics literature. Next, we use the non-asymptotic results to show that the excess loss of our estimator is asymptotically of the same order as that of the oracle. If the target is linear we give sufficient conditions for consistency of the estimated parameter vector. Next, we briefly discuss how a thresholded version of our estimator can be used to perform consistent variable selection. We give two examples of loss functions covered by our framework and show how penalized nonparametric series estimation is contained as a special case and provide a finite sample upper bound on the mean square error of the elastic net series estimator.
A Latent Source Model for Nonparametric Time Series Classification
Chen, George H., Nikolov, Stanislav, Shah, Devavrat
For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.
Knowing Whether
Fan, Jie, Wang, Yanjing, van Ditmarsch, Hans
Knowing whether a proposition is true means knowing that it is true or knowing that it is false. In this paper, we study logics with a modal operator Kw for knowing whether but without a modal operator K for knowing that. This logic is not a normal modal logic, because we do not have Kw (phi -> psi) -> (Kw phi -> Kw psi). Knowing whether logic cannot define many common frame properties, and its expressive power less than that of basic modal logic over classes of models without reflexivity. These features make axiomatizing knowing whether logics non-trivial. We axiomatize knowing whether logic over various frame classes. We also present an extension of knowing whether logic with public announcement operators and we give corresponding reduction axioms for that. We compare our work in detail to two recent similar proposals.
A novel local search based on variable-focusing for random K-SAT
Lemoy, Rรฉmi, Alava, Mikko, Aurell, Erik
We introduce a new local search algorithm for satisfiability problems. Usual approaches focus uniformly on unsatisfied clauses. The new method works by picking uniformly random variables in unsatisfied clauses. A Variable-based Focused Metropolis Search (V-FMS) is then applied to random 3-SAT. We show that it is quite comparable in performance to the clause-based FMS. Consequences for algorithmic design are discussed.
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Sharpnack, James, Krishnamurthy, Akshay, Singh, Aarti
The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovasz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and epsilon-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling
Sainath, Tara N., Horesh, Lior, Kingsbury, Brian, Aravkin, Aleksandr Y., Ramabhadran, Bhuvana
Hessian-free training has become a popular parallel second or- der optimization technique for Deep Neural Network training. This study aims at speeding up Hessian-free training, both by means of decreasing the amount of data used for training, as well as through reduction of the number of Krylov subspace solver iterations used for implicit estimation of the Hessian. In this paper, we develop an L-BFGS based preconditioning scheme that avoids the need to access the Hessian explicitly. Since L-BFGS cannot be regarded as a fixed-point iteration, we further propose the employment of flexible Krylov subspace solvers that retain the desired theoretical convergence guarantees of their conventional counterparts. Second, we propose a new sampling algorithm, which geometrically increases the amount of data utilized for gradient and Krylov subspace iteration calculations. On a 50-hr English Broadcast News task, we find that these methodologies provide roughly a 1.5x speed-up, whereas, on a 300-hr Switchboard task, these techniques provide over a 2.3x speedup, with no loss in WER. These results suggest that even further speed-up is expected, as problems scale and complexity grows.