Supervised Learning
Precipitation Record Set in Northeast Nevada After Storm
A winter storm warning expired in the Lake Tahoe region Friday afternoon. The weather service said 11 inches (28 centimeters) of snow was recorded Thursday night and early Friday at the Northstar ski resort near Truckee, California and about 10 inches (25 centimeters) at Mt. Rose southwest of Reno. Up to 7 inches (18 centimeters) was reported at Heavenly in South Lake Tahoe, California.
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Qi, Charles Ruizhongtai, Yi, Li, Su, Hao, Guibas, Leonidas J.
Few prior works study deep learning on point sets. PointNet [20] is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network called PointNet is able to learn deep point set features efficiently and robustly. In particular, results significantly better than state-of-the-art have been obtained on challenging benchmarks of 3D point clouds.
Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification
Joshi, Bikash, Amini, Massih R., Partalas, Ioannis, Iutzeler, Franck, Maximov, Yury
We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.
Multi-Armed Bandits with Metric Movement Costs
Koren, Tomer, Livni, Roi, Mansour, Yishay
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde(\max\set{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}})$, and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max\set{k^{1/3}T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\set{T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(1)$. For infinite metrics spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Osokin, Anton, Bach, Francis, Lacoste-Julien, Simon
We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.
Scalable Prototype Selection by Genetic Algorithms and Hashing
Plasencia-Calaรฑa, Yenisel, Orozco-Alzate, Mauricio, Mรฉndez-Vรกzquez, Heydi, Garcรญa-Reyes, Edel, Duin, Robert P. W.
Classification in the dissimilarity space has become a very active research area since it provides a possibility to learn from data given in the form of pairwise non-metric dissimilarities, which otherwise would be difficult to cope with. The selection of prototypes is a key step for the further creation of the space. However, despite previous efforts to find good prototypes, how to select the best representation set remains an open issue. In this paper we proposed scalable methods to select the set of prototypes out of very large datasets. The methods are based on genetic algorithms, dissimilarity-based hashing, and two different unsupervised and supervised scalable criteria. The unsupervised criterion is based on the Minimum Spanning Tree of the graph created by the prototypes as nodes and the dissimilarities as edges. The supervised criterion is based on counting matching labels of objects and their closest prototypes. The suitability of these type of algorithms is analyzed for the specific case of dissimilarity representations. The experimental results showed that the methods select good prototypes taking advantage of the large datasets, and they do so at low runtimes. Preprint submitted to Elsevier December 27, 2017 1. Introduction The vector space representation is a common option to represent the data for learning tasks since many statistical techniques are applicable for this kind of representation. However, there is an increasing number of real-world problems which are not vectorial. Instead, the data are given in terms of pairwise dissimilarities which may be non-Euclidean and even non-metric.
Adversarial Structured Prediction for Multivariate Measures
Wang, Hong, Rezaei, Ashkan, Ziebart, Brian D.
Many predicted structured objects (e.g., sequences, matchings, trees) are evaluated using the F-score, alignment error rate (AER), or other multivariate performance measures. Since inductively optimizing these measures using training data is typically computationally difficult, empirical risk minimization of surrogate losses is employed, using, e.g., the hinge loss for (structured) support vector machines. These approximations often introduce a mismatch between the learner's objective and the desired application performance, leading to inconsistency. We take a different approach: adversarially approximate training data while optimizing the exact F-score or AER. Structured predictions under this formulation result from solving zero-sum games between a predictor seeking the best performance and an adversary seeking the worst while required to (approximately) match certain structured properties of the training data. We explore this approach for word alignment (AER evaluation) and named entity recognition (F-score evaluation) with linear-chain constraints.
Fast kNN mode seeking clustering applied to active learning
Duin, Robert P. W., Verzakov, Sergey
A significantly faster algorithm is presented for the original kNN mode seeking procedure. It has the advantages over the well-known mean shift algorithm that it is feasible in high-dimensional vector spaces and results in uniquely, well defined modes. Moreover, without any additional computational effort it may yield a multi-scale hierarchy of clusterings. The time complexity is just O(n^1.5). resulting computing times range from seconds for 10^4 objects to minutes for 10^5 objects and to less than an hour for 10^6 objects. The space complexity is just O(n). The procedure is well suited for finding large sets of small clusters and is thereby a candidate to analyze thousands of clusters in millions of objects. The kNN mode seeking procedure can be used for active learning by assigning the clusters to the class of the modal objects of the clusters. Its feasibility is shown by some examples with up to 1.5 million handwritten digits. The obtained classification results based on the clusterings are compared with those obtained by the nearest neighbor rule and the support vector classifier based on the same labeled objects for training. It can be concluded that using the clustering structure for classification can be significantly better than using the trained classifiers. A drawback of using the clustering for classification, however, is that no classifier is obtained that may be used for out-of-sample objects.
Supreme Court Ruling in McDonnell Case Sets Louisiana Congressman William Jefferson Free
Not long after Jefferson was put behind bars, another political corruption case blossomed, this time around Virginia Gov. Robert McDonnell and his wife, Maureen. The McDonnells reportedly accepted more than $175,000 in loans and gifts from a local businessman in return for his exposure to state officials and industry leaders. McDonnell and his wife were indicted and convicted after the governor left office in January 2014.