Europe
Large Linear Classification When Data Cannot Fit in Memory
Yu, Hsiang-Fu (National Taiwan University) | Hsieh, Cho-Jui (National Taiwan University) | Chang, Kai-Wei (National Taiwan University) | Lin, Chih-Jen (National Taiwan University)
Linear classification is a useful tool for dealing with large-scale data in applications such as document classification and natural language processing. Recent developments of linear classification have shown that the training process can be efficiently conducted. However, when the data size exceeds the memory capacity, most training methods suffer from very slow convergence due to the severe disk swapping. Although some methods have attempted to handle such a situation, they are usually too complicated to support some important functions such as parameter selection. In this paper, we introduce a block minimization framework for data larger than memory. Under the framework, a solver splits data into blocks and stores them into separate files. Then, at each time, the solver trains a data block loaded from disk. Although the framework is simple, the experimental results show that it effectively handles a data set 20 times larger than the memory capacity.
Wsabie: Scaling Up to Large Vocabulary Image Annotation
Weston, Jason (Google Research) | Bengio, Samy (Google Research) | Usunier, Nicolas (Universite de Paris 6)
Weighted Pairwise Classification (OWPC) loss [Usunier et al., 2009] which has been shown to be state-of-the-art on Image annotation datasets are becoming larger and (small) text retrieval tasks. WARP uses stochastic gradient larger, with tens of millions of images and tens descent and a novel sampling trick to approximate ranks resulting of thousands of possible annotations. We propose in an efficient online optimization strategy which we a strongly performing method that scales to show is superior to standard stochastic gradient descent applied such datasets by simultaneously learning to optimize to the same loss, enabling us to train on datasets that precision at the top of the ranked list of annotations do not even fit in memory.
A Framework for Longitudinal Influence Measurement between Communication Content and Social Networks
Wang, Shenghui (Vrije Universiteit Amsterdam) | Groth, Paul (Vrije Universiteit Amsterdam)
Artificial intelligence has a long history of learning from domain problems ranging from chess to jeopardy. In this work, we look at a problem stemming from social science, namely, how do social relationships influence communication content and vice versa. The tools used to study communication content (content analysis) have rarely been combined with those used to study social relationships (social network analysis). Furthermore, there is even less work addressing the longitudinal characteristics of such a combination. This paper presents a general framework for measuring the dynamic bi-directional influence between communication content and social networks. The framework leverages the idea that knowledge about both kinds of networks can be represented using the same knowledge representation. In particular, through the use of Semantic Web standards, the extraction of networks is made easier. The framework is applied to two use-cases: online forum discussions and conference publications. The results provide a new perspective over the dynamics involving both social networks and communication content.
Active Exploration for Robust Object Detection
Velez, Javier (Massachusetts Institute of Technology) | Hemann, Garrett (Massachusetts Institute of Technology) | Huang, Albert S. (Massachusetts Institute of Technology) | Posner, Ingmar (Oxford University) | Roy, Nicholas (Massachusetts Institute of Technology)
Today, mobile robots are increasingly expected to operate in ever more complex and dynamic environments.In order to carry out many of the higher level tasks envisioned a semantic understanding of a workspace is pivotal. Here our field has benefited significantly from successes in machine learning and vision: applications in robotics of off-the-shelf object detectors are plentiful. This paper outlines an online, any-time planning framework enabling the active exploration of such detections. Our approach exploits the ability to move to different vantage points and implicitly weighs the benefits of gaining more certainty about the existence of an object against the physical cost of the exploration required. The result is a robot which plans trajectories specifically to decrease the entropy of putative detections. Our system is demonstrated to significantly improve detection performance and trajectory length in simulated and real robot experiments.
Adaptive Data Compression for Robot Perception
Smith, Mike (Oxford University) | Posner, Ingmar (Oxford University) | Newman, Paul M (Oxford University)
This paper concerns the creation of an efficient, continuous, non-parametric representation of surfaces implicit in 3D laser data as typically recorded by mobile robots. Our approach explicitly leverages the probabilistic nature of Gaussian Process regression to provide for a principled, adaptive subsampling which automatically prunes redundant data. The algorithm places no restriction on the complexity of the underlying surfaces and enables predictions at arbitrary locations and densities. We present results using real and synthetic data and show that our approach attains decimation factors in excess of two orders of magnitude without significant degradation in fidelity of the workspace reconstructions.
Learning Linear and Kernel Predictors with the 0-1 Loss Function
Shalev-Shwartz, Shai (The Hebrew University) | Shamir, Ohad (The Hebrew University and Microsoft Research) | Sridharan, Karthik (Toyota Technological Institute)
Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification purposes, a more natural loss function is the 0-1 loss. However, using it leads to a non-convex problem for which there is no known efficient algorithm. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function. The algorithm is parameterized by $L$, which quantifies the effective width around the decision boundary in which the predictor may be uncertain. We show that without any distributional assumptions, and for any fixed $L$, the algorithm runs in polynomial time, and learns a classifier which is worse than the optimal such classifier by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in $L$.
Connecting the Dots Between News Articles
Shahaf, Dafna (Carnegie Mellon University) | Guestrin, Carlos (Carnegie Mellon University)
The process of extracting useful knowledge from large datasets has become one of the most pressing problems in todayโs society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots โ providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events leading from the decline of home prices (2007) to the health-care debate (2009). We formalize the characteristics of a good chain and provide efficient algorithms to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understanding the news.
Evaluation of Group Profiling Strategies
Senot, Christophe (Bell Labs - Alcatel-Lucent) | Kostadinov, Dimitre (Bell Labs - Alcatel-Lucent) | Bouzid, Makram (Bell Labs - Alcatel-Lucent) | Picault, Jรฉrรดme (Bell Labs - Alcatel-Lucent) | Aghasaryan, Armen (Bell Labs - Alcatel-Lucent)
Most of the existing personalization systems such as content recommenders or targeted ads focus on individual users and ignore the social situation in which the services are consumed. However, many human activities are social and involve several in-dividuals whose tastes and expectations must be taken into account by the system. When a group profile is not available, different profile aggrega-tion strategies can be applied to recommend ade-quate items to a group of users based on their indi-vidual profiles. We consider an approach intended to determine the factors that influence the choice of an aggregation strategy. We present evaluations made on a large-scale dataset of TV viewings, where real group interests are compared to the pre-dictions obtained by combining individual user profiles according to different strategies.
An On-Line Algorithm for Semantic Forgetting
Packer, Heather Stephanie (University of Southampton) | Gibbins, Nicholas (University of Southampton) | Jennings, Nicholas R (University of Southampton)
In AI, this area Ontologies that evolve through use to support new has been studied under a variety of names such as forgetting domain tasks can grow extremely large. Moreover, and variable elimination [Eiter et al., 2006; Wang et al., large ontologies require more resources to use and 2008]. We provide a general approach for ranking knowledge have slower response times than small ones. To according to its use and cost, which can be applied to systems help address this problem, we present an online semantic that are limited by memory resources to evaluate memory forgetting algorithm that removes ontology allocation. We also provide a specific approach to select fragments containing infrequently used or cheap to which concepts to remove from an ontology, using the ranking.
Mind the Eigen-Gap, or How to Accelerate Semi-Supervised Spectral Learning Algorithms
Mavroeidis, Dimitrios (Radboud University Nijmegen)
Semi-supervised learning algorithms commonly incorporate the available background knowledge such that an expression of the derived model's quality is improved. Depending on the specific context quality can take several forms and can be related to the generalization performance or to a simple clustering coherence measure. Recently, a novel perspective of semi-supervised learning has been put forward, that associates semi-supervised clustering with the efficiency of spectral methods. More precisely, it has been demonstrated that the appropriate use of partial supervision can bias the data Laplacian matrix such that the necessary eigenvector computations are provably accelerated. This result allows data mining practitioners to use background knowledge not only for improving the quality of clustering results, but also for accelerating the required computations. In this paper we initially provide a high level overview of the relevant efficiency maximizing semi-supervised methods such that their theoretical intuitions are comprehensively outlined. Consecutively, we demonstrate how these methods can be extended to handle multiple clusters and also discuss possible issues that may arise in the continuous semi-supervised solution. Finally, we illustrate the proposed extensions empirically in the context of text clustering.