Genre
Wsabie: Scaling Up to Large Vocabulary Image Annotation
Weston, Jason (Google Research) | Bengio, Samy (Google Research) | Usunier, Nicolas (Universite de Paris 6)
Weighted Pairwise Classification (OWPC) loss [Usunier et al., 2009] which has been shown to be state-of-the-art on Image annotation datasets are becoming larger and (small) text retrieval tasks. WARP uses stochastic gradient larger, with tens of millions of images and tens descent and a novel sampling trick to approximate ranks resulting of thousands of possible annotations. We propose in an efficient online optimization strategy which we a strongly performing method that scales to show is superior to standard stochastic gradient descent applied such datasets by simultaneously learning to optimize to the same loss, enabling us to train on datasets that precision at the top of the ranked list of annotations do not even fit in memory.
Learning Linear and Kernel Predictors with the 0-1 Loss Function
Shalev-Shwartz, Shai (The Hebrew University) | Shamir, Ohad (The Hebrew University and Microsoft Research) | Sridharan, Karthik (Toyota Technological Institute)
Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification purposes, a more natural loss function is the 0-1 loss. However, using it leads to a non-convex problem for which there is no known efficient algorithm. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function. The algorithm is parameterized by $L$, which quantifies the effective width around the decision boundary in which the predictor may be uncertain. We show that without any distributional assumptions, and for any fixed $L$, the algorithm runs in polynomial time, and learns a classifier which is worse than the optimal such classifier by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in $L$.
Theoretical Justification of Popular Link Prediction Heuristics
Sarkar, Purnamrita (University of California, Berkeley) | Chakrabarti, Deepayan (Yahoo!) | Moore, Andrew W. (Google, Inc.)
There are common intuitions about how social graphs are generated (for example, it is common to talk informally about nearby nodes sharing a link). There are also common heuristics for predicting whether two currently unlinked nodes in a graph should be linked (e.g. for suggesting friends in an online social network or movies to customers in a recommendation network). This paper provides what we believe to be the first formal connection between these intuitions and these heuristics. We look at a familiar class of graph generation models in which nodes are associated with locations in a latent metric space and connections are more likely between closer nodes. We also look at popular linkprediction heuristics such as number-of-commonneighbors and its weighted variants [Adamic and Adar, 2003] which have proved successful in predicting missing links, but are not direct derivatives of latent space graph models. We provide theoretical justifications for the success of some measures as compared to others, as reported in previous empirical studies. In particular we present a sequence of formal results that show bounds related to the role that a node’s degree plays in its usefulness for link prediction, the relative importance of short paths versus long paths, and the effects of increasing non-determinism in the link generation process on link prediction quality. Our results can be generalized to any model as long as the latent space assumption holds.
An On-Line Algorithm for Semantic Forgetting
Packer, Heather Stephanie (University of Southampton) | Gibbins, Nicholas (University of Southampton) | Jennings, Nicholas R (University of Southampton)
In AI, this area Ontologies that evolve through use to support new has been studied under a variety of names such as forgetting domain tasks can grow extremely large. Moreover, and variable elimination [Eiter et al., 2006; Wang et al., large ontologies require more resources to use and 2008]. We provide a general approach for ranking knowledge have slower response times than small ones. To according to its use and cost, which can be applied to systems help address this problem, we present an online semantic that are limited by memory resources to evaluate memory forgetting algorithm that removes ontology allocation. We also provide a specific approach to select fragments containing infrequently used or cheap to which concepts to remove from an ontology, using the ranking.
Ties Matter: Complexity of Voting Manipulation Revisited
Obraztsova, Svetlana (Nanyang Technological University) | Elkind, Edith (Nanyang Technological University) | Hazon, Noam (Carnegie Mellon University)
In their groundbreaking paper, Bartholdi, Tovey and Trick [1989] argued that many well-known voting rules, such as Plurality, Borda, Copeland and Maximin are easy to manipulate. An important assumption made in that paper is that the manipulator’s goal is to ensure that his preferred candidate is among the candidates with the maximum score, or, equivalently, that ties are broken in favor of the manipulator’s preferred candidate. In this paper, we examine the role of this assumption in the easiness results of [Bartholdi et al., 1989]. We observe that the algorithm presented in [Bartholdi et al., 1989] extends to all rules that break ties according to a fixed ordering over the candidates. We then show that all scoring rules are easy to manipulate if the winner is selected from all tied candidates uniformly at random. This result extends to Maximin under an additional assumption on the manipulator’s utility function that is inspired by the original model of [Bartholdi et al., 1989]. In contrast, we show that manipulation becomes hard when arbitrary polynomial-time tie-breaking rules are allowed, both for the rules considered in [Bartholdi et al., 1989], and for a large class of scoring rules.
Recommender Systems: Missing Data and Statistical Model Estimation
Marlin, Benjamin M. (University of British Columbia) | Zemel, Richard S. (University of Toronto) | Roweis, Sam T. (New York University) | Slaney, Malcolm (Yahoo! Research)
The personalization aspect of recommender systems makes them well suited to applications in The goal of rating-based recommender systems is electronic commerce and entertainment, while the fact that to make personalized predictions and recommendations they do not rely on text-based descriptions of items makes for individual users by leveraging the preferences them well suited to content like movies and music. of a community of users with respect to a In this paper, we focus on a key problem in rating-based collection of items like songs or movies. Recommender collaborative filtering: the possibility of a basic incompatibility systems are often based on intricate statistical between the properties of recommender system data sets models that are estimated from data sets containing and the assumptions required for valid estimation and evaluation a very high proportion of missing ratings. of statistical models in the presence of missing data. This work describes evidence of a basic incompatibility We describe properties of recommender system data sets and between the properties of recommender relate them to the statistical theory of model estimation in system data sets and the assumptions required for the presence of nonrandom missing data. We describe an valid estimation and evaluation of statistical models extended modelling framework and a modified set of evaluation in the presence of missing data. We discuss the protocols for dealing with nonrandom missing data.
Finite Model Computation via Answer Set Programming
Gebser, Martin (University of Potsdam) | Sabuncu, Orkunt (University of Potsdam) | Schaub, Torsten (University of Potsdam)
We show how Finite Model Computation (FMC) of first-order theories can efficiently and transparentlybe solved by taking advantage of an extension of Answer Set Programming, called incremental Answer Set Programming (iASP). The idea is to use the incremental parameter in iASP programs to account for the domain size of a model. The FMC problem is then successively addressed for increasing domain sizes until an answer set, representing a finite model of the original first-order theory, is found. We developed a system based on the iASP solver iClingo and demonstrate its competitiveness.
Translation-Based Constraint Answer Set Solving
Drescher, Christian (NICTA and University of New South Wales) | Walsh, Toby (NICTA and University of New South Wales)
We solve constraint satisfaction problems through translation to answer set programming (ASP). Our reformulations have the property that unit-propagation in the ASP solver achieves well defined local consistency properties like arc, bound and range consistency. Experiments demonstrate the computational value of this approach.
Exploring Protein Fragment Assembly Using CLP
Palù, Alessandro Dal (University of Parma) | Dovier, Agostino (University of Udine) | Fogolari, Federico (University of Udine) | Pontelli, Enrico (New Mexico State University)
The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict potential 3D conformations of a protein via fragments assembly. The fragments are extracted and clustered by a preprocessor from a database of known protein structures. Assembling fragments into a complete conformation is modeled as a constraint satisfaction problem solved using CLP. The approach makes use of a simplified CA-side chain centroid protein model, that offers efficiency and a good approximation for space filling. The approach adapts existing energy models for protein representation and applies a large neighboring search (LNS) strategy. The results show the feasibility and efficiency of the method, and the declarative nature of the approach simplifies the introduction of additional knowledge and variations of the model.
An Algorithm for Adapting Cases Represented in ALC
Cojan, Julien (UHP-Nancy 1, LORIA) | Lieber, Jean (UHP-Nancy 1, LORIA)
This paper presents an algorithm of adaptation for a case-based reasoning system with cases and domain knowledge represented in the expressive description logic ALC. The principle is to first pretend that the source case to be adapted solves the current target case. This may raise some contradictions with the specification of the target case and with the domain knowledge. The adaptation consists then in repairing these contradictions. This adaptation algorithm is based on an extension of the classical tableau method used for deductive inferences in ALC.