Genre
Counting-Based Search: Branching Heuristics for Constraint Satisfaction Problems
Pesant, G., Quimper, C., Zanarini, A.
Designing a search heuristic for constraint programming that is reliable across problem domains has been an important research topic in recent years. This paper concentrates on one family of candidates: counting-based search. Such heuristics seek to make branching decisions that preserve most of the solutions by determining what proportion of solutions to each individual constraint agree with that decision. Whereas most generic search heuristics in constraint programming rely on local information at the level of the individual variable, our search heuristics are based on more global information at the constraint level. We design several algorithms that are used to count the number of solutions to specific families of constraints and propose some search heuristics exploiting such information. The experimental part of the paper considers eight problem domains ranging from well-established benchmark puzzles to rostering and sport scheduling. An initial empirical analysis identifies heuristic maxSD as a robust candidate among our proposals. We then evaluate the latter against the state of the art, including the latest generic search heuristics, restarts, and discrepancy-based tree traversals. Experimental results show that counting-based search generally outperforms other generic heuristics.
Empirical estimation of entropy functionals with confidence
Sricharan, Kumar, Raich, Raviv, Hero, Alfred O. III
This paper introduces a class of k-nearest neighbor ($k$-NN) estimators called bipartite plug-in (BPI) estimators for estimating integrals of non-linear functions of a probability density, such as Shannon entropy and R\'enyi entropy. The density is assumed to be smooth, have bounded support, and be uniformly bounded from below on this set. Unlike previous $k$-NN estimators of non-linear density functionals, the proposed estimator uses data-splitting and boundary correction to achieve lower mean square error. Specifically, we assume that $T$ i.i.d. samples ${X}_i \in \mathbb{R}^d$ from the density are split into two pieces of cardinality $M$ and $N$ respectively, with $M$ samples used for computing a k-nearest-neighbor density estimate and the remaining $N$ samples used for empirical estimation of the integral of the density functional. By studying the statistical properties of k-NN balls, explicit rates for the bias and variance of the BPI estimator are derived in terms of the sample size, the dimension of the samples and the underlying probability distribution. Based on these results, it is possible to specify optimal choice of tuning parameters $M/T$, $k$ for maximizing the rate of decrease of the mean square error (MSE). The resultant optimized BPI estimator converges faster and achieves lower mean squared error than previous $k$-NN entropy estimators. In addition, a central limit theorem is established for the BPI estimator that allows us to specify tight asymptotic confidence intervals.
Active Learning of Halfspaces under a Margin Assumption
Gonen, Alon, Sabato, Sivan, Shalev-Shwartz, Shai
We derive and analyze a new, efficient, pool-based active learning algorithm for halfspaces, called ALuMA. Most previous algorithms show exponential improvement in the label complexity assuming that the distribution over the instance space is close to uniform. This assumption rarely holds in practical applications. Instead, we study the label complexity under a large-margin assumption -- a much more realistic condition, as evident by the success of margin-based algorithms such as SVM. Our algorithm is computationally efficient and comes with formal guarantees on its label complexity. It also naturally extends to the non-separable case and to non-linear kernels. Experiments illustrate the clear advantage of ALuMA over other active learning algorithms.
Interaction Histories and Short Term Memory: Enactive Development of Turn-taking Behaviors in a Childlike Humanoid Robot
Broz, Frank, Nehaniv, Chrystopher L., Kose-Bagci, Hatice, Dautenhahn, Kerstin
In this article, an enactive architecture is described that allows a humanoid robot to learn to compose simple actions into turn-taking behaviors while playing interaction games with a human partner. The robot's action choices are reinforced by social feedback from the human in the form of visual attention and measures of behavioral synchronization. We demonstrate that the system can acquire and switch between behaviors learned through interaction based on social feedback from the human partner. The role of reinforcement based on a short term memory of the interaction is experimentally investigated. Results indicate that feedback based only on the immediate state is insufficient to learn certain turn-taking behaviors. Therefore some history of the interaction must be considered in the acquisition of turn-taking, which can be efficiently handled through the use of short term memory.
Type-elimination-based reasoning for the description logic SHIQbs using decision diagrams and disjunctive datalog
Rudolph, Sebastian, Krรถtzsch, Markus, Hitzler, Pascal
We propose a novel, type-elimination-based method for reasoning in the description logic SHIQbs including DL-safe rules. To this end, we first establish a knowledge compilation method converting the terminological part of an ALCIb knowledge base into an ordered binary decision diagram (OBDD) which represents a canonical model. This OBDD can in turn be transformed into disjunctive Datalog and merged with the assertional part of the knowledge base in order to perform combined reasoning. In order to leverage our technique for full SHIQbs, we provide a stepwise reduction from SHIQbs to ALCIb that preserves satisfiability and entailment of positive and negative ground facts. The proposed technique is shown to be worst case optimal w.r.t. combined and data complexity and easily admits extensions with ground conjunctive queries.
Do Linguistic Style and Readability of Scientific Abstracts Affect their Virality?
Guerini, Marco (Trento-Rise) | Pepe, Alberto (Harvard University) | Lepri, Bruno (Massachusetts Institute of Technology)
Reactions to textual content posted in an online social net- work show different dynamics depending on the linguistic style and readability of the submitted content. Do similar dy- namics exist for responses to scientific articles? Our intuition, supported by previous research, suggests that the success of a scientific article depends on its content, rather than on its linguistic style. In this article, we examine a corpus of sci- entific abstracts and three forms of associated reactions: ar- ticle downloads, citations, and bookmarks. Through a class- based psycholinguistic analysis and readability indices tests, we show that certain stylistic and readability features of ab- stracts clearly concur in determining the success and viral ca- pability of a scientific article.
Network Sampling Designs for Relational Classification
Ahmed, Nesreen K. (Purdue University) | Neville, Jennifer (Purdue University) | Kompella, Ramana (Purdue University)
Relational classification has been extensively studied recently due to its applications in social, biological, technological, and information networks. Much of the work in relational learning has focused on analyzing input data that comprise a single network. Although machine learning researchers have considered the issue of how to sample training and test sets from the input network (for evaluation), the mechanisms which are used to construct the input networks have largely been ignored. In most cases, the input network has itself been sampled from a larger target network (e.g., Facebook) and often the researcher is unaware of how the input network was constructed or what impact that may have on evaluation of the relational models. Since the goal in evaluating relational classification algorithms is to accurately assess their performance on the larger target network, it is critical to understand what impact the initial sampling method may have on our estimates of classification accuracy.In this paper, we present different sampling methods and systematically study their impact on evaluation of relational classification. Our results indicate that the choice of sampling method can impact classification performance, and thus consequently affects the accuracy of evaluation.
SearchBuddies: Bringing Search Engines into the Conversation
Hecht, Brent (Northwestern University) | Teevan, Jaime (Microsoft Research) | Morris, Meredith Ringel (Microsoft Research) | Liebling, Dan (Microsoft Research)
Although people receive trusted, personalized recommendations and auxiliary social benefits when they ask questions of their friends, using a search engine is often a more effective way to find an answer. Attempts to integrate social and algorithmic search have thus far focused on bringing social content into algorithmic search results. However, more of the benefits of social search can be preserved by reversing this approach and bringing algorithmic content into natural question-based conversations. To do this successfully, it is necessary to adapt search engine interaction to a social context. In this paper, we present SearchBuddies, a system that responds to Facebook status message questions with algorithmic search results. Via a three-month deployment of the system to 122 social network users, we explore how people responded to search content in a highly social environment. Our experience deploying SearchBuddies shows that a socially embedded search engine can successfully provide users with unique and highly relevant information in a social context and can be integrated into conversations around an information need. The deployment also illuminates specific challenges of embedding a search engine in a social environment and provides guidance toward solutions.
So.cl: An Interest Network for Informal Learning
Farnham, Shelly Diane (Microsoft Research) | Lahav, Michal (Microsoft Research) | Raskino, David (Microsoft Research) | Cheng, Lili (Microsoft Research) | Laird-McConnell, Tom (Microsoft Research)
Web search engines emerged prior to the dominance of social media. What if we imagined search as integrating with social media from the ground up? So.cl is a web application that combines web browsing, search, and social networking for the purposes of sharing and learning around topics of interest. In this paper, we present the results of a deployment study examining existing learning practices around search and social networking for students, and how these practices shifted when participants adopted So.cl. We found prior to using So.cl that students already heavily employed search tools and social media for learning. With the use of So.cl, we found that users engaged in lightweight, fun social sharing and learning for informal, personal topics, but not for more heavyweight collaboration around school or work. The public nature of So.cl encouraged users to post search results as much for self-expression as for searching, enabling serendipitous discovery around interests.
The Pulse of News in Social Media: Forecasting Popularity
Bandari, Roja (University of California Los Angeles) | Asur, Sitaram (HP Labs) | Huberman, Bernardo A (HP Labs)
News articles are extremely time sensitive by nature. There is also intense competition among news items to propagate as widely as possible. Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. Prior research has dealt with predicting eventual online popularity based on early popularity. It is most desirable, however, to predict the popularity of items prior to their release, fostering the possibility of appropriate decision making to modify an article and the manner of its publication. In this paper, we construct a multi-dimensional feature space derived from properties of an article and evaluate the efficacy of these features to serve as predictors of online popularity. We examine both regression and classification algorithms and demonstrate that despite randomness in human behavior, it is possible to predict ranges of popularity on twitter with an overall 84% accuracy. Our study also serves to illustrate the differences between traditionally prominent sources and those immensely popular on the social web.