South America
The Linear Distance Traveling Tournament Problem
Hoshino, Richard (National Institute of Informatics) | Kawarabayashi, Ken-ichi (National Institute of Informatics)
We introduce a linear distance relaxation of the n-team Traveling Tournament Problem (TTP), a simple yet powerful heuristic that temporarily "assumes"' the n teams are located on a straight line, thereby reducing the n ( n –1)/2 pairwise distance parameters to just n –1 variables. The modified problem then becomes easier to analyze, from which we determine an approximate solution for the actual instance on n teams. We present combinatorial techniques to solve the Linear Distance TTP (LD-TTP) for n = 4 and n = 6, without any use of computing, generating the complete set of optimal distances regardless of where the n teams are located. We show that there are only 295 non-isomorphic schedules that can be a solution to the 6-team LD-TTP, and demonstrate that in all previously-solved benchmark TTP instances on 6 teams, the distance-optimal schedule appears in this list of 295, even when the six teams are arranged in a circle or located in three-dimensional space. We then extend the LD-TTP to multiple rounds, and apply our theory to produce a nearly-optimal regular-season schedule for the Nippon Pro Baseball league in Japan. We conclude the paper by generalizing our theory to the n -team LD-TTP, producing a feasible schedule whose total distance is guaranteed to be no worse than 4/3 times the optimal solution.
Width and Complexity of Belief Tracking in Non-Deterministic Conformant and Contingent Planning
Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
It has been shown recently that the complexity of belief tracking in deterministic conformant and contingent planning is exponential in a width parameter that is often bounded and small. In this work, we introduce a new width notion that applies to non-deterministic conformant and contingent problems as well. We also develop a belief tracking algorithm for non-deterministic problems that is exponential in the problem width, analyze the width of non-deterministic benchmarks, compare the new notion to the previous one over deterministic problems, and present experimental results.
Action Selection for MDPs: Anytime AO* Versus UCT
Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
One of the natural approaches for selecting actions in very From this perspective, an algorithm like RTDP fails on two large state spaces is by performing a limited amount of grounds: first, RTDP does not appear to make best use of lookahead. In the contexts of discounted MDPs, Kearns, short time windows in large state spaces; second, and more Mansour, and Ng have shown that near to optimal actions importantly, RTDP can use admissible heuristics but not informed can be selected by considering a sampled lookahead tree that base policies. On the other hand, algorithms like Policy is sufficiently sparse, whose size depends on the discount Iteration (Howard 1971), deliver all of these features except factor and the suboptimality bound but not on the number of one: they are exhaustive, and thus even to get started, problem states (Kearns, Mansour, and Ng 1999). The UCT they need vectors with the size of the state space. At the algorithm (Kocsis and Szepesvári 2006) is a version of this same time, while there are non-exhaustive versions of (asynchronous) form of Monte Carlo planning, where the lookahead trees Value Iteration such as RTDP, there are no similar are not grown depth-first but'best-first', following a selection'focused' versions of Policy Iteration ensuring anytime optimality.
Exacting Social Events for Tweets Using a Factor Graph
Liu, Xiaohua (Harbin Institute of Technology) | Zhou, Xiangyang (icrosoft Research Asia) | Fu, Zhongyang (Shanghai Jiao Tong University) | Wei, Furu (Microsoft Research Asia) | Zhou, Ming (Microsoft Research Asia)
Social events are events that occur between people where at least one person is aware of the other and of the event taking place. Extracting social events can play an important role in a wide range of applications, such as the construction of social network. In this paper, we introduce the task of social event extraction for tweets, an important source of fresh events. One main challenge is the lack of information in a single tweet, which is rooted in the short and noise-prone nature of tweets. We propose to collectively extract social events from multiple similar tweets using a novel factor graph, to harvest the redundance in tweets, i.e., the repeated occurrences of a social event in several tweets. We evaluate our method on a human annotated data set, and show that it outperforms all baselines, with an absolute gain of 21% in F1.
Three Controversial Hypotheses Concerning Computation in the Primate Cortex
Dean, Thomas (Google) | Corrado, Greg S. (Google) | Shlens, Jonathon (Google)
We consider three hypotheses concerning the primate neocortex which have influenced computational neuroscience in recent years. Is the mind modular in terms of its being profitably described as a collection of relatively independent functional units? Does the regular structure of the cortex imply a single algorithm at work, operating on many different inputs in parallel? Can the cognitive differences between humans and our closest primate relatives be explained in terms of a scalable cortical architecture? We bring to bear diverse sources of evidence to argue that the answers to each of these questions — with some judicious qualifications — are in the affirmative. In particular, we argue that while our higher cognitive functions may interact in a complicated fashion, many of the component functions operate through well-defined interfaces and, perhaps more important, are built on a neural substrate that scales easily under the control of a modular genetic architecture. Processing in the primary sensory cortices seem amenable to similar algorithmic principles, and, even for those cases where alternative principles are at play, the regular structure of cortex allows the same or greater advantages as the architecture scales. Similar genetic machinery to that used by nature to scale body plans has apparently been applied to scale cortical computations. The resulting replicated computing units can be used to build larger working memory and support deeper recursions needed to qualitatively improve our abilities to handle language, abstraction and social interaction.
Supervised Probabilistic Robust Embedding with Sparse Noise
Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology) | Xing, Eric P. (Carnegie Mellon University)
Many noise models do not faithfully reflect the noise processes introduced during data collection in many real-world applications. In particular, we argue that a type of noise referred to as sparse noise is quite commonly found in many applications and many existing works have been proposed to model such sparse noise. However, all the existing works only focus on unsupervised learning without considering the supervised information, i.e., label information. In this paper, we consider how to model and handle sparse noise in the context of embedding high-dimensional data under a probabilistic formulation for supervised learning. We propose a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noises. By using the Laplace distribution as a prior to model sparse noise, we devise a two-fold variational EM learning algorithm in which the update of model parameters has analytical solution. We report some classification experiments to compare SPRE with several related models.
Pairwise Exemplar Clustering
Yang, Yingzhen (University of Illinois at Urbana-Champaign) | Chu, Xinqi (University of Illinois at Urbana-Champaign) | Liang, Feng (University of Illinois at Urbana-Champaign) | Huang, Thomas S. (University of Illinois at Urbana-Champaign)
Exemplar-based clustering methods have been extensively shown to be effective in many clustering problems. They adaptively determine the number of clusters and hold the appealing advantage of not requiring the estimation of latent parameters, which is otherwise difficult in case of complicated parametric model and high dimensionality of the data. However, modeling arbitrary underlying distribution of the data is still difficult for existing exemplar-based clustering methods. We present Pairwise Exemplar Clustering (PEC) to alleviate this problem by modeling the underlying cluster distributions more accurately with non-parametric kernel density estimation. Interpreting the clusters as classes from a supervised learning perspective, we search for an optimal partition of the data that balances two quantities: 1 the misclassification rate of the data partition for separating the clusters; 2 the sum of within-cluster dissimilarities for controlling the cluster size. The broadly used kernel form of cut turns out to be a special case of our formulation. Moreover, we optimize the corresponding objective function by a new efficient algorithm for message computation in a pairwise MRF. Experimental results on synthetic and real data demonstrate the effectiveness of our method.
Discriminative Clustering via Generative Feature Mapping
Wang, Liwei (The Chinese University of Hong Kong) | Li, Xiong (Shanghai Jiao Tong University) | Tu, Zhuowen (Microsoft Research Asia and UCLA) | Jia, Jiaya (The Chinese University of Hong Kong)
Existing clustering methods can be roughly classified into two categories: generative and discriminative approaches. Generative clustering aims to explain the data and thus is adaptive to the underlying data distribution; discriminative clustering, on the other hand, emphasizes on finding partition boundaries. In this paper, we take the advantages of both models by coupling the two paradigms through feature mapping derived from linearizing Bayesian classifiers. Such the feature mapping strategy maps nonlinear boundaries of generative clustering to linear ones in the feature space where we explicitly impose the maximum entropy principle. We also propose the unified probabilistic framework, enabling solvers using standard techniques. Experiments on a variety of datasets bear out the notable benefit of our method in terms of adaptiveness and robustness.
Ensemble Feature Weighting Based on Local Learning and Diversity
Li, Yun (Nanjing University of Posts and Telecommunications) | Gao, Suyan (Nanjing University of Posts and Telecommunications) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)
Recently, besides the performance, the stability (robustness, i.e., the variation in feature selection results due to small changes in the data set) of feature selection is received more attention. Ensemble feature selection where multiple feature selection outputs are combined to yield more robust results without sacrificing the performance is an effective method for stable feature selection. In order to make further improvements of the performance (classification accuracy), the diversity regularized ensemble feature weighting framework is presented, in which the base feature selector is based on local learning with logistic loss for its robustness to huge irrelevant features and small samples. At the same time, the sample complexity of the proposed ensemble feature weighting algorithm is analyzed based on the VC-theory. The experiments on different kinds of data sets show that the proposed ensemble method can achieve higher accuracy than other ensemble ones and other stable feature selection strategy (such as sample weighting) without sacrificing stability
Teaching Machines to Learn by Metaphors
Levy, Omer (Technion - Israel Institute of Technology) | Markovitch, Shaul (Technion - Israel Institute of Technology)
Humans have an uncanny ability to learn new concepts with very few examples. Cognitive theories have suggested that this is done by utilizing prior experience of related tasks. We propose to emulate this process in machines, by transforming new problems into old ones. These transformations are called metaphors. Obviously, the learner is not given a metaphor, but must acquire one through a learning process. We show that learning metaphors yield better results than existing transfer learning methods. Moreover, we argue that metaphors give a qualitative assessment of task relatedness.