South America
The Multi-Agent Programming Contest
Behrens, Tristan (Clausthal University of Technology) | Dastani, Mehdi (Utrecht University) | Dix, Jürgen (Clausthal University of Technology) | Hübner, Jomi (University of Santa Catarina) | Köster, Michael (Clausthal University of Technology) | Novák, Peter (Delft University of Technology) | Schlesinger, Federico (Clausthal University of Technology)
It has since been organized by the AI group at Clausthal University of Technology. MAPC is not collocated with any other event. Using our MASSim platform, the participants are running their own systems locally and only interact with the tournament server over the Internet. A steering committee oversees the whole process and determines the organization committee. The scenario changes every other year: the current one is "Agents on Mars."
A Conditional Multinomial Mixture Model for Superset Label Learning
Liu, Liping, Dietterich, Thomas G.
In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution.
A Conditional Multinomial Mixture Model for Superset Label Learning
Liu, Liping, Dietterich, Thomas G.
In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution.
Gradient Weights help Nonparametric Regressors
Kpotufe, Samory, Boularias, Abdeslam
In regression problems over $\real^d$, the unknown function $f$ often varies more in some coordinates than in others. We show that weighting each coordinate $i$ with the estimated norm of the $i$th derivative of $f$ is an efficient way to significantly improve the performance of distance-based regressors, e.g. kernel and $k$-NN regressors. We propose a simple estimator of these derivative norms and prove its consistency. Moreover, the proposed estimator is efficiently learned online.
A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter theoretical bound and lower time complexity, and that it can avoid maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms.
Local Supervised Learning through Space Partitioning
Wang, Joseph, Saligrama, Venkatesh
We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.
Learning to Predict from Textual Data
Radinsky, K., Davidovich, S., Markovitch, S.
Given a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines containing certain predefined causality patterns. For generalization, the model uses a vast number of world knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.
Discovering Health Beliefs in Twitter
Bhattacharya, Sanmitra (The University of Iowa) | Tran, Hung (The University of Iowa) | Srinivasan, Padmini (The University of Iowa)
Social networking websites such as Twitter have invigorated a wide range of studies in recent years ranging from consumer opinions on products to tracking the spread of diseases. While sentiment analysis and opinion mining from tweets have been studied extensively, surveillance of beliefs, especially those related to public health, have received considerably less attention. In our previous work, we proposed a model for surveillance of health beliefs on Twitter relying on the use of hand-picked probe statements expressing various health-related propositions. In this work we extend our model to automatically discover various probes related to public health beliefs. We present a data driven approach based on two distinct datasets and study the prevalence of public belief, disbelief or doubt for newly discovered probe statements.
Apoptotic Stigmergic Agents for Real-Time Swarming Simulation
Parunak, H. Van Dyke (Jacobs Technology Group) | Brooks, S. Hugh (enkidu7) | Brueckner, Sven A. (Jacobs Technology Group) | Gupta, Ravi (enkidu7)
One common use for swarming agents is in social simulation. This paper reports on such a model developed to track protest activities at the May 2012 NATO summit in Chicago. The use of apoptotic stigmergic agents allows the model to run on-line, consuming two kinds of external data and reporting its results in real time.
Generating Approximate Solutions to the TTP using a Linear Distance Relaxation
Hoshino, R., Kawarabayashi, K.
In some domestic professional sports leagues, the home stadiums are located in cities connected by a common train line running in one direction. For these instances, we can incorporate this geographical information to determine optimal or nearly-optimal solutions to the n-team Traveling Tournament Problem (TTP), an NP-hard sports scheduling problem whose solution is a double round-robin tournament schedule that minimizes the sum total of distances traveled by all n teams. We introduce the Linear Distance Traveling Tournament Problem (LD-TTP), and solve it for n=4 and n=6, generating the complete set of possible solutions through elementary combinatorial techniques. For larger n, we propose a novel "expander construction" that generates an approximate solution to the LD-TTP. For n congruent to 4 modulo 6, we show that our expander construction produces a feasible double round-robin tournament schedule whose total distance is guaranteed to be no worse than 4/3 times the optimal solution, regardless of where the n teams are located. This 4/3-approximation for the LD-TTP is stronger than the currently best-known ratio of 5/3 + epsilon for the general TTP. We conclude the paper by applying this linear distance relaxation to general (non-linear) n-team TTP instances, where we develop fast approximate solutions by simply "assuming" the n teams lie on a straight line and solving the modified problem. We show that this technique surprisingly generates the distance-optimal tournament on all benchmark sets on 6 teams, as well as close-to-optimal schedules for larger n, even when the teams are located around a circle or positioned in three-dimensional space.