Using Closed Captions as Supervision for Video Activity Recognition
Gupta, Sonal (Stanford University) | Mooney, Raymond J. (University of Texas at Austin)
Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle & zoom, and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting "labeled" data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs
Amato, Christopher (University of Massachusetts, Amherst) | Bonet, Blai (Universidad Simón Bolívar) | Zilberstein, Shlomo (University of Massachusetts, Amherst)
Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones.
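The structural difference between the two controller types can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the dictionary encoding of the controllers, and the convention that a Mealy controller takes a fixed first action before any observation arrives are all assumptions made for illustration. In the Moore controller the action depends on the current node only; in the Mealy controller it depends on the node and the most recent observation.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable
Obs = Hashable
Action = Hashable


def run_moore(action_of: Dict[Node, Action],
              next_node: Dict[Tuple[Node, Obs], Node],
              start: Node,
              observations: List[Obs]) -> List[Action]:
    """Moore controller: the action is a function of the current node only."""
    node = start
    actions = [action_of[node]]          # act before any observation is seen
    for obs in observations:
        node = next_node[(node, obs)]    # transition on the observation
        actions.append(action_of[node])  # then act from the new node
    return actions


def run_mealy(first_action: Action,
              action_of: Dict[Tuple[Node, Obs], Action],
              next_node: Dict[Tuple[Node, Obs], Node],
              start: Node,
              observations: List[Obs]) -> List[Action]:
    """Mealy controller: the action is a function of (node, observation).

    Assumption: no observation exists at the first step, so the first
    action is supplied separately.
    """
    node = start
    actions = [first_action]
    for obs in observations:
        actions.append(action_of[(node, obs)])  # output attached to the transition
        node = next_node[(node, obs)]
    return actions
```

In this encoding, a one-node Mealy controller that maps observation "L" to "go_left" and "R" to "go_right" produces the same behavior as a Moore controller that needs three nodes (a start node plus one node per action), which is one way to see the richer structure the abstract refers to.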
CAO: A Fully Automatic Emoticon Analysis System
Ptaszynski, Michal (Hokkaido University) | Maciejewski, Jacek (Hokkaido University) | Dybala, Pawel (Hokkaido University) | Rzepka, Rafal (Hokkaido University) | Araki, Kenji (Hokkaido University)
This paper presents CAO, a system for affect analysis of emoticons. Emoticons are strings of symbols widely used in text-based online communication to convey emotions. CAO extracts emoticons from input and determines the specific emotions they express. It first matches the extracted emoticons against a raw emoticon database containing over ten thousand emoticon samples extracted from the Web and annotated automatically. Emoticons whose emotion types cannot be determined from this database alone are automatically divided into semantic areas representing "mouths" or "eyes," based on the theory of kinesics. The areas are automatically annotated according to their co-occurrence in the database. The annotation is based first on the eye-mouth-eye triplet; if no such triplet is found, all semantic areas are estimated separately. This gives the system coverage exceeding 3 million possibilities. The evaluation, performed on both training and test sets, confirmed the system's capability to sufficiently detect and extract any emoticon, analyze its semantic structure and estimate the potential emotion types expressed. The system achieved nearly ideal scores, outperforming existing emoticon analysis systems.
What Is an Opinion About? Exploring Political Standpoints Using Opinion Scoring Model
Chen, Bi (Pennsylvania State University) | Zhu, Leilei (Pennsylvania State University) | Kifer, Daniel (Pennsylvania State University) | Lee, Dongwon (Pennsylvania State University)
In this paper, we propose a generative model to automatically discover the hidden associations between topic words and opinion words. By applying those discovered hidden associations, we construct the opinion scoring models to extract statements which best express opinionists' standpoints on certain topics. For experiments, we apply our model to the political area. First, we visualize the similarities and dissimilarities between Republican and Democratic senators with respect to various topics. Second, we compare the performance of the opinion scoring models with 14 kinds of methods to find the best ones. We find that sentences extracted by our opinion scoring models can effectively express opinionists' standpoints.
A Temporal Proof System for General Game Playing
Thielscher, Michael (The University of New South Wales) | Voigt, Sebastian (Dresden University of Technology)
A general game player is a system that understands the rules of unknown games and learns to play these games well without human intervention. A major challenge for research in General Game Playing is to endow a player with the ability to extract and prove game-specific knowledge from the mere game rules. We define a formal language to express temporally extended, yet local, properties of games. We also develop a provably correct proof theory for this language using the paradigm of Answer Set Programming, and we report on experiments with a practical implementation of this proof system in combination with a successful general game player.
Generalized Task Markets for Human and Machine Computation
Shahaf, Dafna (Carnegie Mellon University) | Horvitz, Eric (Microsoft Research)
We discuss challenges and opportunities for developing generalized task markets where human and machine intelligence are enlisted to solve problems, based on a consideration of the competencies, availabilities, and pricing of different problem-solving resources. The approach couples human computation with machine learning and planning, and is aimed at optimizing the flow of subtasks to people and to computational problem solvers. We illustrate key ideas in the context of Lingua Mechanica, a project focused on harnessing human and machine translation skills to perform translation among languages. We present infrastructure and methods for enlisting and guiding human and machine computation for language translation, including details about the hardness of generating plans for assigning tasks to solvers. Finally, we discuss studies performed with machine and human solvers, focusing on components of a Lingua Mechanica prototype.
Symmetry Detection in General Game Playing
Schiffel, Stephan (Dresden University of Technology)
We develop a method for detecting symmetries in arbitrary games and exploiting these symmetries when using tree search to play the game. Games in the General Game Playing domain are given as a set of logic-based rules defining legal moves, their effects and goals of the players. The presented method transforms the rules of a game into a vertex-labeled graph such that automorphisms of the graph correspond to symmetries of the game. The algorithm detects many kinds of symmetries that often occur in games, e.g., rotation and reflection symmetries of boards, interchangeable objects, and symmetric roles. A transposition table is used to efficiently exploit the symmetries in many games.
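How a transposition table can exploit symmetries once they are known can be sketched as follows. This is a simplified illustration, not the paper's method: it skips the graph-automorphism detection step entirely and instead hard-codes the eight rotation and reflection symmetries of a square board. The function names and the flat-tuple state encoding are assumptions.

```python
from typing import List, Sequence, Tuple

State = Tuple[str, ...]  # board cells in row-major order (assumed encoding)


def square_board_symmetries(n: int) -> List[Tuple[int, ...]]:
    """Return the 8 rotations/reflections of an n x n board as index permutations."""
    cells = [(r, c) for r in range(n) for c in range(n)]

    def rotate(rc: Tuple[int, int]) -> Tuple[int, int]:
        r, c = rc
        return (c, n - 1 - r)

    def flip(rc: Tuple[int, int]) -> Tuple[int, int]:
        r, c = rc
        return (r, n - 1 - c)

    perms = []
    for reflect in (False, True):
        mapping = {rc: (flip(rc) if reflect else rc) for rc in cells}
        for _ in range(4):
            # perm[i] is the index of the source cell that lands on cell i.
            perms.append(tuple(mapping[rc][0] * n + mapping[rc][1] for rc in cells))
            mapping = {rc: rotate(src) for rc, src in mapping.items()}
    return perms


def canonical(state: State, symmetries: Sequence[Sequence[int]]) -> State:
    """Pick the smallest image of `state` under the symmetries as its table key."""
    return min(tuple(state[i] for i in perm) for perm in symmetries)
```

A search would then store and look up values under `canonical(state, symmetries)` instead of `state`, so that all symmetric positions share one transposition-table entry.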
A Computational Model for Saliency Maps by Using Local Entropy
Lin, Yuewei (Chongqing University) | Fang, Bin (Chongqing University) | Tang, Yuanyan (Chongqing University)
This paper presents a computational framework for saliency maps. It employs the Earth Mover's Distance based on weighted-Histogram (EMD-wH) to measure the center-surround difference, instead of the Difference-of-Gaussian (DoG) filter used by traditional models. In addition, the model employs not only the traditional features such as colors, intensity and orientation but also the local entropy, which expresses the local complexity. The major advantage of combining the local entropy map is that it can detect the salient regions which are not complex regions. Also, it uses a general framework to integrate the feature dimensions instead of summing the features directly. This model considers both local and global salient information, in contrast to the existing models that consider only one or the other. Furthermore, the "large scale bias" and "central bias" hypotheses are used in this model to select the fixation locations in the saliency maps of different scales. The performance of this model is assessed by comparing its saliency maps with human fixation density. The results from this model are finally compared to those from other bottom-up models for reference.
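The local entropy feature mentioned above can be illustrated with a small sketch. The abstract does not specify the window shape, size, or quantization, so the square window, the `radius` parameter, and the border clipping below are assumptions. For each pixel, the code computes the Shannon entropy H = -sum(p_i * log2(p_i)) of the intensity values in its neighborhood, where p_i is the fraction of window pixels with intensity i.

```python
import math
from collections import Counter
from typing import List


def local_entropy(image: List[List[int]], radius: int = 1) -> List[List[float]]:
    """Shannon entropy (in bits) of intensities in a square window around each pixel."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Gather the window, clipped at the image border (assumed policy).
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            total = len(window)
            counts = Counter(window)
            result[r][c] = -sum(
                (n / total) * math.log2(n / total) for n in counts.values()
            )
    return result
```

A flat region has entropy 0 everywhere, while a window split evenly between two intensities has entropy 1 bit, so the map is high where the local texture is complex.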
User-Specific Learning for Recognizing a Singer's Intended Pitch
Guillory, Andrew (University of Washington) | Basu, Sumit (Microsoft Research) | Morris, Dan (Microsoft Research)
We consider the problem of automatic vocal melody transcription: translating an audio recording of a sung melody into a musical score. While previous work has focused on finding the closest notes to the singer's tracked pitch, we instead seek to recover the melody the singer intended to sing. Often, the melody a singer intended to sing differs from what they actually sang; our hypothesis is that this occurs in a singer-specific way. For example, a given singer may often be flat in certain parts of her range, or another may have difficulty with certain intervals. We thus pursue methods for singer-specific training which use learning to combine different methods for pitch prediction. In our experiments with human subjects, we show that via a short training procedure we can learn a singer-specific pitch predictor and significantly improve transcription of intended pitch over other methods. For an average user, our method gives a 20 to 30 percent reduction in pitch classification errors with respect to a baseline method which is comparable to commercial voice transcription tools. For some users, we achieve even more dramatic reductions. Our best results come from a combination of singer-specific learning with non-singer-specific feature selection. We also discuss the implications of our work for training more general control signals. We make our experimental data available to allow others to replicate or extend our results.
Tolerable Manipulability in Dynamic Assignment without Money
Zou, James (Harvard University) | Gujar, Sujit (Indian Institute of Science) | Parkes, David (Harvard University)
We study a problem of dynamic allocation without money. Agents have arrivals and departures and strict preferences over items. Strategyproofness requires the use of an arrival-priority serial-dictatorship (APSD) mechanism, which is ex post Pareto efficient but has poor ex ante efficiency as measured through average rank efficiency. We introduce the scoring-rule (SR) mechanism, which biases in favor of allocating items that an agent values above the population consensus. The SR mechanism is not strategyproof but has tolerable manipulability in the sense that: (i) if every agent optimally manipulates, it reduces to APSD, and (ii) it significantly outperforms APSD for rank efficiency when only a fraction of agents are strategic. The performance of SR is also robust to mistakes by agents that manipulate on the basis of inaccurate information about the popularity of items.
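The APSD baseline described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: it ignores departures, treats each agent as receiving an item on arrival, and measures rank efficiency as the mean position of the assigned item in each agent's preference list (1 = top choice). The function names and data layout are hypothetical, and the SR mechanism itself is not implemented here.

```python
from typing import Dict, List, Optional


def apsd(arrival_order: List[str],
         preferences: Dict[str, List[str]],
         items: List[str]) -> Dict[str, Optional[str]]:
    """Serial dictatorship in arrival order: each agent takes its best remaining item."""
    available = set(items)
    assignment: Dict[str, Optional[str]] = {}
    for agent in arrival_order:
        # First item in the agent's strict preference list that is still available.
        choice = next((item for item in preferences[agent] if item in available), None)
        assignment[agent] = choice
        if choice is not None:
            available.remove(choice)
    return assignment


def average_rank(assignment: Dict[str, Optional[str]],
                 preferences: Dict[str, List[str]]) -> float:
    """Mean 1-based rank of assigned items; unassigned agents are skipped."""
    ranks = [
        preferences[agent].index(item) + 1
        for agent, item in assignment.items()
        if item is not None
    ]
    if not ranks:
        raise ValueError("no agent received an item")
    return sum(ranks) / len(ranks)
```

With two agents who both rank item x above item y, the earlier arrival takes x and the later one is left with y, giving an average rank of 1.5 regardless of how strongly each agent prefers x. That insensitivity to preference intensity is the kind of ex ante inefficiency the abstract attributes to APSD.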