Overview
A Random Forest Guided Tour
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
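The averaging of randomized trees described above can be illustrated with a deliberately small sketch. It is not Breiman's full algorithm: each "tree" here is a depth-one regression stump fitted on a bootstrap sample using a single randomly chosen feature, and the names `fit_stump`, `fit_forest`, and `predict`, as well as the toy data, are assumptions made for illustration.

```python
import random

def fit_stump(X, y, feature):
    """Fit a depth-one regression tree on one feature by minimizing squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # the feature is constant in this bootstrap sample
        mean = sum(y) / len(y)
        return (feature, 0.0, mean, mean)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=50, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        feature = rng.randrange(d)                  # random feature for the single split
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, x):
    """Aggregate the individual tree predictions by averaging."""
    preds = [lm if x[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)

# Toy data (hypothetical): the response depends only on the first feature.
X = [[(i + 0.5) / 20, ((7 * i) % 20) / 20] for i in range(20)]
y = [1.0 if row[0] > 0.5 else 0.0 for row in X]
forest = fit_forest(X, y)
high = predict(forest, [0.9, 0.5])
low = predict(forest, [0.1, 0.5])
```

In this toy setting, stumps built on the uninformative second feature contribute the same value to both queries, so the gap between `high` and `low` comes from the stumps that split on the first feature.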
Vertex nomination schemes for membership prediction
Fishkind, D. E., Lyzinski, V., Pao, H., Chen, L., Priebe, C. E.
Suppose that a graph is realized from a stochastic block model where one of the blocks is of interest, but many or all of the vertices' block labels are unobserved. The task is to order the vertices with unobserved block labels into a "nomination list" such that, with high probability, vertices from the interesting block are concentrated near the list's beginning. We propose several vertex nomination schemes. Our basic--but principled--setting and development yields a best nomination scheme (which is a Bayes-Optimal analogue), and also a likelihood maximization nomination scheme that is practical to implement when there are a thousand vertices, and which is empirically near-optimal when the number of vertices is small enough to allow comparison to the best nomination scheme. We then illustrate the robustness of the likelihood maximization nomination scheme to the modeling challenges inherent in real data, using examples which include a social network involving human trafficking, the Enron Graph, a worm brain connectome and a political blog network. In a stochastic block model, the vertices of the graph are partitioned into blocks, and the existence/nonexistence of an edge between any pair of vertices is an independent Bernoulli trial, with the Bernoulli parameter being a function of the block memberships of the pair of vertices. We are concerned here with a graph realized from a stochastic block model such that many or all of the vertices' block labels are hidden (i.e., unobserved). Received August 2014; revised February 2015. Supported in part by Johns Hopkins University Human Language Technology Center of Excellence (JHU HLT COE) and the XDATA program of the Defense Advanced Research Projects Agency (DARPA) administered through Air Force Research Laboratory contract FA8750-12-2-0303.
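The stochastic block model described above can be sampled with a few lines of code. The following is a minimal sketch for an undirected graph without self-loops; the function name `sample_sbm`, the argument layout, and the toy parameters are assumptions made for illustration and are not taken from the paper.

```python
import random

def sample_sbm(block_of, edge_prob, seed=None):
    """Sample an undirected graph from a stochastic block model.

    block_of[i]     : block index of vertex i
    edge_prob[a][b] : Bernoulli parameter for a pair with blocks a and b
    Returns a symmetric 0/1 adjacency matrix with an all-zero diagonal.
    """
    rng = random.Random(seed)
    n = len(block_of)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # One independent Bernoulli trial per unordered pair of vertices.
            p = edge_prob[block_of[i]][block_of[j]]
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical example: two blocks, dense within blocks, sparse between them.
blocks = [0, 0, 0, 1, 1, 1]
graph = sample_sbm(blocks, [[0.9, 0.1], [0.1, 0.9]], seed=1)

# Degenerate parameters make the outcome deterministic, which is handy for checking.
extreme = sample_sbm(blocks, [[1.0, 0.0], [0.0, 1.0]], seed=1)
```

In the vertex nomination setting, `blocks` would be partly or wholly hidden from the algorithm, which would only see the adjacency matrix and any observed labels.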
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
Xu, Baohan, Fu, Yanwei, Jiang, Yu-Gang, Li, Boyang, Sigal, Leonid
Rapid development of mobile devices has led to an explosive growth of user-generated images and videos, which creates a demand for computational understanding of visual media content. In addition to recognition of objective content, such as objects and scenes, an important dimension of video content analysis is the understanding of emotional or affective content, i.e. estimating the emotional impact of the video on a viewer. Emotional content can strongly resonate with viewers and plays a crucial role in the video-watching experience. Some successes have been achieved with the use of deep-learning architectures trained for text at both sentence- and document-level [40] or image sentiment analysis [8]. However, the ability to understand emotions from video, to a large extent, remains an unsolved problem. Analysis of emotional content in video has many real-world applications. Video recommendation services can benefit from matching user interests with the emotions of video content and prediction of interestingness [20], [21], [36], leading to improved user satisfaction. Better understanding of video emotions may enable advertising that is consistent with the main video's mood and help avoid social inappropriateness such as placing a funny advertisement alongside a funeral video. Video summarization [68] and coding [60] can also benefit from understanding emotions, since an accurate summary should keep the emotional content conveyed by the original video.
The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems
Oliehoek, Frans A. (University of Liverpool, University of Amsterdam) | Spaan, Matthijs T. J. (Delft University of Technology) | Robbel, Philipp (Massachusetts Institute of Technology) | Messias, Joao (University of Amsterdam)
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.
Nested Value Iteration for Partially Satisfiable Co-Safe LTL Specifications (Extended Abstract)
Lacerda, Bruno (University of Birmingham) | Parker, David (University of Birmingham) | Hawes, Nick (University of Birmingham)
We describe our recent work on cost-optimal policy generation for co-safe linear temporal logic (LTL) specifications that are not satisfiable with probability one in a Markov decision process (MDP) model. We provide an overview of the approach, which poses the problem as the optimisation of three standard objectives in a trimmed product MDP. Furthermore, we introduce a new approach for optimising the three objectives, in a decreasing order of priority, based on a “nested” value iteration, where one value table is kept for each objective.
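The idea of keeping one value table per objective and resolving ties in priority order can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the three objectives, so the ones used here (maximise the probability of reaching a goal, then maximise a progress reward, then minimise cost), the plain MDP in place of the trimmed product MDP, the function name `nested_value_iteration`, and the toy model are all hypothetical.

```python
EPS = 1e-9  # tolerance for treating two floating-point values as tied

def nested_value_iteration(mdp, goal, sweeps=50):
    """mdp[s][a] = (progress, cost, [(prob, next_state), ...]).

    States with no actions are terminal; goal states are assumed terminal.
    Returns the greedy policy and one value table per objective.
    """
    v_prob = {s: (1.0 if s in goal else 0.0) for s in mdp}
    v_prog = {s: 0.0 for s in mdp}
    v_cost = {s: 0.0 for s in mdp}
    policy = {}
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if not actions:
                continue
            # One backed-up value per objective for every action.
            q = {}
            for a, (progress, cost, outcomes) in actions.items():
                q[a] = (
                    sum(p * v_prob[t] for p, t in outcomes),
                    progress + sum(p * v_prog[t] for p, t in outcomes),
                    cost + sum(p * v_cost[t] for p, t in outcomes),
                )
            # Filter actions objective by objective, in decreasing priority.
            cands = list(actions)
            best = max(q[a][0] for a in cands)
            cands = [a for a in cands if q[a][0] >= best - EPS]
            best = max(q[a][1] for a in cands)
            cands = [a for a in cands if q[a][1] >= best - EPS]
            best = min(q[a][2] for a in cands)
            cands = [a for a in cands if q[a][2] <= best + EPS]
            policy[s] = cands[0]
            v_prob[s], v_prog[s], v_cost[s] = q[cands[0]]
    return policy, v_prob, v_prog, v_cost

# Hypothetical toy model: the goal cannot be reached with probability one.
toy = {
    "s0": {
        "a": (1.0, 3.0, [(0.5, "goal"), (0.5, "fail")]),
        "b": (1.0, 1.0, [(0.5, "goal"), (0.5, "fail")]),
        "c": (0.0, 0.0, [(0.5, "goal"), (0.5, "fail")]),
        "d": (5.0, 0.0, [(0.3, "goal"), (0.7, "fail")]),
    },
    "goal": {},
    "fail": {},
}
policy, v_prob, v_prog, v_cost = nested_value_iteration(toy, goal={"goal"})
```

In the toy model, action `d` is discarded first because its success probability is lower, `c` is discarded next because it makes less progress, and the remaining tie between `a` and `b` is broken by cost.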
A Taxonomy for Improving Dialog between Autonomous Agent Developers and Human-Machine Interface Designers
Hooper, Daylond James (Infoscitex, Inc.) | Duffy, Jeffrey P. (Infoscitex, Inc.) | Calhoun, Gloria L. (Wright Patterson Air Force Base) | Hughes, Thomas C. (Infoscitex, Inc.)
Autonomous agents require interfaces to define their interactions with humans. The coupling between agents and humans is often limited, with disjoint goals between the agent interface and its associated autonomous components. This leads to a gap in human interaction relative to agent capabilities. We seek to aid interface designs by clarifying agent capabilities within an interface context. A taxonomy was developed that can help elucidate the agent’s affordances and constraints that guide interface design. Moreover, the descriptors employed in the taxonomy can serve as a common language to support dialog between agent and interface developers, resulting in improved autonomous systems that support human-autonomy coordination.
Multi-Level Evolution of Shooter Levels
Cachia, William (University of Malta) | Liapis, Antonios (University of Malta) | Yannakakis, Georgios N. (University of Malta)
This paper introduces a search-based generative process for first person shooter levels. Genetic algorithms evolve the level's architecture and the placement of powerups and player spawnpoints, generating levels with one floor or two floors. The evaluation of generated levels combines metrics collected from simulations of artificial agents competing in the level and theory-based heuristics targeting general level design patterns. Both simulation-based and theory-driven evaluations target player balance and exploration, while resulting levels emergently exhibit several popular design patterns of shooter levels.
Targeting Horror via Level and Soundscape Generation
Lopes, Phil (University of Malta) | Liapis, Antonios (University of Malta) | Yannakakis, Georgios N. (University of Malta)
Horror games form a peculiar niche within game design paradigms, as they entertain by eliciting negative emotions such as fear and unease in their audience during play. This genre often follows a specific progression of tension culminating at a metaphorical peak, which is defined by the designer. A player's tension is elicited by several facets of the game, including its mechanics, its sounds, and the placement of enemies in its levels. This paper investigates how designers can control and guide the automated generation of levels and their soundscapes by authoring the intended tension of a player traversing them.
Learning Supervised Topic Models from Crowds
Rodrigues, Filipe (University of Coimbra) | Ribeiro, Bernardete (University of Coimbra) | Lourenço, Mariana (University of Coimbra) | Pereira, Francisco (Massachusetts Institute of Technology)
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, which are prone to ambiguity and noise and often involve high volumes of documents, makes learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this paper, we propose a supervised topic model that accounts for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.
A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data
Peng, Xi, Tang, Huajin, Zhang, Lei, Yi, Zhang, Xiao, Shijie
Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and $\ell_2$-norm-based representation, and have achieved state-of-the-art performance. However, these methods suffer from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which makes them inefficient for solving large-scale problems. Second, they cannot cope with out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework which makes representation-based subspace clustering algorithms feasible to cluster both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently-proposed scalable methods in clustering large-scale data sets.