Goto

Collaborating Authors

 Statistical Learning


Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm

arXiv.org Machine Learning

One of the most fundamental concepts in statistics is the concept of sample mean. Properties of the sample mean that are well-defined in Euclidean spaces become unwieldy or even unclear in graph spaces. Open problems related to the sample mean of graphs include: non-existence, non-uniqueness, statistical inconsistency, lack of convergence results of mean algorithms, non-existence of midpoints, and disparity to midpoints. We present conditions to resolve all six problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on graph datasets representing images and molecules show that the MMM-Algorithm best approximates a sample mean of graphs compared to six other mean algorithms.


PCA-Based Out-of-Sample Extension for Dimensionality Reduction

arXiv.org Machine Learning

Dimensionality reduction methods are very common in the field of high dimensional data analysis. Typically, algorithms for dimensionality reduction are computationally expensive. Therefore, their applications for the analysis of massive amounts of data are impractical. For example, repeated computations due to accumulated data are computationally prohibitive. In this paper, an out-of-sample extension scheme, which is used as a complementary method for dimensionality reduction, is presented. We describe an algorithm which performs an out-of-sample extension to newly-arrived data points. Unlike other extension algorithms such as Nystr\"om algorithm, the proposed algorithm uses the intrinsic geometry of the data and properties for dimensionality reduction map. We prove that the error of the proposed algorithm is bounded. Additionally to the out-of-sample extension, the algorithm provides a degree of the abnormality of any newly-arrived data point.


Convolutional Networks on Graphs for Learning Molecular Fingerprints

arXiv.org Machine Learning

We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.


From random walks to distances on unweighted graphs

arXiv.org Machine Learning

Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives.


Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path

arXiv.org Machine Learning

This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time $t_{\text{mix}}$ of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $\sqrt{n}$ rate, where $n$ is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least $\Omega(t_{\text{relax}})$ times on the average. Finally, future directions of research are identified.


Optimizing Playersโ€™ Expected Enjoyment in Interactive Stories

AAAI Conferences

In interactive storytelling systems and other story-based computer games, a drama manager is a background agent that aims to bring about an enjoyable and coherent experience for the players. In this paper, we present a personalized drama manager that increases a player's expected enjoyment without removing player agency. Our personalized drama manager models a player's preference using data-driven techniques, predicts the probability the player transitioning to different story experiences, selects an objective experience that can maximize the player's expected enjoyment, and guides the player to the selected story experience. Human study results show that our drama manager can significantly increase players' enjoyment ratings in an interactive storytelling testbed, compared to drama managers in previous research.


Modeling Individual Differences through Frequent Pattern Mining on Role-Playing Game Actions

AAAI Conferences

There has been much work on player modeling using game behavioral data collected. Many of the previous research projects that targeted this goal used aggregate game statistics as features to develop behavior models using both statistical and machine learning techniques. While existing methods have already led to interesting findings, we suspect that aggregated features discard valuable information such as temporal or sequential patterns, which may be important in deciphering information about decisionmaking, problem solving, or individual differences. Such sequential information is critical to analyze player behaviors especially in role-playing games (RPG) where players can face ample choices, experience different contexts, behave freely with individual propensities but possibly end up with similar aggregated statistics (e.g., levels, time spent). In this paper we intend to develop and apply a modeling technique that takes into consideration sequential patters to decipher individual differences in playing a Role Playing Game (RPG) game. Using an RPG with multiple affordances, we designed an experiment collecting granular in-game behaviors of 64 players. Using closed sequential pattern mining and logistic regression, we developed a model that uses gameplay action sequences to predict the real world characteristics, including gender, game play expertise and five personality traits (as defined by psychology). The results show that game expertise is a dominant factor that impacts in-game behaviors. The contribution of this paper is the algorithms we developed combined with a validation procedure to determine the reliability and validity of the results and the results themselves.


A Data-Driven Approach for Computationally Modeling Players' Avatar Customization Behaviors

AAAI Conferences

Avatar customization systems enable players to represent themselves virtually in many ways. Research has shown that players exhibit different preferences and motivations in how they customize their avatars. In this paper, we present a data-driven analytical approach to modeling player behavioral patterns exhibited during the avatar customization process.ย We used our data mining tool \textit{AIRvatar} to analyze telemetry data obtained from 190 players using an avatar creator of our own design. Using non-negative matrix factorization (NMF) and N-gram models, we demonstrate how our approach computationally models behavioral patterns exhibited by players such as "regular shopping," "engaged shopping," or "bored browsing". Our models obtained significant effect sizes (0.12 <= R^2 <= 0.54) when validated with multiple linear regressions for players' time spent engaging in activities within the avatar creator. The NMF model had comparably high performance and ease of interpretation compared to control models.


Combining Crowd and Expert Labels Using Decision Theoretic Active Learning

AAAI Conferences

We consider a finite-pool data categorization scenario which requires exhaustively classifying a given set of examples with a limited budget. We adopt a hybrid human-machine approach which blends automatic machine learning with human labeling across a tiered workforce composed of domain experts and crowd workers. To effectively achieve high-accuracy labels over the instances in the pool at minimal cost, we develop a novel approach based on decision-theoretic active learning. On the important task of biomedical citation screening for systematic reviews, results on real data show that our method achieves consistent improvements over baseline strategies. To foster further research by others, we have made our data available online.


Comparing Clustering Approaches for Modeling Players' Values through Avatar Construction

AAAI Conferences

Videogame avatars provide an expressive avenue for players to represent themselves virtually. Research has shown that these avatars, while virtual, can reveal aspects of players' identities, along with physical, social, and cultural values of the real-world. In this paper, we present an approach for modeling player values through their avatars using artificial intelligence (AI) clustering techniques. In a study with 191 participants who created avatars using our system, we provide a thorough comparison of the techniques across numerical, textual, and visual data. Our findings showed that these data structures can effectively reveal players' values and preferences, such as conforming to stereotypes of character roles using statistical attributes, modeling nuances in text descriptions of avatars, and identifying "best-example" (prototypical) avatar appearances that players can be quantitatively shown to conform to. Our findings suggest that AI clustering approaches can be used to model players to yield insight into implicitly held values in a data-driven manner through virtual avatars.