Not enough data to create a plot.
Try a different view from the menu above.
Chan, Jeffrey
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks
Chan, Jeffrey, Perrone, Valerio, Spence, Jeffrey P., Jenkins, Paul A., Mathieson, Sara, Song, Yun S.
Inference for population genetics models is hindered by computationally intractable likelihoods. While this issue is tackled by likelihood-free methods, these approaches typically rely on hand-crafted summary statistics of the data. In complex settings, designing and selecting suitable summary statistics is problematic and results are very sensitive to such choices. In this paper, we learn the first exchangeable feature representation for population genetic data to work directly with genotype data. This is achieved by means of a novel Bayesian likelihood-free inference framework, where a permutation-invariant convolutional neural network learns the inverse functional relationship from the data to the posterior. We leverage access to scientific simulators to learn such likelihood-free function mappings, and establish a general framework for inference in a variety of simulation-based tasks. We demonstrate the power of our method on the recombination hotspot testing problem, outperforming the state-of-the-art.
Ground Truth Bias in External Cluster Validity Indices
Lei, Yang, Bezdek, James C., Romano, Simone, Vinh, Nguyen Xuan, Chan, Jeffrey, Bailey, James
It has been noticed that some external CVIs exhibit a preferential bias towards a larger or smaller number of clusters which is monotonic (directly or inversely) in the number of clusters in candidate partitions. This type of bias is caused by the functional form of the CVI model. For example, the popular Rand index (RI) exhibits a monotone increasing (NCinc) bias, while the Jaccard Index (JI) index suffers from a monotone decreasing (NCdec) bias. This type of bias has been previously recognized in the literature. In this work, we identify a new type of bias arising from the distribution of the ground truth (reference) partition against which candidate partitions are compared. We call this new type of bias ground truth (GT) bias. This type of bias occurs if a change in the reference partition causes a change in the bias status (e.g., NCinc, NCdec) of a CVI. For example, NCinc bias in the RI can be changed to NCdec bias by skewing the distribution of clusters in the ground truth partition. It is important for users to be aware of this new type of biased behaviour, since it may affect the interpretations of CVI results. The objective of this article is to study the empirical and theoretical implications of GT bias. To the best of our knowledge, this is the first extensive study of such a property for external cluster validity indices.
MOOCs Meet Measurement Theory: A Topic-Modelling Approach
He, Jiazhen (The University of Melbourne) | Rubinstein, Benjamin I. P. (The University of Melbourne) | Bailey, James (The University of Melbourne) | Zhang, Rui (The University of Melbourne) | Milligan, Sandra (The University of Melbourne) | Chan, Jeffrey (RMIT University)
This paper adapts topic models to the psychometric testing of MOOC students based on their online forum postings. Measurement theory from education and psychology provides statistical models for quantifying a person's attainment of intangible attributes such as attitudes, abilities or intelligence. Such models infer latent skill levels by relating them to individuals' observed responses on a series of items such as quiz questions. The set of items can be used to measure a latent skill if individuals' responses on them conform to a Guttman scale. Such well-scaled items differentiate between individuals and inferred levels span the entire range from most basic to the advanced. In practice, education researchers manually devise items (quiz questions) while optimising well-scaled conformance. Due to the costly nature and expert requirements of this process, psychometric testing has found limited use in everyday teaching. We aim to develop usable measurement models for highly-instrumented MOOC delivery platforms, by using participation in automatically-extracted online forum topics as items. The challenge is to formalise the Guttman scale educational constraint and incorporate it into topic models. To favour topics that automatically conform to a Guttman scale, we introduce a novel regularisation into non-negative matrix factorisation-based topic modelling. We demonstrate the suitability of our approach with both quantitative experiments on three Coursera MOOCs, and with a qualitative survey of topic interpretability on two MOOCs by domain expert interviews.
Personalized Tour Recommendation Based on User Interests and Points of Interest Visit Durations
Lim, Kwan Hui (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Leckie, Christopher (The University of Melbourne) | Karunasekera, Shanika (The University of Melbourne)
Tour recommendation and itinerary planning are challenging tasks for tourists, due to their need to select Points of Interest (POI) to visit in unfamiliar cities, and to select POIs that align with their interest preferences and trip constraints. We propose an algorithm called PersTour for recommending personalized tours using POI popularity and user interest preferences, which are automatically derived from real-life travel sequences based on geo-tagged photos. Our tour recommendation problem is modelled using a formulation of the Orienteering problem, and considers user trip constraints such as time limits and the need to start and end at specific POIs. In our work, we also reflect levels of user interest based on visit durations, and demonstrate how POI visit duration can be personalized using this time-based user interest. Using a Flickr dataset of four cities, our experiments show the effectiveness of PersTour against various baselines, in terms of tour popularity, interest, recall, precision and F1-score. In particular, our results show the merits of using time-based user interest and personalized POI visit durations, compared to the current practice of using frequency-based user interest and average visit durations.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View
Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)
Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximation has been mostly attributed to the difficulty associated with estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint that, given a very large amount of data, the high dimensional MI objective is still problematic to be employed as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigourously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.