Goto

Collaborating Authors

 Oceania


Predicting Peer-to-Peer Loan Rates Using Bayesian Non-Linear Regression

AAAI Conferences

Peer-to-peer lending is a new highly liquid market for debt, which is rapidly growing in popularity. Here we consider modelling market rates, developing a non-linear Gaussian Process regression method which incorporates both structured data and unstructured text from the loan application. We show that the peer-to-peer market is predictable, and identify a small set of key factors with high predictive power. Our approach outperforms baseline methods for predicting market rates, and generates substantial profit in a trading simulation.


Learning Entity and Relation Embeddings for Knowledge Graph Completion

AAAI Conferences

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.


Large-Margin Multi-Label Causal Feature Learning

AAAI Conferences

In multi-label learning, an example is represented by a descriptive feature associated with several labels. Simply considering labels as independent or correlated is crude; it would be beneficial to define and exploit the causality between multiple labels. For example, an image label 'lake' implies the label 'water', but not vice versa. Since the original features are a disorderly mixture of the properties originating from different labels, it is intuitive to factorize these raw features to clearly represent each individual label and its causality relationship.Following the large-margin principle, we propose an effective approach to discover the causal features of multiple labels, thus revealing the causality between labels from the perspective of feature. We show theoretically that the proposed approach is a tight approximation of the empirical multi-label classification error, and the causality revealed strengthens the consistency of the algorithm. Extensive experimentations using synthetic and real-world data demonstrate that the proposed algorithm effectively discovers label causality, generates causal features, and improves multi-label learning.


Coupled Interdependent Attribute Analysis on Mixed Data

AAAI Conferences

In the real-world applications, heterogeneous interdependent attributes that consist of both discrete and numerical variables can be observed ubiquitously. The usual representation of these data sets is an information table, assuming the independence of attributes. However, very often, they are actually interdependent on one another, either explicitly or implicitly. Limited research has been conducted in analyzing such attribute interactions, which causes the analysis results to be more local than global. This paper proposes the coupled heterogeneous attribute analysis to capture the interdependence among mixed data by addressing coupling context and coupling weights in unsupervised learning. Such global couplings integrate the interactions within discrete attributes, within numerical attributes and across them to form the coupled representation for mixed type objects based on dimension conversion and feature selection. This work makes one step forward towards explicitly modeling the interdependence of heterogeneous attributes among mixed data, verified by the applications in data structure analysis, data clustering evaluation, and density comparison. Substantial experiments on 12 UCI data sets show that our approach can effectively capture the global couplings of heterogeneous attributes and outperforms the state-of-the-art methods, supported by statistical analysis.


Sub-Merge: Diving Down to the Attribute-Value Level in Statistical Schema Matching

AAAI Conferences

Matching and merging data from conflicting sources is the bread and butter of data integration, which drives search verticals, e-commerce comparison sites and cyber intelligence. Schema matching lifts data integration - traditionally focused on well-structured data - to highly heterogeneous sources. While schema matching has enjoyed significant success in matching data attributes, inconsistencies can exist at a deeper level, making full integration difficult or impossible. We propose a more fine-grained approach that focuses on correspondences between the values of attributes across data sources. Since the semantics of attribute values derive from their use and co-occurrence, we argue for the suitability of canonical correlation analysis (CCA) and its variants. We demonstrate the superior statistical and computational performance of multiple sparse CCA compared to a suite of baseline algorithms, on two datasets which we are releasing to stimulate further research. Our crowd-annotated data covers both cases that are relatively easy for humans to supply ground-truth, and that are inherently difficult for human computation.


Learning to Uncover Deep Musical Structure

AAAI Conferences

The overarching goal of music theory is to explain the inner workings of a musical composition by examining the structure of the composition. Schenkerian music theory supposes that Western tonal compositions can be viewed as hierarchies of musical objects. The process of Schenkerian analysis reveals this hierarchy by identifying connections between notes or chords of a composition that illustrate both the small- and large-scale construction of the music. We present a new probabilistic model of this variety of music analysis, details of how the parameters of the model can be learned from a corpus, an algorithm for deriving the most probable analysis for a given piece of music, and both quantitative and human-based evaluations of the algorithm's performance. This represents the first large-scale data-driven computational approach to hierarchical music analysis.


Identifying At-Risk Students in Massive Open Online Courses

AAAI Conferences

Massive Open Online Courses (MOOCs) have received widespread attention for their potential to scale higher education, with multiple platforms such as Coursera, edX and Udacity recently appearing. Despite their successes, a major problem faced by MOOCs is low completion rates. In this paper, we explore the accurate early identification of students who are at risk of not completing courses. We build predictive models weekly, over multiple offerings of a course. Furthermore, we envision student interventions that present meaningful probabilities of failure, enacted only for marginal students.To be effective, predicted probabilities must be both well-calibrated and smoothed across weeks.Based on logistic regression, we propose two transfer learning algorithms to trade-off smoothness and accuracy by adding a regularization term to minimize the difference of failure probabilities between consecutive weeks. Experimental results on two offerings of a Coursera MOOC establish the effectiveness of our algorithms.


Constructing Models of User and Task Characteristics from Eye Gaze Data for User-Adaptive Information Highlighting

AAAI Conferences

A user-adaptive information visualization system capable of learning models of users and the visualization tasks they perform could provide interventions optimized for helping specific users in specific task contexts. In this paper, we investigate the accuracy of predicting visualization tasks, user performance on tasks, and user traits from gaze data. We show that predictions made with a logistic regression model are significantly better than a baseline classifier, with particularly strong results for predicting task type and user performance. Furthermore, we compare classifiers built with interface-independent and interface-dependent features, and show that the interface-independent features are comparable or superior to interface-dependent ones. Finally, we discuss how the accuracy of predictive models is affected if they are trained with data from trials that had highlighting interventions added to the visualization.


Sample-Targeted Clinical Trial Adaptation

AAAI Conferences

Clinical trial adaptation refers to any adjustment of the trial protocol after the onset of the trial. The main goal is to make the process of introducing new medical interventions to patients more efficient by reducing the cost and the time associated with evaluating their safety and efficacy. The principal question is how should adaptation be performed so as to minimize the chance of distorting the outcome of the trial. We propose a novel method for achieving this. Unlike previous work our approach focuses on trial adaptation by sample size adjustment. We adopt a recently proposed stratification framework based on collected auxiliary data and show that this information together with the primary measured variables can be used to make a probabilistically informed choice of the particular sub-group a sample should be removed from. Experiments on simulated data are used to illustrate the effectiveness of our method and its application in practice.


Variational Inference for Nonparametric Bayesian Quantile Regression

AAAI Conferences

Quantile regression deals with the problem of computing robust estimators when the conditional mean and standard deviation of the predicted function are inadequate to capture its variability. The technique has an extensive list of applications, including health sciences, ecology and finance. In this work we present a non-parametric method of inferring quantiles and derive a novel Variational Bayesian (VB) approximation to the marginal likelihood, leading to an elegant Expectation Maximisation algorithm for learning the model. Our method is nonparametric, has strong convergence guarantees, and can deal with nonsymmetric quantiles seamlessly. We compare the method to other parametric and non-parametric Bayesian techniques, and alternative approximations based on expectation propagation demonstrating the benefits of our framework in toy problems and real datasets.