Goto

Collaborating Authors

 Asia


Preference Planning for Markov Decision Processes

AAAI Conferences

The classical planning problem can be enriched with quantitative and qualitative user-defined preferences on how the system behaves on achieving the goal. In this paper, we propose the probabilistic preference planning problem for Markov decision processes, where the preferences are based on an enriched probabilistic LTL-style logic. We develop P4Solver, an SMT-based planner computing the preferred plan by reducing the problem to quadratic programming problem, which can be solved using SMT solvers such as Z3. We illustrate the framework by applying our approach on two selected case studies.


OMNI-Prop: Seamless Node Classification on Arbitrary Label Correlation

AAAI Conferences

If we know most of Smith’s friends are from Boston, what can we say about the rest of Smith’s friends? In this paper, we focus on the node classification problem on networks, which is one of the most important topics in AI and Web communities. Our proposed algorithm which is referred to as OMNIProp has the following properties: (a) seamless and accurate; it works well on any label correlations (i.e., homophily, heterophily, and mixture of them) (b) fast; it is efficient and guaranteed to converge on arbitrary graphs (c) quasi-parameter free; it has just one well-interpretable parameter with heuristic default value of 1. We also prove the theoretical connections of our algorithm to the semi-supervised learning (SSL) algorithms and to random-walks. Experiments on four real, different network datasets demonstrate the benefits of the proposed algorithm, where OMNI-Prop outperforms the top competitors.


Nystrom Approximation for Sparse Kernel Methods: Theoretical Analysis and Empirical Evaluation

AAAI Conferences

While if kernels are not Kernel methods (Schölkopf and Smola 2002; Xu et al. 2009) low rank, Nyström approximations can usually lead to suboptimal have received a lot of attention in recent studies of machine performances. To alleviate the strong assumption in learning. These methods project data into high-dimensional the seeking of the approximation bounds, we take a more or even infinite-dimensional spaces via kernel mapping general assumption that the design matrix K ensuring the restricted functions. Despite the strong generalization ability induced isometric property (Koltchinskii 2011). In particular, by kernel methods, they usually suffer from the high computation the new assumption obeys the restricted eigenvalue condition complexity of calculating the kernel matrix (also (Koltchinskii 2011; Bickel, Ritov, and Tsybakov 2009), called Gram matrix). Although low-rank decomposition which has been shown to be more general than several techniques(e.g., Cholesky Decomposition (Fine and Scheinberg other similar assumptions used in sparsity literature (Candes 2002; Bach and Jordan 2005)), and truncating methods(e.g., and Tao 2007; Donoho, Elad, and Temlyakov 2006; Kernel Tapering (Shen, Xu, and Allebach 2014; Zhang and Huang 2008). Based on the restricted eigenvalue Furrer, Genton, and Nychka 2006)) can accelerate the calculation condition, we have provided error bounds for kernel approximation of the kernel matrix, they still need to compute the and recovery rate in sparse kernel regression.


Active Manifold Learning via Gershgorin Circle Guided Sample Selection

AAAI Conferences

In this paper, we propose an interpretation of active learning from a pure algebraic view and combine it with semi-supervised manifold learning. The proposed active manifold learning algorithm aims to learn the low-dimensional parameter space of the manifold with high accuracy from smartly labeled samples. We demonstrate that this problem is equivalent to a condition number minimization problem of the alignment matrix. Focusing on this problem, we first give a theoretical upper bound for the solution. Then we develop a heuristic but effective sample selection algorithm with the help of the Gershgorin circle theorem. We investigate the rationality, the feasibility, the universality and the complexity of the proposed method and demonstrate that our method yields encouraging active learning results.


Bayesian Model Averaging Naive Bayes (BMA-NB): Averaging over an Exponential Number of Feature Models in Linear Time

AAAI Conferences

Naive Bayes (NB) is well-known to be a simple but effective classifier, especially when combined with feature selection. Unfortunately, feature selection methods are often greedy and thus cannot guarantee an optimal feature set is selected. An alternative to feature selection is to use Bayesian model averaging (BMA), which computes a weighted average over multiple predictors; when the different predictor models correspond to different feature sets, BMA has the advantage over feature selection that its predictions tend to have lower variance on average in comparison to any single model. In this paper, we show for the first time that it is possible to exactly evaluate BMA over the exponentially-sized powerset of NB feature models in linear-time in the number of features; this yields an algorithm about as expensive to train as a single NB model with all features, but yet provably converges to the globally optimal feature subset in the asymptotic limit of data. We evaluate this novel BMA-NB classifier on a range of datasets showing that it never underperforms NB (as expected) and sometimes offers performance competitive (or superior) to classifiers such as SVMs and logistic regression while taking a fraction of the time to train.


Relational Stacked Denoising Autoencoder for Tag Recommendation

AAAI Conferences

Tag recommendation has become one of the most important ways of organizing and indexing online resources like articles, movies, and music. Since tagging information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate tag recommendation. Recently, models proposed for tag recommendation, such as collaborative topic regression and its variants, have demonstrated promising accuracy. However, a limitation of these models is that, by using topic models like latent Dirichlet allocation as the key component, the learned representation may not be compact and effective enough. Moreover, since relational data exist as an auxiliary data source in many applications, it is desirable to incorporate such data into tag recommendation models. In this paper, we start with a deep learning model called stacked denoising autoencoder (SDAE) in an attempt to learn more effective content representation. We propose a probabilistic formulation for SDAE and then extend it to a relational SDAE (RSDAE) model. RSDAE jointly performs deep representation learning and relational learning in a principled way under a probabilistic framework. Experiments conducted on three real datasets show that both learning more effective representation and learning from relational data are beneficial steps to take to advance the state of the art.


Clustering Longitudinal Clinical Marker Trajectories from Electronic Health Data: Applications to Phenotyping and Endotype Discovery

AAAI Conferences

Diseases such as autism, cardiovascular disease, and the autoimmune disorders are difficult to treat because of the remarkable degree of variation among affected individuals. Subtyping research seeks to refine the definition of such complex, multi-organ diseases by identifying homogeneous patient subgroups. In this paper, we propose the Probabilistic Subtyping Model (PSM) to identify subgroups based on clustering individual clinical severity markers. This task is challenging due to the presence of nuisance variability — variations in measurements that are not due to disease subtype — which, if not accounted for, generate biased estimates for the group-level trajectories. Measurement sparsity and irregular sampling patterns pose additional challenges in clustering such data. PSM uses a hierarchical model to account for these different sources of variability. Our experiments demonstrate that by accounting for nuisance variability, PSM is able to more accurately model the marker data. We also discuss novel subtypes discovered using PSM and the resulting clinical hypotheses that are now the subject of follow up clinical experiments.


Outlier-Robust Convex Segmentation

AAAI Conferences

We derive a convex optimization problem for the task of segmenting sequential data, which explicitly treats presence of outliers. We describe two algorithms for solving this problem, one exact and one a top-down novel approach, and we derive a consistency results for the case of two segments and no outliers. Robustness to outliers is evaluated on two real-world tasks related to speech segmentation. Our algorithms outperform baseline segmentation algorithms.


Spectral Clustering Using Multilinear SVD: Analysis, Approximations and Applications

AAAI Conferences

Spectral clustering, a graph partitioning technique, has gained immense popularity in machine learning in the context of unsupervised learning. This is due to convincing empirical studies, elegant approaches involved and the theoretical guarantees provided in the literature. To tackle some challenging problems that arose in computer vision etc., recently, a need to develop spectral methods that incorporate multi-way similarity measures surfaced. This, in turn, leads to a hypergraph partitioning problem. In this paper, we formulate a criterion for partitioning uniform hypergraphs, and show that a relaxation of this problem is related to the multilinear singular value decomposition (SVD) of symmetric tensors. Using this, we provide a spectral technique for clustering based on higher order affinities, and derive a theoretical bound on the error incurred by this method. We also study the complexity of the algorithm and use Nystr ̈om’s method and column sampling techniques to develop approximate methods with significantly reduced complexity. Experiments on geometric grouping and motion segmentation demonstrate the practical significance of the proposed methods.


Optimizing Bag Features for Multiple-Instance Retrieval

AAAI Conferences

Multiple-Instance (MI) learning is an important supervised learning technique which deals with collections of instances called bags. While existing research in MI learning mainly focused on classification, in this paper we propose a new approach for MI retrieval to enable effective similarity retrieval of bags of instances, where training data is presented in the form of similar and dissimilar bag pairs. An embedded scheme is devised as encoding each bag into a single bag feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specific to similarity retrieval through leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.