Shen, Xiaobo
Scalable Gaussian Process Classification with Additive Noise for Various Likelihoods
Liu, Haitao, Ong, Yew-Soon, Yu, Ziwei, Cai, Jianfei, Shen, Xiaobo
Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over function space. Conventional GPCs, however, suffer from (i) poor scalability for big data due to the full kernel matrix, and (ii) intractable inference due to the non-Gaussian likelihoods. Hence, various scalable GPCs have been proposed through (i) sparse approximation built upon a small inducing set to reduce the time complexity; and (ii) approximate inference to derive an analytical evidence lower bound (ELBO). However, these scalable GPCs equipped with analytical ELBOs are limited to specific likelihoods or require additional assumptions. In this work, we present a unifying framework which accommodates scalable GPCs with various likelihoods. Analogous to GP regression (GPR), we introduce additive noise to augment the probability space for (i) the GPCs with step, (multinomial) probit and logit likelihoods via internal variables; and, in particular, (ii) the GPC with softmax likelihood via the noise variables themselves. This leads to unified scalable GPCs with analytical ELBOs obtained by variational inference.
As a nonparametric Bayesian model which is explainable and provides confidence in its predictions, the Gaussian process (GP) has been widely investigated and used in various scenarios, e.g., regression and classification [1], active learning [2], unsupervised learning [3], and multi-task learning [4], [5]. The central task in a GP model is to infer the latent function f, which follows a Gaussian process GP(0, k(·)) with kernel k(·); for classification this inference is more challenging than in GP regression (GPR).
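To make the additive-noise construction above concrete, the following minimal NumPy/SciPy sketch (illustrative only, not the authors' code) checks that augmenting the step likelihood with a standard Gaussian internal variable recovers the probit likelihood, i.e. p(y|f) = E_{eps~N(0,1)}[step(y(f+eps))] = Phi(y f); the values of f and y are arbitrary assumptions for the demonstration:

    import numpy as np
    from scipy.stats import norm

    # Additive-noise view of the probit likelihood:
    # p(y | f) = E_{eps ~ N(0, 1)}[step(y * (f + eps))] = Phi(y * f),
    # i.e. augmenting the step likelihood with a Gaussian internal variable
    # recovers the probit likelihood in closed form.
    rng = np.random.default_rng(0)
    f, y = 0.7, +1                              # latent value and binary label in {-1, +1}
    eps = rng.standard_normal(1_000_000)        # samples of the internal (additive) noise
    mc_estimate = np.mean((y * (f + eps)) > 0)  # Monte Carlo average over the noise variable
    closed_form = norm.cdf(y * f)               # probit likelihood Phi(y * f)
    print(mc_estimate, closed_form)             # the two agree up to Monte Carlo error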
A Survey on Multi-output Learning
Xu, Donna, Shi, Yaxin, Tsang, Ivor W., Ong, Yew-Soon, Gong, Chen, Shen, Xiaobo
Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. With the advent of big data, the 4V characteristics of the multiple outputs, namely their volume, velocity, variety and veracity, impose a set of challenges on multi-output learning. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing these challenges. However, the literature lacks a comprehensive overview of the different types of challenges brought by the characteristics of the multiple outputs and of the techniques proposed to overcome them. This paper thus attempts to fill this gap by providing a comprehensive review of the area. We first introduce the different stages of the life cycle of the output labels. Then we present the paradigm of multi-output learning, including its myriad output structures, the definitions of its different sub-problems, model evaluation metrics, and popular data repositories used in its study. Subsequently, we review a number of state-of-the-art multi-output learning methods, categorized according to the challenges they address.
When Gaussian Process Meets Big Data: A Review of Scalable GPs
Liu, Haitao, Ong, Yew-Soon, Shen, Xiaobo, Cai, Jianfei
The vast quantity of information brought by big data, together with evolving computer hardware, has encouraged success stories in the machine learning community. Meanwhile, it poses challenges for the Gaussian process (GP), a well-known non-parametric and interpretable Bayesian model, which suffers from cubic complexity in the training size. To improve scalability while retaining desirable prediction quality, a variety of scalable GPs have been presented. They have not yet, however, been comprehensively reviewed and discussed in a unifying way so as to be well understood by both academia and industry. To this end, this paper is devoted to reviewing state-of-the-art scalable GPs, which fall into two main categories: global approximations, which distill the entire data, and local approximations, which divide the data for subspace learning. In particular, for global approximations we mainly focus on sparse approximations, comprising prior approximations, which modify the prior but perform exact inference, and posterior approximations, which retain the exact prior but perform approximate inference; for local approximations, we highlight the mixture/product of experts, which conducts model averaging over multiple local experts to boost predictions. To present a complete review, recent advances for improving the scalability and model capability of scalable GPs are also covered. Finally, extensions and open issues regarding the implementation of scalable GPs in various scenarios are reviewed and discussed to inspire novel ideas for future research avenues.
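As a hedged illustration of the sparse (inducing-point) approximations surveyed above, the sketch below forms the rank-m Nystrom-style approximation K ≈ K_nm K_mm^{-1} K_mn of an RBF kernel matrix; the variable names (X, Z, K_nm, K_mm) and the random choice of inducing inputs are illustrative assumptions, not details taken from the review:

    import numpy as np

    def rbf(A, B, lengthscale=1.0):
        """Squared-exponential kernel matrix between the rows of A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    rng = np.random.default_rng(0)
    n, m, d = 1000, 50, 3                     # n training points, m << n inducing points
    X = rng.standard_normal((n, d))
    Z = X[rng.choice(n, m, replace=False)]    # inducing inputs drawn from the data

    K_nm = rbf(X, Z)                          # n x m cross-covariance
    K_mm = rbf(Z, Z) + 1e-6 * np.eye(m)       # m x m inducing covariance (jittered)
    K_lowrank = K_nm @ np.linalg.solve(K_mm, K_nm.T)   # rank-m approximation of the full kernel

    # Working with (K_nm, K_mm) costs O(n m^2) instead of the O(n^3) of the full n x n kernel.
    K_full = rbf(X, X)
    print(np.linalg.norm(K_full - K_lowrank) / np.linalg.norm(K_full))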
Compact Multi-Label Learning
Shen, Xiaobo (Nanyang Technological University) | Liu, Weiwei (The University of New South Wales) | Tsang, Ivor W. (University of Technology Sydney) | Sun, Quan-Sen (Nanjing University of Science and Technology) | Ong, Yew-Soon (Nanyang Technological University)
Embedding methods have shown promising performance in multi-label prediction, as they can discover the dependency of labels. Most embedding methods, however, cannot align the input and output well, which degrades prediction performance. Besides, they suffer from expensive prediction costs when applied to large-scale datasets. To address these issues, this paper proposes a Co-Hashing (CoH) method by formulating multi-label learning from the perspective of cross-view learning. CoH first regards the input and output as two views, and then learns a common latent Hamming space, where input and output pairs are compressed into compact binary embeddings. CoH enjoys two key benefits: 1) the input and output are well aligned and their correlations are explored; 2) prediction is very efficient via fast cross-view kNN search in the Hamming space. Moreover, we provide a generalization error bound for our method. Extensive experiments on eight real-world datasets demonstrate the superiority of the proposed CoH over state-of-the-art methods in terms of both prediction accuracy and efficiency.
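To illustrate the prediction step described above, namely fast cross-view kNN search in the Hamming space, here is a minimal sketch in which random codes stand in for the learned binary embeddings of the two views; the hashing (code-learning) stage of CoH itself is not reproduced:

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_query, code_bits = 10_000, 5, 64

    # Stand-ins for binary embeddings that a learned hash model would place in the
    # shared Hamming space: database codes from the output (label) view, query
    # codes from the input (feature) view.
    train_codes = rng.integers(0, 2, (n_train, code_bits), dtype=np.uint8)
    query_codes = rng.integers(0, 2, (n_query, code_bits), dtype=np.uint8)

    def hamming_knn(queries, database, k=5):
        """Indices of the k database codes closest to each query in Hamming distance."""
        dists = (queries[:, None, :] != database[None, :, :]).sum(-1)
        return np.argsort(dists, axis=1)[:, :k]

    neighbours = hamming_knn(query_codes, train_codes, k=5)
    print(neighbours.shape)   # (n_query, k); the neighbours' labels would drive prediction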
Sparse Embedded $k$-Means Clustering
Liu, Weiwei, Shen, Xiaobo, Tsang, Ivor
The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its application in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] developed a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n,d\}\,\epsilon^2 \log(d)/k$ for a data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ time for matrix multiplication, and this cost is prohibitive for large $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires only $\mathcal{O}(\mathrm{nnz}(X))$ time ($\mathrm{nnz}(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings and demonstrate that our approach significantly accelerates $k$-means clustering while achieving satisfactory clustering performance.
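As a rough sketch of the general idea, a sparse random embedding followed by standard $k$-means, the snippet below uses a CountSketch-style embedding whose multiplication cost scales with $\mathrm{nnz}(X)$; the specific embedding, dimensions, and guarantees of the proposed algorithm are not reproduced here, and scikit-learn's KMeans stands in for the clustering step:

    import numpy as np
    from scipy import sparse
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n, d, t, k = 5000, 1000, 50, 10          # t: sketch dimension, k: number of clusters

    X = sparse.random(n, d, density=0.01, format="csr", random_state=0)  # sparse data matrix

    # CountSketch-style sparse embedding: each original feature is hashed to one of
    # t target dimensions with a random sign, so computing X @ S costs O(nnz(X)).
    rows = np.arange(d)
    cols = rng.integers(0, t, d)
    signs = rng.choice([-1.0, 1.0], d)
    S = sparse.csr_matrix((signs, (rows, cols)), shape=(d, t))

    X_embedded = (X @ S).toarray()           # n x t embedded data
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_embedded)
    print(np.bincount(labels, minlength=k))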
Compressed K-Means for Large-Scale Clustering
Shen, Xiaobo (Nanjing University of Science and Technology) | Liu, Weiwei (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney) | Shen, Fumin (University of Electronic Science and Technology of China) | Sun, Quan-Sen (Nanjing University of Science and Technology)
Large-scale clustering has been widely used in many applications and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn the binary codes and the clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
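To convey the flavour of clustering directly on binary codes, the following sketch runs a k-means-style loop under the Hamming metric with bitwise-majority centroid updates; the joint learning of codes and clusters in CKM is not reproduced, and the random codes are purely illustrative:

    import numpy as np

    def binary_kmeans(codes, k, iters=20, seed=0):
        """k-means-style clustering of binary codes under the Hamming metric;
        centroids stay binary and are updated by bitwise majority vote."""
        rng = np.random.default_rng(seed)
        centroids = codes[rng.choice(len(codes), k, replace=False)].copy()
        for _ in range(iters):
            # Assignment step: nearest centroid in Hamming distance.
            dists = (codes[:, None, :] != centroids[None, :, :]).sum(-1)
            assign = dists.argmin(axis=1)
            # Update step: per-bit majority vote within each cluster.
            for c in range(k):
                members = codes[assign == c]
                if len(members):
                    centroids[c] = (members.mean(axis=0) >= 0.5).astype(codes.dtype)
        return assign, centroids

    rng = np.random.default_rng(0)
    codes = rng.integers(0, 2, (20_000, 64), dtype=np.uint8)   # data already compressed to 64-bit codes
    assign, centroids = binary_kmeans(codes, k=10)
    print(np.bincount(assign, minlength=10))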