Tsang, Ivor
Masking: A New Perspective of Noisy Supervision
Han, Bo, Yao, Jiangchao, Niu, Gang, Zhou, Mingyuan, Tsang, Ivor, Zhang, Ya, Sugiyama, Masashi
It is important to learn classifiers under noisy labels due to their ubiquity. As noisy labels are corrupted from ground-truth labels by an unknown noise transition matrix, the accuracy of classifiers can be improved by estimating this matrix, without introducing either sample-selection or regularization biases. However, such estimation is often inexact, which inevitably degrades the accuracy of classifiers. The inexact estimation stems either from a heuristic trick or from brute-force learning by deep networks on a finite dataset. In this paper, we present a human-assisted approach called "\textit{masking}". Masking conveys human cognition of invalid class transitions and thereby speculates the structure of the noise transition matrix. Given this structure information, we only need to learn the noise transition probabilities, which reduces the estimation burden. To instantiate this approach, we derive a structure-aware probabilistic model that incorporates a structure prior. During the model realization, we address the challenges of structure extraction and structure alignment in a principled way. Empirical results on benchmark datasets with three noise structures show that our approach can significantly improve the robustness of classifiers.
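For illustration only, below is a minimal sketch of how a noise transition matrix restricted by a human-specified structure mask could be used for forward loss correction. The softmax parametrization, the `masked_transition` and `forward_corrected_nll` helpers, and the toy mask are assumptions for the example, not the paper's exact model.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def masked_transition(theta, mask):
    """Build a noise transition matrix whose invalid entries are zeroed out by a
    human-specified structure mask, then row-normalised; `theta` holds free
    parameters only for the valid transitions."""
    scores = np.where(mask > 0, theta, -np.inf)   # forbid invalid class transitions
    return softmax(scores, axis=1)                # each row sums to 1

def forward_corrected_nll(clean_probs, noisy_labels, T):
    """Forward correction: the model's clean posterior p(y|x) is pushed through T
    to obtain the noisy posterior, which is scored against the observed noisy labels."""
    noisy_probs = clean_probs @ T                 # p(noisy | x) = p(clean | x) T
    idx = np.arange(len(noisy_labels))
    return -np.mean(np.log(noisy_probs[idx, noisy_labels] + 1e-12))

# Toy usage: 3 classes, transitions only allowed between "adjacent" classes.
mask = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]])
theta = np.zeros((3, 3))                          # free parameters to be learned
T = masked_transition(theta, mask)
clean_probs = softmax(np.random.randn(5, 3), axis=1)
noisy_labels = np.array([0, 2, 1, 1, 0])
print(forward_corrected_nll(clean_probs, noisy_labels, T))
```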
Variational Composite Autoencoders
Yao, Jiangchao, Tsang, Ivor, Zhang, Ya
Learning in latent variable models is challenging in the presence of complex data structure or intractable latent variables. Previous variational autoencoders can be ineffective due to their straightforward encoder-decoder structure. In this paper, we propose a variational composite autoencoder that sidesteps this issue by amortizing inference on top of a hierarchical latent variable model. Experimental results confirm the advantages of our model.
Sparse Embedded $k$-Means Clustering
Liu, Weiwei, Shen, Xiaobo, Tsang, Ivor
The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its application in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] developed a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, their method reduces the running time by a factor of $\min \{n,d\}\epsilon^2 \log(d)/k$ for a data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, it still requires $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ time for matrix multiplication, and this cost is prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires only $\mathcal{O}(nnz(X))$ time ($nnz(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on the approximation accuracy of [1] by a factor of one. Our empirical studies corroborate our theoretical findings and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance.
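As a rough illustration of the sparse-embedding idea, the sketch below applies a CountSketch-style sparse projection (which costs $\mathcal{O}(nnz(X))$ to apply) before running ordinary $k$-means. The embedding dimension, the `sparse_embedding` helper, and the use of scikit-learn's KMeans are illustrative assumptions rather than the paper's exact construction or guarantees.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.cluster import KMeans

def sparse_embedding(d, m, rng):
    """CountSketch-style embedding S in R^{d x m}: each input feature is hashed
    to one output coordinate with a random sign, so computing X @ S costs
    O(nnz(X))."""
    cols = rng.integers(0, m, size=d)             # hash each feature to a column
    signs = rng.choice([-1.0, 1.0], size=d)       # random sign per feature
    return sp.csr_matrix((signs, (np.arange(d), cols)), shape=(d, m))

rng = np.random.default_rng(0)
X = sp.random(1000, 5000, density=0.01, random_state=0, format="csr")
S = sparse_embedding(X.shape[1], m=100, rng=rng)  # embed 5000 dims into 100
X_low = (X @ S).toarray()                         # fast sparse-sparse product
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_low)
print(labels[:20])
```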
Compressed K-Means for Large-Scale Clustering
Shen, Xiaobo (Nanjing University of Science and Technology) | Liu, Weiwei (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney) | Shen, Fumin (University of Electronic Science and Technology of China) | Sun, Quan-Sen (Nanjing University of Science and Technology)
Large-scale clustering has been widely used in many applications and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
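A minimal sketch of clustering binary codes under the Hamming metric is given below; the random-hyperplane codes and the bit-wise majority-vote centroid update are illustrative stand-ins for CKM's jointly learned codes and clusters.

```python
import numpy as np

def binarize(X, n_bits, rng):
    """Illustrative stand-in for learned codes: sign of random projections."""
    W = rng.standard_normal((X.shape[1], n_bits))
    return (X @ W > 0).astype(np.uint8)           # n x n_bits binary codes

def hamming_kmeans(codes, k, n_iter, rng):
    """k-means in Hamming space: distances are bit disagreements and the
    centroid of a cluster is the bit-wise majority vote."""
    centers = codes[rng.choice(len(codes), k, replace=False)]
    for _ in range(n_iter):
        dists = (codes[:, None, :] != centers[None, :, :]).sum(-1)  # Hamming distances
        assign = dists.argmin(1)
        for j in range(k):
            members = codes[assign == j]
            if len(members):
                centers[j] = (members.mean(0) > 0.5).astype(np.uint8)
    return assign, centers

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 64))
codes = binarize(X, n_bits=32, rng=rng)
assign, centers = hamming_kmeans(codes, k=5, n_iter=10, rng=rng)
print(np.bincount(assign))
```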
Approximate Conditional Gradient Descent on Multi-Class Classification
Liu, Zhuanghua (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney)
Conditional gradient descent, aka the Frank-Wolfe algorithm, has regained popularity in recent years. The key advantage of Frank-Wolfe is that at each step the expensive projection is replaced with a much more efficient linear optimization step. As with gradient descent, the cost of evaluating the loss function in Frank-Wolfe scales with the data size, so training on big data poses a challenge. Recently, stochastic Frank-Wolfe methods have been proposed to address this problem, but they do not perform well in practice. In this work, we study the problem of approximating the Frank-Wolfe algorithm on large-scale multi-class classification, a typical application of the Frank-Wolfe algorithm. We present a simple but effective method that exploits the internal structure of the data to approximate Frank-Wolfe on the large-scale multi-class classification problem. Empirical results verify that our method outperforms state-of-the-art stochastic projection-free methods.
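For reference, a minimal sketch of a vanilla Frank-Wolfe iteration is shown below on a simplex-constrained quadratic, where the linear minimization step reduces to picking a single coordinate; the simplex domain, the toy objective, and the step-size schedule are illustrative and do not reproduce the multi-class formulation or the approximation method of the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, n_iter=100):
    """Vanilla Frank-Wolfe over the probability simplex: the projection step of
    projected gradient descent is replaced by a linear minimization, which on
    the simplex just picks the coordinate with the smallest gradient."""
    x = x0.copy()
    for t in range(n_iter):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0                     # argmin over the simplex of <g, s>
        gamma = 2.0 / (t + 2.0)                   # standard step-size schedule
        x = (1 - gamma) * x + gamma * s           # convex combination stays feasible
    return x

# Toy usage: minimise ||x - b||^2 over the simplex.
b = np.array([0.7, 0.2, 0.4, -0.1])
grad_f = lambda x: 2.0 * (x - b)
x0 = np.ones(4) / 4
print(frank_wolfe_simplex(grad_f, x0))
```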
On the Optimality of Classifier Chain for Multi-label Classification
Liu, Weiwei, Tsang, Ivor
To capture the interdependencies between labels in multi-label classification problems, the classifier chain (CC) model takes the multiple labels of each instance into account under a deterministic high-order Markov chain model. Since its performance is sensitive to the choice of label order, the key issue is how to determine the optimal label order for CC. In this work, we first generalize the CC model over a random label order. Then, we present a theoretical analysis of the generalization error for the proposed generalized model. Based on our results, we propose a dynamic programming based classifier chain (CC-DP) algorithm to search for the globally optimal label order for CC, and a greedy classifier chain (CC-Greedy) algorithm to find a locally optimal CC. Comprehensive experiments on a number of real-world multi-label data sets from various domains demonstrate that our proposed CC-DP algorithm outperforms state-of-the-art approaches and that the CC-Greedy algorithm achieves prediction performance comparable to CC-DP.
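A minimal sketch of a classifier chain under one fixed label order is shown below; the logistic-regression base learner and the `fit_chain`/`predict_chain` helpers are assumptions for the example, and the DP and greedy order-search procedures of the paper are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_chain(X, Y, order):
    """Train one binary classifier per label; classifier j sees the original
    features plus the ground-truth values of all earlier labels in the chain."""
    chain = []
    for j, label in enumerate(order):
        feats = np.hstack([X, Y[:, order[:j]]]) if j else X
        chain.append(LogisticRegression(max_iter=1000).fit(feats, Y[:, label]))
    return chain

def predict_chain(X, chain, order, n_labels):
    """At test time the earlier labels are unknown, so each classifier is fed
    the chain's own predictions for them."""
    Y_hat = np.zeros((X.shape[0], n_labels), dtype=int)
    for j, (label, clf) in enumerate(zip(order, chain)):
        feats = np.hstack([X, Y_hat[:, order[:j]]]) if j else X
        Y_hat[:, label] = clf.predict(feats)
    return Y_hat

# Toy usage with 3 correlated labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y0 = (X[:, 0] > 0).astype(int)
Y = np.stack([y0, y0 ^ (X[:, 1] > 0).astype(int), y0], axis=1)
order = [0, 1, 2]
chain = fit_chain(X, Y, order)
print(predict_chain(X, chain, order, n_labels=3).mean(0))
```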
A Split-Merge Framework for Comparing Clusterings
Xiang, Qiaoliang, Mao, Qi, Chai, Kian Ming, Chieu, Hai Leong, Tsang, Ivor, Zhao, Zhendong
Clustering evaluation measures are frequently used to evaluate the performance of clustering algorithms. However, most measures are not properly normalized and ignore some of the information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are instances of this formula. To satisfy consistency within components, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework yields measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.
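As a simple point of comparison, the sketch below builds the contingency (bipartite) matrix of two clusterings and computes normalized mutual information from it; NMI is used here as a familiar entropy-based measure for illustration, not as the split-merge measure proposed in the paper.

```python
import numpy as np

def contingency(a, b):
    """Bipartite-graph view of two clusterings: entry (i, j) counts the points
    shared by cluster i of clustering `a` and cluster j of clustering `b`."""
    C = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(C, (a, b), 1)
    return C

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def nmi(a, b):
    """Normalized mutual information computed from the contingency matrix."""
    C = contingency(a, b) / len(a)
    ha, hb = entropy(C.sum(1)), entropy(C.sum(0))
    mi = ha + hb - entropy(C.ravel())             # I(A;B) = H(A) + H(B) - H(A,B)
    return mi / np.sqrt(ha * hb) if ha * hb > 0 else 0.0

# Toy usage: two labelings of the same six points.
a = np.array([0, 0, 1, 1, 2, 2])
b = np.array([0, 0, 0, 1, 1, 1])
print(round(nmi(a, b), 3))
```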
Discovering Support and Affiliated Features from Very High Dimensions
Zhai, Yiteng, Tan, Mingkui, Tsang, Ivor, Ong, Yew Soon
In this paper, a novel learning paradigm is presented to automatically identify groups of informative and correlated features from very high dimensions. Specifically, we explicitly incorporate correlation measures as constraints and then propose an efficient embedded feature selection method using a recently developed cutting-plane strategy. The benefits of the proposed algorithm are twofold. First, it can identify the optimal feature subset that is discriminative for the output labels yet mutually uncorrelated, denoted here as Support Features, which brings about significant improvements in prediction performance over the other state-of-the-art feature selection methods considered in the paper. Second, during the learning process, the underlying group structures of correlated features associated with each support feature, denoted as Affiliated Features, can also be discovered without any additional cost. These affiliated features serve to improve the interpretability of the learning tasks. Extensive empirical studies on both synthetic and very high dimensional real-world datasets verify the validity and efficiency of the proposed method.
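A toy sketch of the support/affiliated distinction is given below: features are greedily picked by a simple relevance score, and features highly correlated with a selected one are attached to it as its affiliated group. The correlation-based score, the threshold, and the `support_and_affiliated` helper are illustrative assumptions, not the cutting-plane formulation of the paper.

```python
import numpy as np

def support_and_affiliated(X, y, n_support=5, corr_thresh=0.8):
    """Greedy illustration: score each feature by |correlation with the label|,
    keep the best uncorrelated ones as support features, and attach highly
    correlated leftovers to them as affiliated features."""
    n, d = X.shape
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-12)
    ys = (y - y.mean()) / (y.std() + 1e-12)
    score = np.abs(Xs.T @ ys) / n                 # relevance to the output labels
    feat_corr = np.abs(Xs.T @ Xs) / n             # pairwise feature correlation
    support, affiliated, remaining = [], {}, list(np.argsort(-score))
    while remaining and len(support) < n_support:
        f = remaining.pop(0)
        support.append(f)
        group = [g for g in remaining if feat_corr[f, g] >= corr_thresh]
        affiliated[f] = group                     # correlated features come at no extra cost
        remaining = [g for g in remaining if g not in group]
    return support, affiliated

# Toy usage: feature 0 drives the label, feature 1 is a noisy copy of it.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(300)
y = (X[:, 0] > 0).astype(float)
print(support_and_affiliated(X, y, n_support=3))
```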