AITopics

Clustering is one of the most important unsupervised learning techniques and has been widely applied to data mining problems. As one of its branches, graph clustering is popular because of its appealing performance and strong theoretical support. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Unlike conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust to local distortions. In addition, we use the proposed linear coders as the building blocks of a deep structure that further refines features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time complexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.

graph, representation, spectral, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Scalable Probabilistic Tensor Factorization for Binary and Count Data

Rai, Piyush (Duke University) | Hu, Changwei (Duke University) | Harding, Matthew (Duke University) | Carin, Lawrence (Duke University)

Tensor factorization methods provide a useful way to extract latent factors from complex multirelational data and to predict missing data. Developing tensor factorization methods for massive tensors, especially when the data are binary- or count-valued (which is true of most real-world tensors), however, remains a challenge. We develop a scalable probabilistic tensor factorization framework that enables us to perform efficient factorization of massive binary and count tensor data. The framework is based on (i) the Polya-Gamma augmentation strategy, which makes the model fully locally conjugate and allows closed-form parameter updates when data are binary- or count-valued; and (ii) an efficient online Expectation Maximization algorithm, which allows processing data in small mini-batches and facilitates handling massive tensor data. Moreover, various types of constraints on the factor matrices (e.g., sparsity, non-negativity) can be incorporated under the proposed framework, providing good interpretability, which can be useful for qualitative analyses of the results. We apply the proposed framework to analyze several binary- and count-valued real-world data sets.

algorithm, tensor, tensor data, (17 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Africa > Senegal > Kolda Region > Kolda (0.06)
North America > United States > North Carolina > Durham County > Durham (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

EigenGP: Gaussian Process Models with Adaptive Eigenfunctions

Peng, Hao (Purdue University) | Qi, Yuan (Purdue University)

Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost for big data. In this paper, we propose a new Bayesian approach, EigenGP, that learns both basis dictionary elements — eigenfunctions of a GP prior — and prior precisions in a sparse finite model. It is well known that, among all orthogonal basis functions, eigenfunctions can provide the most compact representation. Unlike other sparse Bayesian finite models where the basis function has a fixed form, our eigenfunctions live in a reproducing kernel Hilbert space as a finite linear combination of kernel functions. We learn the dictionary elements — eigenfunctions — and the prior precisions over these elements as well as all the other hyperparameters from data by maximizing the model marginal likelihood. We explore computational linear algebra to simplify the gradient computation significantly. Our experimental results demonstrate improved predictive performance of EigenGP over alternative sparse GP methods as well as relevance vector machines.

basis function, eigenfunction, eigengp, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Image Feature Learning for Cold Start Problem in Display Advertising

Mo, Kaixiang (Hong Kong University of Science and Technology) | Liu, Bo (Hong Kong University of Science and Technology) | Xiao, Lei (Tencent Inc., Shenzhen) | Li, Yong (Tencent Inc., Shenzhen) | Jiang, Jie (Tencent Inc., Shenzhen)

In online display advertising, state-of-the-art Click-Through Rate (CTR) prediction algorithms rely heavily on historical information, and they work poorly on the growing number of new ads without any historical information. This is known as the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task dependent, inflexible and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristics. Extensive experiments on a real-world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.

category, dataset, image feature, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Industry:

Marketing (1.00)
Information Technology > Services (0.86)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Optimizing Locally Linear Classifiers with Supervised Anchor Point Learning

Mao, Xue (Chinese Academy of Sciences) | Fu, Zhouyu (University of Western Sydney) | Wu, Ou (Chinese Academy of Sciences) | Hu, Weiming (Chinese Academy of Sciences)

Kernel SVM suffers from high computational complexity when dealing with large-scale nonlinear datasets. To address this issue, locally linear classifiers have been proposed for approximating nonlinear decision boundaries with locally linear functions using a local coding scheme. The effectiveness of such a coding scheme depends heavily on the quality of the anchor points chosen to produce the local codes. Existing methods usually involve a phase of unsupervised anchor point learning followed by supervised classifier learning. Thus, the anchor points and classifiers are obtained separately, and the learned anchor points may not be optimal for the discriminative task. In this paper, we present a novel fully supervised approach for anchor point learning. A single optimization problem is formulated over both anchor point and classifier variables, optimizing the initial anchor points jointly with the classifiers to minimize the classification risk. Experimental results show that our method outperforms other competitive methods that employ unsupervised anchor point learning, and achieves performance on par with the kernel SVM with much improved efficiency.
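To make the idea of a locally linear classifier concrete, the following is a minimal prediction-time sketch, not the authors' method. It assumes a Gaussian soft-assignment local coding with a hypothetical `bandwidth` parameter, and fixed, hand-picked anchor points and per-anchor linear models; the joint supervised optimization described in the abstract is not shown. The names `local_codes` and `predict` are illustrative only.

```python
import math

def local_codes(x, anchors, bandwidth=1.0):
    """Soft-assignment weights of x to each anchor (non-negative, sum to 1).

    Assumption: Gaussian weighting of squared distances; the abstract does
    not specify which local coding scheme is used.
    """
    sq = [sum((xi - ai) ** 2 for xi, ai in zip(x, a)) for a in anchors]
    m = min(sq)  # shift by the minimum distance for numerical stability
    w = [math.exp(-(d - m) / (2 * bandwidth ** 2)) for d in sq]
    total = sum(w)
    return [wi / total for wi in w]

def predict(x, anchors, weights, biases, bandwidth=1.0):
    """Score x as a code-weighted blend of per-anchor linear models."""
    codes = local_codes(x, anchors, bandwidth)
    return sum(
        c * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        for c, w, b in zip(codes, weights, biases)
    )

# Two anchors on a line, each with its own constant linear model.
anchors = [[-1.0], [1.0]]
weights = [[0.0], [0.0]]
biases = [1.0, -1.0]
score_left = predict([-1.0], anchors, weights, biases)   # dominated by anchor 0
score_right = predict([1.0], anchors, weights, biases)   # dominated by anchor 1
```

The sign of the score gives the predicted class; because the codes depend on where `x` lies relative to the anchors, the overall decision function is nonlinear even though each local model is linear.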

anchor point, classifier, dataset, (17 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.72)

Between Imitation and Intention Learning

MacGlashan, James (Brown University) | Littman, Michael L. (Brown University)

Research in learning from demonstration can generally be grouped into either imitation learning or intention learning. In imitation learning, the goal is to imitate the observed behavior of an expert and is typically achieved using supervised learning techniques. In intention learning, the goal is to learn the intention that motivated the expert's behavior and to use a planning algorithm to derive behavior. Imitation learning has the advantage of learning a direct mapping from states to actions, which bears a small computational cost. Intention learning has the advantage of behaving well in novel states, but may bear a large computational cost by relying on planning algorithms in complex tasks. In this work, we introduce receding horizon inverse reinforcement learning, in which the planning horizon induces a continuum between these two learning paradigms. We present empirical results on multiple domains that demonstrate that performing IRL with a small, but non-zero, receding planning horizon greatly decreases the computational cost of planning while maintaining superior generalization performance compared to imitation learning.
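The role of the planning horizon can be illustrated with a small sketch. This is only an assumed toy setting (a deterministic chain of states with a known reward list), not the paper's algorithm: it shows receding-horizon action selection for a given reward function, while the inverse reinforcement learning step that would estimate that reward is omitted. The function name and the `gamma` discount parameter are hypothetical.

```python
from functools import lru_cache

def receding_horizon_action(state, reward, horizon, gamma=0.95):
    """Pick the action with the best `horizon`-step lookahead return.

    Toy assumptions: states are indices into `reward`, actions move one
    step left (-1) or right (+1), and `horizon` is at least 1.
    """
    n = len(reward)
    actions = (-1, +1)

    def step(s, a):
        # Deterministic move, clipped at both ends of the chain.
        return min(max(s + a, 0), n - 1)

    @lru_cache(maxsize=None)
    def value(s, h):
        # Best discounted return from s using h further steps of lookahead.
        if h == 0:
            return 0.0
        return max(
            reward[step(s, a)] + gamma * value(step(s, a), h - 1)
            for a in actions
        )

    def q(a):
        s2 = step(state, a)
        return reward[s2] + gamma * value(s2, horizon - 1)

    # Ties are broken in favor of the first action listed.
    return max(actions, key=q)

reward = [0.0, 0.0, 0.0, 0.0, 1.0]  # only the right end is rewarding
short_sighted = receding_horizon_action(2, reward, horizon=1)
far_sighted = receding_horizon_action(2, reward, horizon=2)
```

With a one-step horizon the rewarding state is out of reach from state 2, so both actions look equally good; with a two-step horizon the planner sees the reward and moves right. Planning cost grows with the horizon, which is the trade-off the abstract describes.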

agent, reward function, value function, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Multi-Task Multi-Dimensional Hawkes Processes for Modeling Event Sequences

Luo, Dixin (Shanghai Jiao Tong University) | Xu, Hongteng (Georgia Institute of Technology) | Zhen, Yi (Georgia Institute of Technology) | Ning, Xia (Indiana University-Purdue University Indianapolis) | Zha, Hongyuan (Georgia Institute of Technology) | Yang, Xiaokang (Shanghai Jiao Tong University) | Zhang, Wenjun (Shanghai Jiao Tong University)

We propose a Multi-task Multi-dimensional Hawkes Process (MMHP) for modeling event sequences where there exist multiple triggering patterns within sequences and structures across sequences. MMHP is able to model the dynamics of multiple sequences jointly by imposing structural constraints and thus systematically uncover clustering structure among sequences. We propose an effective and robust optimization algorithm to learn MMHP models, which takes advantage of alternating direction method of multipliers (ADMM), majorization minimization and Euler-Lagrange equations. Our experimental results demonstrate that MMHP performs well on both synthetic and real data.
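For readers unfamiliar with the model class, the conditional intensity of a standard multi-dimensional Hawkes process is commonly written as below. The abstract does not give the paper's exact formulation, so the symbols here are assumptions: events are pairs $(t_i, u_i)$ of time and dimension, $\mu_u$ is a base intensity, $a_{u u'}$ measures how strongly events in dimension $u'$ trigger events in dimension $u$, and $\kappa$ is a decay kernel.

```latex
\lambda_u(t) \;=\; \mu_u \;+\; \sum_{i \,:\, t_i < t} a_{u u_i}\, \kappa(t - t_i)
```

The multi-task structure and the constraints across sequences described in the abstract would sit on top of a form like this and are not reproduced here.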

event sequence, mmhp, sequence, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Robust Kernel Dictionary Learning Using a Whole Sequence Convergent Algorithm

Liu, Huaping (Tsinghua University) | Qin, Jie (Tsinghua University) | Cheng, Hong (University of Electronic Science and Technology of China) | Sun, Fuchun (Tsinghua University)

Kernel sparse coding is an effective strategy to capture the non-linear structure of data samples. However, how to learn a robust kernel dictionary remains an open problem. In this paper, we propose a new optimization model to learn the robust kernel dictionary while isolating outliers in the training samples. This model is essentially based on the decomposition of the reconstruction error into small dense noises and large sparse outliers. The outlier error term is formulated as the product of the sample matrix in the feature space and a diagonal coefficient matrix. This facilitates the kernelized dictionary learning. To solve the non-convex optimization problem, we develop a whole sequence convergent algorithm which guarantees the obtained solution sequence is a Cauchy sequence. The experimental results show that the proposed robust kernel dictionary learning method provides significant performance improvement.

dictionary learning, feature space, learning, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Europe > Sweden > Uppsala County > Uppsala (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)

Regularizing Flat Latent Variables with Hierarchical Structures

Lin, Rongcheng (University of North Carolina at Charlotte) | Li, Huayu (University of North Carolina at Charlotte) | Quan, Xiaojun (Institute for Infocomm Research) | Hong, Richang (Hefei University of Technology) | Wu, Zhiang (Nanjing University of Finance and Economics) | Ge, Yong (University of North Carolina at Charlotte)

In this paper, we propose a stratified topic model (STM). Instead of directly modeling and inferring flat topics or hierarchically structured topics, we use the stratified relationships in topic hierarchies to regularize the flat topics. The topic structures are captured by a hierarchical clustering method and act as constraints during the learning process. We propose two theoretically sound and practical inference methods to solve the model. Experimental results with two real-world data sets and various evaluation metrics demonstrate the effectiveness of the proposed model.

proceedings, stm, topic model, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
South America > Paraguay > Asunción > Asunción (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Density Corrected Sparse Recovery when R.I.P. Condition Is Broken

Lin, Ming (Carnegie Mellon University) | Lan, Zhengzhong (Carnegie Mellon University) | Hauptmann, Alexander G. (Carnegie Mellon University)

Traditional methods often rely on R.I.P. or its relaxed variants. However, in real applications, features are often correlated to each other, which makes these assumptions too strong to be useful. In this paper, we study the sparse recovery problem in which the feature matrix is strictly non-R.I.P. We prove that when features exhibit cluster structures, which often happens in real applications, we are able to recover ...

... which the features form cluster structures, as can be seen in many machine learning [Lehiste, 1976] and computer vision problems [Lan et al., 2013; Lowe, 2004]. Because many feature extractors are similar to each other and reflect the characteristics of the same image, vision features are often correlated and have cluster structures. This correlation is even stronger in those systems that have thousands to millions of features [Lan et al., 2013; Gan et al., 2015a; 2015b].
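For reference, the R.I.P. (restricted isometry property) condition named above is usually stated as follows. The notation is an assumption, since the excerpt does not define it: $A$ is the feature (measurement) matrix, $s$ is the sparsity level, and $\delta_s \in (0, 1)$ is the restricted isometry constant.

```latex
(1 - \delta_s)\,\|x\|_2^2 \;\le\; \|A x\|_2^2 \;\le\; (1 + \delta_s)\,\|x\|_2^2
\quad \text{for all $s$-sparse vectors } x .
```

Strongly correlated columns of $A$ push $\delta_s$ toward 1, which is why clustered, correlated features break the condition.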

algorithm, density correction, matrix, (12 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)