AITopics

Recently, significant advances have been made in distributed word representations based on neural networks, also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS are still not well understood, except for a recent work that explains SGNS as an implicit matrix factorization of the pointwise mutual information (PMI) matrix. In this paper, we provide a new perspective for further understanding SGNS. We point out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word. Based on the representation learning view, SGNS is in fact an explicit matrix factorization (EMF) of the words’ co-occurrence matrix. Furthermore, extended supervised word embedding can be established based on our proposed representation learning view.
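As a rough illustration of the matrix-factorization view mentioned in this abstract, the sketch below builds a PMI matrix from a toy co-occurrence count matrix and factorizes it with a truncated SVD. It is not the paper's EMF method or the SGNS training procedure; the count matrix, the function names `pmi_matrix` and `factorize`, and the choice to replace undefined PMI entries with 0 are all assumptions made for the example.

```python
import numpy as np


def pmi_matrix(counts):
    """PMI for a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    # Zero counts give log(0) = -inf; this sketch sets such entries to 0
    # (a simplifying assumption, not something stated in the abstract).
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def factorize(matrix, dim):
    """Rank-`dim` factorization matrix ~= W @ C.T via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    root = np.sqrt(s[:dim])
    word_vectors = u[:, :dim] * root
    context_vectors = vt[:dim, :].T * root
    return word_vectors, context_vectors


# Hypothetical 3-word by 3-context count matrix.
counts = np.array([[4, 1, 0],
                   [1, 3, 2],
                   [0, 2, 5]])
pmi = pmi_matrix(counts)
W, C = factorize(pmi, dim=3)
```

With `dim` equal to the full rank, `W @ C.T` reproduces the PMI matrix; smaller values of `dim` give a low-rank approximation whose rows can be read as word vectors.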

matrix factorization, representation, sgn, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Li, Ya (University of Science and Technology of China) | Tian, Xinmei (University of Science and Technology of China) | Liu, Tongliang (University of Technology, Sydney) | Tao, Dacheng (University of Technology, Sydney)

Multi-Task Model and Feature Joint Learning

Given several tasks, multi-task learning (MTL) learns multiple tasks jointly by exploring the interdependence between them. The basic assumption in MTL is that those tasks are indeed related. Existing MTL methods model the task relatedness/interdependence in one of two ways: common parameter-sharing or common feature-sharing across tasks. In this paper, we propose a novel multi-task learning method to jointly learn shared parameters and a shared feature representation. Our objective is to learn a set of common features with which the tasks are related as closely as possible, so that common parameters shared across tasks can be optimally learned. We present a detailed derivation of our multi-task learning method and propose an alternating algorithm to solve the non-convex optimization problem. We further present a theoretical bound which directly demonstrates that the proposed multi-task learning method can successfully model the relatedness via joint common parameter- and common feature-learning. Extensive experiments are conducted on several real-world multi-task learning datasets. All results demonstrate the effectiveness of our multi-task model and feature joint learning method.

dataset, learning, learning method, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China (0.05)
Oceania > Australia (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Multi-Label Classification with Feature-Aware Non-Linear Label Space Transformation

Li, Xin (Temple University) | Guo, Yuhong (Temple University)

Multi-label classification with many classes has recently drawn a lot of attention. Existing methods address this problem by performing linear label space transformation to reduce the dimension of the label space, and then conducting independent regression for each reduced label dimension. These methods, however, do not capture nonlinear correlations among the multiple labels and may lead to significant information loss in the process of label space reduction. In this paper, we first propose to exploit kernel canonical correlation analysis (KCCA) to capture nonlinear label correlation information and perform nonlinear label space reduction. Then we develop a novel label space reduction method that explicitly combines linear and nonlinear label space transformations, based on CCA and KCCA respectively, to address multi-label classification with many classes. The proposed method is a feature-aware label transformation method that promotes the label predictability in the transformed label space from the input features. We conduct experiments on a number of multi-label classification datasets. The proposed approach demonstrates good performance compared to a number of state-of-the-art label dimension reduction methods.

classification, dataset, label space, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Middle East > Lebanon (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Li, Xiang (University of Western Ontario and National University of Defense Technology) | Wang, Huaimin (National University of Defense Technology) | Gu, Bin (Nanjing University of Information Science and Technology and University of Western Ontario) | Ling, Charles X. (University of Western Ontario)

Data Sparseness in Linear SVM

Large sparse datasets are common in many real-world applications. Linear SVM has been shown to be very efficient for classifying such datasets. However, it is still unknown how data sparseness affects its convergence behavior. To study this problem in a systematic manner, we propose a novel approach to generate large and sparse data from real-world datasets, using statistical inference and the data sampling process in the PAC framework. We first study the convergence behavior of linear SVM experimentally and make several observations that are useful for real-world applications. We then offer theoretical proofs for our observations by studying the Bayes risk and PAC bound. Our experimental and theoretical results are valuable for learning large sparse datasets with linear SVM.

data sparseness, dataset, sparseness, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > Canada > Ontario (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Bayesian Active Learning for Posterior Estimation

Kandasamy, Kirthevasan (Carnegie Mellon University) | Schneider, Jeff (Carnegie Mellon University) | Poczos, Barnabas (Carnegie Mellon University)

This paper studies active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior. Such methods do not consider efficiency in terms of likelihood evaluations. In order to be query efficient, we treat posterior estimation in an active regression framework. We propose two myopic query strategies to choose where to evaluate the likelihood and implement them using Gaussian processes. Via experiments on a series of synthetic and real examples, we demonstrate that our approach is significantly more query efficient than existing techniques and other heuristics for posterior estimation.

joint probability, likelihood, posterior, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Research Report (0.48)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Fast Cross-Validation for Incremental Learning

Joulani, Pooria (University of Alberta) | Gyorgy, Andras (University of Alberta) | Szepesvari, Csaba (University of Alberta)

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing a CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.
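To make the contrast between the per-fold recipe and a logarithmic-cost scheme concrete, here is a small sketch. It shows one way an incremental learner can share work across folds by recursing over halves of the fold range; it is an illustration under stated assumptions and not necessarily the authors' algorithm. The toy learner `RunningMean`, the squared-error criterion, and the function names `naive_cv` and `tree_cv` are all hypothetical.

```python
class RunningMean:
    """Toy incremental learner: predicts the mean of the targets seen so far."""

    def __init__(self, total=0.0, count=0):
        self.total, self.count = total, count

    def update(self, ys):
        # Return a new model so the caller's copy is left untouched.
        return RunningMean(self.total + sum(ys), self.count + len(ys))

    def predict(self):
        return self.total / self.count if self.count else 0.0


def fold_error(model, ys):
    """Mean squared error of the model's constant prediction on one fold."""
    p = model.predict()
    return sum((y - p) ** 2 for y in ys) / len(ys)


def naive_cv(folds):
    """Standard recipe: retrain from scratch for every held-out fold."""
    errors = []
    for i, held_out in enumerate(folds):
        model = RunningMean()
        for j, fold in enumerate(folds):
            if j != i:
                model = model.update(fold)
        errors.append(fold_error(model, held_out))
    return errors


def tree_cv(folds, lo=0, hi=None, model=None):
    """Recursive variant: `model` has already seen every fold outside [lo, hi]."""
    if hi is None:
        hi, model = len(folds) - 1, RunningMean()
    if lo == hi:
        return [fold_error(model, folds[lo])]
    mid = (lo + hi) // 2
    # To evaluate the left half, feed the right half to the model, and vice versa.
    left_model = model
    for fold in folds[mid + 1:hi + 1]:
        left_model = left_model.update(fold)
    right_model = model
    for fold in folds[lo:mid + 1]:
        right_model = right_model.update(fold)
    return (tree_cv(folds, lo, mid, left_model)
            + tree_cv(folds, mid + 1, hi, right_model))


# Hypothetical data: four folds of target values.
folds = [[1, 2], [3, 4], [5, 9], [7, 8]]
naive_errors = naive_cv(folds)
tree_errors = tree_cv(folds)
```

In `naive_cv` each fold is fed to the learner k - 1 times, while in `tree_cv` it is fed once per recursion level, so roughly log2(k) times. The two agree here because the running mean does not depend on the order in which data arrives; for order-sensitive learners the estimates would only be approximately equal.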

algorithm, dataset, ree cv, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

A New Simplex Sparse Learning Model to Measure Data Similarity for Clustering

Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)

The Laplacian matrix of a graph can be used in many areas of mathematical research and has a physical interpretation in various theories. However, there are a few open issues in Laplacian graph construction: (i) selecting the appropriate scale of analysis, (ii) selecting the appropriate number of neighbors, (iii) handling multiscale data, and (iv) dealing with noise and outliers. In this paper, we propose that the affinity between pairs of samples could be computed using sparse representation with proper constraints. This parameter-free setting automatically produces the Laplacian graph and leads to a significant reduction in computation cost and to robustness against outliers and noise. We further provide an efficient algorithm to solve the difficult optimization problem based on improvement of existing algorithms. To demonstrate our motivation, we conduct spectral clustering experiments with benchmark methods. Empirical experiments on 9 data sets demonstrate the effectiveness of our method.
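For readers unfamiliar with the construction this abstract refers to, the sketch below builds an unnormalized graph Laplacian L = D - W from an affinity matrix and splits a toy data set using the sign of the second eigenvector. The Gaussian-kernel affinity, its scale `sigma`, and the data are assumptions chosen for illustration; they stand in for exactly the kind of hand-tuned choice the paper aims to avoid, and the code does not implement the paper's simplex sparse representation.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


# Hypothetical data: two well-separated pairs of points in the plane.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = gaussian_affinity(X, sigma=3.0)
L = graph_laplacian(W)

# eigh returns eigenvalues in ascending order; the eigenvector of the
# second-smallest eigenvalue gives a two-way split by sign.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.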

algorithm, spectral, vector, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

South America > Paraguay > Asunción > Asunción (0.05)
North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > Ohio (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Guo, Xiaojie (Chinese Academy of Sciences)

Robust Subspace Segmentation by Simultaneously Learning Data Representations and Their Affinity Matrix

The goal of subspace segmentation is to partition a set of data drawn from a union of subspaces into their underlying subspaces. The performance of spectral-clustering-based approaches depends heavily on learned data affinity matrices, which are usually constructed either directly from the raw data or from their computed representations. In this paper, we propose a novel method to simultaneously learn the representations of data and the affinity matrix of the representations in a unified optimization framework. A novel Augmented Lagrangian Multiplier based algorithm is designed to effectively and efficiently seek the optimal solution of the problem. The experimental results on both synthetic and real data demonstrate the efficacy of the proposed method and its superior performance over state-of-the-art alternatives.

affinity matrix, experiment, representation, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Guo, Xiaojie (Chinese Academy of Sciences)

Online Robust Low Rank Matrix Recovery

Low rank matrix recovery has shown its importance as a theoretical foundation in many areas of information processing. Its solutions are usually obtained in batch mode, which requires loading all the data into memory during processing, and are thus hardly applicable to large-scale data. Moreover, a fraction of the data may be severely contaminated by outliers, which makes accurate recovery significantly more challenging. This paper proposes a novel online robust low rank matrix recovery method to address these difficulties. In particular, we first introduce an online algorithm to solve the problem of low rank matrix completion. Then we move on to low rank matrix recovery from observations with intensive outliers. The outlier support is robustly estimated from the perspective of a mixture model. Experiments on both synthetic and real data are conducted to demonstrate the efficacy of our method and show its superior performance over the state of the art.

orlrmr, outlier, proceedings, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Asia > China (0.04)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Bi-Parameter Space Partition for Cost-Sensitive SVM

Gu, Bin (Nanjing University of Information Science and Technology) | Sheng, Victor S. (University of Central Arkansas) | Li, Shuo (GE HealthCare)

Model selection is an important problem for cost-sensitive SVM (CS-SVM). Although using the solution path to find globally optimal parameters is a powerful method for model selection, it is a challenge to extend the framework to solve for the two regularization parameters of CS-SVM simultaneously. To overcome this challenge, we take three main steps in this paper. (i) A critical-regions-based bi-parameter space partition algorithm is proposed to present all piecewise linearities of CS-SVM. (ii) An invariant-regions-based bi-parameter space partition algorithm is further proposed to compute empirical errors for all parameter pairs. (iii) The globally optimal solutions for K-fold cross validation are computed by superposing K invariant-region-based bi-parameter space partitions into one. The three steps constitute the model selection of CS-SVM, which can find globally optimal parameter pairs in K-fold cross validation. Experimental results on seven normal datasets and four imbalanced datasets show that our proposed method has better generalization ability than various kinds of grid search methods, with less running time.

convex polygon region, parameter pair, polygon region, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Arkansas (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)