AITopics

1309.6847

Country:

North America > United States (0.28)
Asia > Middle East (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

arXiv.org Machine Learning, Aug-17-2013

Reference Distance Estimator

Li, Yanpeng

Abstract: A theoretical study is presented for a simple linear classifier called the reference distance estimator (RDE), which assigns the weight of each feature j as P(r|j)-P(r), where r is a reference feature relevant to the target class y. The analysis shows that if r performs better than random guessing in predicting y and is conditionally independent of each feature j, the RDE will have the same classification performance as that from P(y|j)-P(y), a classifier trained with the gold standard y. Since the estimation of P(r|j)-P(r) does not require labeled data, under the assumption above, an RDE trained with a large number of unlabeled examples would be close to one trained with infinite labeled examples. For the case where the assumption does not hold, we theoretically analyze the factors that influence the closeness of the RDE to the perfect one under the assumption, and present an algorithm to select reference features and combine multiple RDEs from different reference features using both labeled and unlabeled data. The experimental results on 10 text classification tasks show that the semi-supervised learning method improves supervised methods using 5,000 labeled examples and 13 million unlabeled ones, and in many tasks, its performance is even close to that of a classifier trained with 13 million labeled examples. In addition, the bounds in the theorems provide a good estimate of the classification performance and can be useful for new algorithm design.
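
The weighting rule in the abstract, P(r|j)-P(r), can be estimated from co-occurrence counts alone. The following is a minimal sketch under stated assumptions, not the author's implementation: documents are treated as sets of binary features, probabilities are plain relative frequencies, the reference feature is left out of the weight table, and a document is scored by summing the weights of its features. The function names and the toy documents are hypothetical.

```python
def rde_weights(docs, reference):
    """Estimate w_j = P(r|j) - P(r) from unlabeled documents.

    docs: list of sets of feature names (binary features, an assumption).
    reference: the reference feature r.
    """
    n = len(docs)
    if n == 0:
        raise ValueError("need at least one document")
    # P(r): fraction of documents containing the reference feature.
    p_r = sum(1 for d in docs if reference in d) / n
    count_j = {}   # number of documents containing feature j
    count_rj = {}  # number of documents containing both r and j
    for d in docs:
        has_r = reference in d
        for j in d:
            if j == reference:
                continue  # assumption: r itself gets no weight
            count_j[j] = count_j.get(j, 0) + 1
            if has_r:
                count_rj[j] = count_rj.get(j, 0) + 1
    # P(r|j) is estimated as count(r, j) / count(j).
    return {j: count_rj.get(j, 0) / c - p_r for j, c in count_j.items()}


def rde_score(doc, weights):
    """Linear score of a document: sum of the weights of its features."""
    return sum(weights.get(j, 0.0) for j in doc)


if __name__ == "__main__":
    toy_docs = [
        {"good", "great"},
        {"good", "fine"},
        {"good", "great", "fine"},
        {"bad", "awful"},
        {"bad", "fine"},
        {"awful"},
    ]
    weights = rde_weights(toy_docs, "good")
    print(weights)
    print(rde_score({"great", "fine"}, weights))
```

No labels appear anywhere in the computation, which is the point the abstract makes about unlabeled data; how well the scores track the target class depends on the independence assumption stated there.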

artificial intelligence, machine learning, reference feature, (19 more...)

1308.3818

Country: Asia > China (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Semi-Supervised Learning with Manifold Fitted Graphs

Zhang, Tongtao (Columbia University) | Ji, Rongrong (Xiamen University) | Liu, Wei (IBM Research) | Tao, Dacheng (University of Technology, Sydney) | Hua, Gang (Stevens Institute of Technology)

In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with the locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point so that close-by points lying on the same local manifold are automatically chosen to connect and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model the M-Fitted Graph, since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we leverage graph-based semi-supervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.

manifold fitted graph, semi-supervised learning

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.60)

Multi-Instance Multi-Label Learning with Weak Label

Yang, Shu-Jun (Nanjing University) | Jiang, Yuan (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)

Multi-Instance Multi-Label learning (MIML) deals with data objects that are represented by a bag of instances and associated with a set of class labels simultaneously. Previous studies typically assume that for every training example, all positive labels are tagged whereas the untagged labels are all negative. In many real applications such as image annotation, however, the learning problem often suffers from weak labels; that is, users usually tag only a subset of the positive labels, and the untagged labels are not necessarily negative. In this paper, we propose the MIMLwel approach, which works by assuming that highly relevant labels share some common instances, and that the underlying class means of bags for each label are separated by a large margin. Experiments validate the effectiveness of MIMLwel in handling the weak label problem.

multi-instance multi-label learning, weak label

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)

Deep Feature Learning Using Target Priors with Applications in ECoG Signal Decoding for BCI

Wang, Zuoguan (Rensselaer Polytechnic Institute) | Lyu, Siwei (University at Albany, SUNY) | Schalk, Gerwin (Wadsworth Center) | Ji, Qiang (Rensselaer Polytechnic Institute)

Recent years have seen great interest in using deep architectures for feature learning from data. One drawback of the commonly used unsupervised deep feature learning methods is that, for supervised or semi-supervised learning tasks, the information in the target variables is not used until the final stage, when the classifier or regressor is trained on the learned features. This could lead to over-generalized features that are not competitive on the specific supervised or semi-supervised learning tasks. In this work, we describe a new learning method that combines deep feature learning on mixed labeled and unlabeled data sets. Specifically, we describe a weakly supervised learning method, prior supervised convolutional stacked auto-encoders (PCSA), in which information in the target variables is represented probabilistically using a Gaussian-Bernoulli restricted Boltzmann machine (RBM). We apply this method to the decoding problem of an ECoG-based Brain Computer Interface (BCI) system. Our experimental results show that PCSA achieves significant improvement in decoding performance on benchmark data sets compared to unsupervised feature learning as well as to the current state-of-the-art algorithms that are based on manually crafted features.

application, deep feature learning, ecog signal decoding, (1 more...)

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Probabilistic Multi-Label Classification with Sparse Feature Learning

Guo, Yuhong (Temple University) | Xue, Wei (Temple University)

Multi-label classification is a critical problem in many areas of data analysis such as image labeling and text categorization. In this paper we propose a probabilistic multi-label classification model based on novel sparse feature learning. By employing an individual sparsity inducing ℓ1-norm and a group sparsity inducing ℓ2,1-norm, the proposed model has the capacity of capturing both label interdependencies and common predictive model structures. We formulate this sparse norm regularized learning problem as a non-smooth convex optimization problem, and develop a fast proximal gradient algorithm to solve it for an optimal solution. Our empirical study demonstrates the efficacy of the proposed method on a set of multi-label tasks given a limited number of labeled training instances.
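
Proximal gradient methods handle non-smooth norm penalties through their proximal operators. As a rough illustration of the two norms named above, the sketch below gives the standard closed-form proximal operators for the ℓ1-norm (element-wise soft-thresholding) and the ℓ2,1-norm (row-wise shrinkage). It is a generic building block, not the authors' algorithm: treating matrix rows as the groups, and the function names, are assumptions made here.

```python
import math


def prox_l1(v, t):
    """Proximal operator of t * ||v||_1: element-wise soft-thresholding."""
    out = []
    for x in v:
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)  # small entries are set exactly to zero
    return out


def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}, with rows as groups (assumption).

    Each row is scaled by max(0, 1 - t / ||row||_2), so rows with a small
    Euclidean norm are zeroed out as a whole.
    """
    result = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        result.append([scale * x for x in row])
    return result


if __name__ == "__main__":
    print(prox_l1([3.0, -0.5, -2.0], 1.0))
    print(prox_l21([[3.0, 4.0], [0.3, 0.4]], 1.0))
```

The ℓ1 operator zeroes individual entries, while the ℓ2,1 operator removes whole rows at once, which is the individual versus group sparsity distinction the abstract draws.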

classification, sparse feature learning

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)

arXiv.org Machine Learning, Jul-24-2013

Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example

Lin, Shaobo, Xu, Chen, Zeng, Jingshan, Fang, Jian

$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs with the choice of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^{q}$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ is avoidable. In this spirit, we place our investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^{q}$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria such as smoothness, computational complexity, and sparsity.

artificial intelligence, generalization capability, machine learning, (17 more...)

1307.6616

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Machine Learning, Jul-11-2013

Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem

Tran, Loc

Protein function prediction is an important problem in modern biology. In this paper, the un-normalized, symmetric normalized, and random walk graph Laplacian based semi-supervised learning methods are applied to an integrated network, combined from multiple networks, to predict the functions of all yeast proteins in these networks. The multiple networks are those created from Pfam domain structure, co-participation in a protein complex, the protein-protein interaction network, the genetic interaction network, and cell cycle gene expression measurements. The networks are combined with fixed weights, instead of using convex optimization to determine the combination weights, because of the high time complexity of the convex optimization method. This simple combination method will not affect the accuracy performance measures of the three semi-supervised learning methods. Experimental results show that the un-normalized and symmetric normalized graph Laplacian based methods perform slightly better than the random walk graph Laplacian based method for the integrated network. Moreover, the accuracy performance measures of these three semi-supervised learning methods for the integrated network are much better than the best accuracy performance measures of these three methods for the individual networks.
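
For readers unfamiliar with the three Laplacians named above, the sketch below builds them from a symmetric weight matrix W with degree matrix D, using the usual textbook definitions: L = D - W (un-normalized), L_sym = I - D^{-1/2} W D^{-1/2} (symmetric normalized), and L_rw = I - D^{-1} W (random walk). It only constructs the matrices and makes no claim about how the paper uses them for label propagation; the function name and the toy graph are hypothetical.

```python
import math


def graph_laplacians(W):
    """Return (L, L_sym, L_rw) for a symmetric non-negative weight matrix W.

    W is a list of lists. Every node must have positive degree, because
    the normalized variants divide by the degrees.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    if any(d <= 0 for d in deg):
        raise ValueError("every node needs positive degree")

    def eye(i, j):
        return 1.0 if i == j else 0.0

    # Un-normalized Laplacian: D - W.
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = [[eye(i, j) - W[i][j] / math.sqrt(deg[i] * deg[j])
              for j in range(n)] for i in range(n)]
    # Random walk Laplacian: I - D^{-1} W.
    L_rw = [[eye(i, j) - W[i][j] / deg[i] for j in range(n)]
            for i in range(n)]
    return L, L_sym, L_rw


if __name__ == "__main__":
    # Toy path graph on three nodes: 0 - 1 - 2.
    W = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
    for name, M in zip(("L", "L_sym", "L_rw"), graph_laplacians(W)):
        print(name, M)
```

An integrated network with fixed weights, as described in the abstract, would correspond to forming W as a fixed weighted sum of the individual network matrices before calling such a function.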

artificial intelligence, graph laplacian, machine learning, (14 more...)

doi: 10.5121/ijbb.2013.3202

1211.4289

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Agarwal, Alekh, Chapelle, Olivier, Dudik, Miroslav, Langford, John

A Reliable Effective Terascale Linear Learning System

arXiv.org Machine Learning, Jul-11-2013

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix), billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, to our knowledge, the most scalable and efficient linear learning system reported in the literature (as of 2011 when our experiments were conducted). We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.

artificial intelligence, inductive learning, machine learning, (17 more...)

1110.4198

Country: North America > United States > California (0.28)

Genre: Research Report (0.83)

Industry: Marketing (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.66)

Raghavan, Sindhu (The University of Texas at Austin) | Mooney, Raymond J. (The University of Texas at Austin)

Online Inference-Rule Learning from Natural-Language Extractions

AAAI Conferences, Jul-9-2013

In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accounts for this phenomenon by learning rules in which the body of the rule contains relations that are usually explicitly stated, while the head employs a less-frequently mentioned relation that is easily inferred. The rule learner processes training examples in an online manner to allow it to scale to large text corpora. Furthermore, we propose a novel approach to weighting rules using a curated lexical ontology like WordNet. The learned rules along with their parameters are then used to infer implicit information using a Bayesian Logic Program. Experimental evaluation on a machine reading testbed demonstrates the efficacy of the proposed methods.