AITopics | Statistical Learning

Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View

Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)

AAAI Conferences Jul-14-2014

Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The use of low dimensional approximations has mostly been attributed to the difficulty of estimating high dimensional MI from limited samples. In this paper, we argue for a different viewpoint: even given a very large amount of data, the high dimensional MI objective remains problematic as an optimization criterion because of its overfitting nature. The MI almost always increases as more features are added, which leads to a trivial solution that includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigorously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
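The overfitting behavior described in this abstract can be illustrated with a small sketch. It assumes a plug-in (empirical frequency) MI estimator on discrete synthetic data; the function name `plugin_mi`, the data, and the sample size are illustrative assumptions, and the statistical test proposed in the paper is not reproduced here. The sketch only shows that the empirical MI between a growing feature set and the class does not decrease, even when the added features are pure noise.

```python
import math
import random
from collections import Counter

def plugin_mi(rows, labels):
    """Plug-in (empirical) mutual information, in nats, between feature tuples and labels."""
    n = len(labels)
    joint = Counter(zip(rows, labels))
    px = Counter(rows)
    py = Counter(labels)
    # p(x,y) * log(p(x,y) / (p(x) p(y))) written in terms of raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

random.seed(0)
n = 500
labels = [random.randint(0, 1) for _ in range(n)]
# One informative feature (agrees with the label about 80% of the time)
# followed by five pure-noise features (hypothetical synthetic data).
informative = [y if random.random() < 0.8 else 1 - y for y in labels]
noise = [[random.randint(0, 2) for _ in range(n)] for _ in range(5)]
columns = [informative] + noise

mi_by_size = []
for k in range(1, len(columns) + 1):
    rows = list(zip(*columns[:k]))  # joint value of the first k features
    mi_by_size.append(plugin_mi(rows, labels))

for k, mi in enumerate(mi_by_size, start=1):
    print(f"{k} feature(s): MI = {mi:.4f} nats")
```

Because the noise features carry no real information about the class, any growth in the estimate after the first feature is an artifact of the finite sample, which is the kind of increase a significance test is meant to screen out.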

dataset, mimlmix, partial example, (16 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
Asia > Vietnam > Hanoi > Hanoi (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre:

Research Report (0.34)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Exploiting Competition Relationship for Robust Visual Recognition

Du, Liang (Temple University) | Ling, Haibin (Temple University)

AAAI Conferences Jul-14-2014

Joint learning of similar tasks has been a popular trend in visual recognition and has proven to be beneficial. Between-task similarity often provides useful cues, such as feature sharing, for learning visual classifiers. By contrast, the competition relationship between visual recognition tasks (e.g., content-independent writer identification and handwriting recognition) remains largely under-explored. A key challenge in visual recognition is to select the most discriminating features and remove irrelevant features related to intra-class variations. With the help of auxiliary competing tasks, we can identify such features within a joint learning model exploiting the competition relationship. Motivated by this intuition, we propose a novel way to exploit the competition relationship for solving visual recognition problems. Specifically, given a target task and its competing tasks, we jointly model them by a generalized additive regression model with a competition constraint. This constraint effectively discourages the selection of irrelevant features (weak learners) that support the auxiliary competing tasks. We name the proposed algorithm CompBoost. In our study, CompBoost is applied to two visual recognition applications: (1) content-independent writer identification from handwriting scripts by exploiting competing tasks of handwriting recognition, and (2) actor-independent facial expression recognition by exploiting competing tasks of face recognition. In both experiments our approach demonstrates promising performance gains by exploiting the between-task competition.

algorithm, recognition, target task, (11 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

South America > Chile > Arica y Parinacota Region > Arica Province > Arica (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

MaxSAT by Improved Instance-Specific Algorithm Configuration

Ansotegui, Carlos (University of Lleida) | Malitsky, Yuri (Insight Centre for Data Analytics) | Sellmann, Meinolf (IBM Watson Research Center)

AAAI Conferences Jul-14-2014

Our objective is to boost the state-of-the-art performance in MaxSAT solving. To this end, we employ the instance-specific algorithm configurator ISAC, and improve it with the latest in portfolio technology. Experimental results on SAT show that this combination marks a significant step forward in our ability to tune algorithms instance-specifically. We then apply the new methodology to a number of MaxSAT problem domains and show that the resulting solvers consistently outperform the best existing solvers on the respective problem families. In fact, the solvers presented here were independently evaluated at the 2013 MaxSAT Evaluation where they won six of the eleven categories.

maxsat, portfolio, solver, (14 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.04)
Europe > Spain > Catalonia > Lleida Province > Lleida (0.04)
Europe > Ireland > Munster > County Cork > Cork (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Novel Density-Based Clustering Algorithms for Uncertain Data

Zhang, Xianchao (Dalian University of Technology) | Liu, Han (Dalian University of Technology) | Zhang, Xiaotong (Dalian University of Technology) | Liu, Xinyue (Dalian University of Technology)

AAAI Conferences Jul-14-2014

Density-based techniques seem promising for handling data uncertainty in uncertain data clustering. Nevertheless, some issues have not been addressed well in existing algorithms. In this paper, we firstly propose a novel density-based uncertain data clustering algorithm, which improves upon existing algorithms from the following two aspects: (1) it employs an exact method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in previous work; (2) it introduces new definitions of core object probability and direct reachability probability, thus reducing the complexity and avoiding sampling. We then further improve the algorithm by using a novel assignment strategy to ensure that every object will be assigned to the most appropriate cluster. Experimental results show the superiority of our proposed algorithms over existing ones.
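The quantity in aspect (1), the probability that the distance between two uncertain objects is at most a boundary value, can be made concrete with a small sketch. It assumes each uncertain object is a finite set of equally likely 2-D points, which the abstract does not state; the function names and data are hypothetical. The sketch contrasts a Monte Carlo estimate with full enumeration over all point pairs under that discrete assumption, and it is not the exact method of the paper.

```python
import itertools
import math
import random

def exact_within_prob(obj_a, obj_b, eps):
    """P(dist <= eps) by enumerating every pair of equally likely points."""
    pairs = itertools.product(obj_a, obj_b)
    hits = sum(1 for p, q in pairs if math.dist(p, q) <= eps)
    return hits / (len(obj_a) * len(obj_b))

def sampled_within_prob(obj_a, obj_b, eps, trials, rng):
    """Monte Carlo estimate of the same probability from random point pairs."""
    hits = sum(
        1 for _ in range(trials)
        if math.dist(rng.choice(obj_a), rng.choice(obj_b)) <= eps
    )
    return hits / trials

rng = random.Random(0)
# Two hypothetical uncertain objects, each represented by 50 possible locations.
obj_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
obj_b = [(rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]

exact = exact_within_prob(obj_a, obj_b, eps=2.0)
approx = sampled_within_prob(obj_a, obj_b, eps=2.0, trials=2000, rng=rng)
print(f"enumerated = {exact:.3f}, sampled = {approx:.3f}")
```

The sampled value fluctuates from run to run unless the seed is fixed, which is one reason an algorithm may prefer a computation that does not rely on sampling.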

algorithm, minp ts, probability, (15 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

The Role of Dimensionality Reduction in Classification

Wang, Weiran (TTI Chicago) | Carreira-Perpinan, Miguel Angel (University of California, Merced)

AAAI Conferences Jul-14-2014

Dimensionality reduction (DR) is often used as a preprocessing step in classification, but usually one first fixes the DR mapping, possibly using label information, and then learns a classifier (a filter approach). The best performance would be obtained by optimizing the classification error jointly over the DR mapping and the classifier (a wrapper approach), but this is a difficult nonconvex problem, particularly with nonlinear DR. Using the method of auxiliary coordinates, we give a simple, efficient algorithm to train a combination of nonlinear DR and a classifier, and apply it to an RBF mapping with a linear SVM. This alternates steps where we train the RBF mapping and a linear SVM as usual regression and classification, respectively, with a closed-form step that coordinates both. The resulting nonlinear low-dimensional classifier achieves classification errors competitive with the state-of-the-art but is fast at training and testing, and allows the user to trade off runtime for classification accuracy easily. We then study the role of nonlinear DR in linear classification, and the interplay between the DR mapping, the number of latent dimensions and the number of classes. When trained jointly, the DR mapping takes an extreme role in eliminating variation: it tends to collapse classes in latent space, erasing all manifold structure, and lay out class centroids so they are linearly separable with maximum margin.

algorithm, classification, classifier, (16 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > California > Merced County > Merced (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Exact Subspace Clustering in Linear Time

Wang, Shusen (Zhejiang University) | Tu, Bojun (Zhejiang University) | Xu, Congfu (Zhejiang University) | Zhang, Zhihua (Shanghai Jiao Tong University)

AAAI Conferences Jul-14-2014

Subspace clustering is an important unsupervised learning problem with wide applications in computer vision and data analysis. However, the state-of-the-art methods for this problem suffer from high time complexity, quadratic or cubic in $n$ (the number of data instances). In this paper we exploit a data selection algorithm to speed up computation and robust principal component analysis to strengthen robustness. Accordingly, we devise a scalable and robust subspace clustering method whose time cost is only linear in $n$. We prove theoretically that under certain mild assumptions our method solves the subspace clustering problem exactly even for grossly corrupted data. Our algorithm is based on very simple ideas, yet it is the only linear time algorithm with a noiseless or noisy recovery guarantee. Finally, empirical results verify our theoretical analysis.

algorithm, assumption 1, subspace, (14 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.66)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Improving Semi-Supervised Target Alignment via Label-Aware Base Kernels

Wang, Qiaojun (Rutgers University) | Zhang, Kai (NEC Laboratories America) | Jiang, Guofei (NEC Laboratories America) | Maric, Ivan (Rutgers University)

AAAI Conferences Jul-14-2014

Semi-supervised kernel design is an essential step for obtaining good predictive performance in semi-supervised learning tasks. In the current literature, a large family of algorithms builds the new kernel by using the weighted average of predefined base kernels. While optimal weighting schemes have been studied extensively, the choice of base kernels has received much less attention. Many methods simply adopt the empirical kernel matrices or their eigenvectors. Such base kernels are computed irrespective of class labels and may not always reflect useful structures in the data. As a result, in the case of poor base kernels, the generalization performance can be degraded however hard their weights are tuned. In this paper, we propose to construct high-quality base kernels with the help of label information to globally improve the final target alignment. In particular, we devise label-aware kernel eigenvectors under the framework of semi-supervised eigenfunction extrapolation, which span base kernels that are more useful for learning. Such base kernels are individually better aligned to the learning target, so their mixture will more likely generate a good classifier. Our approach is computationally efficient, and demonstrates encouraging performance in semi-supervised classification and regression.
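The notions of a weighted mixture of base kernels and of alignment to the learning target can be sketched as follows. The sketch uses the standard kernel-target alignment, the Frobenius inner product of the kernel matrix with the label outer product, normalized by both Frobenius norms. The toy labels, the two base kernels, and the equal weights are illustrative assumptions; the label-aware eigenvector construction proposed in the paper is not implemented here.

```python
import math

def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def alignment(kernel, labels):
    """Kernel-target alignment between a kernel matrix and the label outer product."""
    target = [[yi * yj for yj in labels] for yi in labels]
    num = frobenius_inner(kernel, target)
    den = math.sqrt(frobenius_inner(kernel, kernel) * frobenius_inner(target, target))
    return num / den

def combine(kernels, weights):
    """Weighted sum of base kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]

labels = [1, 1, -1, -1]  # hypothetical toy labels
ideal = [[yi * yj for yj in labels] for yi in labels]  # perfectly aligned base kernel
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # label-agnostic base kernel
mixed = combine([ideal, identity], [0.5, 0.5])

for name, k in [("ideal", ideal), ("identity", identity), ("mixed", mixed)]:
    print(f"{name}: alignment = {alignment(k, labels):.3f}")
```

In this toy case the mixture's alignment lies between those of its two base kernels, which matches the abstract's point that the quality of the base kernels limits what reweighting can achieve.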

alignment, eigenvector, kernel, (14 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
