AITopics | Chan, Jeffrey

Reconsidering Mutual Information Based Feature Selection: A Statistical Significance View

Vinh, Nguyen Xuan (The University of Melbourne) | Chan, Jeffrey (The University of Melbourne) | Bailey, James (The University of Melbourne)

AAAI Conferences, Jul-14-2014

Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low-dimensional MI quantities. The use of low-dimensional approximations has mostly been attributed to the difficulty of estimating the high-dimensional MI from limited samples. In this paper, we argue for a different viewpoint: even given a very large amount of data, the high-dimensional MI objective remains problematic as an optimization criterion because of its overfitting nature. The MI almost always increases as more features are added, which leads to a trivial solution that includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigorously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
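The overfitting behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' method or experiment. It assumes a plain plug-in (empirical frequency) MI estimate over discrete variables, and the names `empirical_mi` and `mi_curve` are hypothetical helpers introduced only for this illustration. The features are pure noise, independent of the class label, yet the estimated MI between the growing feature set and the label never decreases.

```python
import math
import random
from collections import Counter


def empirical_mi(xs, ys):
    """Plug-in mutual information (in nats) between two sequences of hashable values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # sum over observed cells of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mi_curve(n_samples=200, n_features=8, seed=0):
    """Estimated MI between the first k noise features (taken jointly) and the label."""
    rng = random.Random(seed)
    # Assumption for illustration: binary labels and binary features, all independent.
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    features = [
        [rng.randint(0, 1) for _ in range(n_samples)] for _ in range(n_features)
    ]
    curve = []
    for k in range(1, n_features + 1):
        # Treat the first k features as a single composite variable (a tuple per sample).
        joint_feature = list(zip(*features[:k]))
        curve.append(empirical_mi(joint_feature, labels))
    return curve


curve = mi_curve()
for k, mi in enumerate(curve, start=1):
    print(f"{k} features: estimated MI = {mi:.4f} nats")
```

Because conditional MI is non-negative for the empirical distribution as well, adding a feature can never lower the plug-in estimate, so maximizing it alone would favour selecting every feature. A significance test of the kind the abstract proposes is one way to decide when an increase is meaningful.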

feature selection, reconsidering mutual information, statistical significance view

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Genre: Research Report > Experimental Study (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)

Mixed Membership Models for Exploring User Roles in Online Fora

White, Arthur J. (University College Dublin) | Chan, Jeffrey (University of Melbourne) | Hayes, Conor (National University Ireland Galway) | Murphy, Brendan (University College Dublin)

AAAI Conferences, Feb-22-2012

Discussion boards are a form of social media which allow users to discuss topics and exchange information in a complex manner, in a number of different settings. As the popularity of such message boards has increased, communities of users have emerged, and several prominent types of social role have been identified, such as Question Answerer, Celebrity, Discussion Person and Topic Initiator. Recent studies have noted the structural similarity of the egocentric networks of users assigned the same role by qualitative criteria. In this paper a methodology is developed with which to cluster together users with similar egocentric network structures. This is achieved using a mixed membership formulation which allows for the fact that different groups of users may have characteristics in common. The method is then applied to data taken from boards.ie, a medium-sized message board website. Prominent clusters of users are identified and discussed, and illustrative examples of user behaviour are provided. The type of interaction, both locally and globally, taking place within forums is examined.

artificial intelligence, social media, social role, (16 more...)

AAAI Conferences

Sixth International AAAI Conference on Weblogs and Social Media

Country: Europe > Ireland (0.29)

Technology:

Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Communications > Social Media (0.69)

Reconstruction of Threaded Conversations in Online Discussion Forums

Aumayr, Erik (National University of Ireland, Galway) | Chan, Jeffrey (National University of Ireland, Galway) | Hayes, Conor (National University of Ireland, Galway)

AAAI Conferences, Jul-12-2011

Online discussion boards, or Internet forums, are a significant part of the Internet. People use Internet forums to post questions, provide advice and participate in discussions. These online conversations are represented as threads, and the conversation trees within these threads are important in understanding the behaviour of online users. Unfortunately, the reply structures of these threads are generally not publicly accessible or not maintained. Hence, in this paper, we introduce an efficient and simple approach to reconstruct the reply structure in threaded conversations. We contrast its accuracy against three baseline algorithms, and show that our algorithm can accurately recreate the in and out degree distributions of forum reply graphs built from the reconstructed reply structures.

artificial intelligence, natural language, vertex, (20 more...)

AAAI Conferences

Fifth International AAAI Conference on Weblogs and Social Media

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Collaboration (0.70)
Information Technology > Data Science > Data Mining (0.70)
(3 more...)
