Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks
Ozdenizci, Ozan; Erdogmus, Deniz
Various research fields have developed domain-specific methods for feature learning and subsequent supervised model training [24, 26, 28]. Many exploratory applications are further characterized by high-dimensional feature representations, where dimensionality reduction must be addressed. One traditional approach to supervised dimensionality reduction is feature selection: selecting the most class-informative subset of the high-dimensional feature set and discarding the rest [16]. In particular, feature selection based on information theoretic criteria (e.g., maximum mutual information) has shown significant promise in earlier studies [2, 25]. Although selecting a class-relevant subset of features leads to intuitively interpretable and preferable learning algorithms, feature ranking and selection algorithms are known to potentially yield sub-optimal solutions because they cannot thoroughly assess dependencies among features [10, 44]. In that regard, feature transformation based dimensionality reduction methods provide a more robust alternative [16], and they have also been studied in the form of information theoretic projections or rotations [11, 19, 43].
May-1-2021