In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real world scenarios where exploration is typically a time and energy intensive process. The implications and possible extensions of this work are also briefly discussed.
We present a novel feature selection algorithm for the $k$-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter $\epsilon \in (0,1)$, selects and appropriately rescales in an unsupervised manner $\Theta(k \log(k / \epsilon) / \epsilon^2)$ features from a dataset of arbitrary dimensions. We prove that, if we run any $\gamma$-approximate $k$-means algorithm ($\gamma \geq 1$) on the features selected using our method, we can find a $(1+(1+\epsilon)\gamma)$-approximate partition with high probability.
Feature selection is effective in preparing high-dimensional data for a variety of learning tasks such as classification, clustering and anomaly detection. A vast majority of existing feature selection methods assume that all instances share some common patterns manifested in a subset of shared features. However, this assumption is not necessarily true in many domains where data instances could show high individuality. For example, in the medical domain, we need to capture the heterogeneous nature of patients for personalized predictive modeling, which could be characterized by a subset of instance-specific features. Motivated by this, we propose to study a novel problem of personalized feature selection. In particular, we investigate the problem in an unsupervised scenario as label information is usually hard to obtain in practice. To be specific, we present a novel unsupervised personalized feature selection framework UPFS to find some shared features by all instances and instance-specific features tailored to each instance. We formulate the problem into a principled optimization framework and provide an effective algorithm to solve it. Experimental results on real-world datasets verify the effectiveness of the proposed UPFS framework.
Highly non-linear machine learning algorithms have the capacity to handle large, complex datasets. However, the predictive performance of a model usually critically depends on the choice of multiple hyperparameters. Optimizing these (often) constitutes an expensive black-box problem. Model-based optimization is one state-of-the-art method to address this problem. Furthermore, resulting models often lack interpretability, as models usually contain many active features with non-linear effects and higher-order interactions. One model-agnostic way to enhance interpretability is to enforce sparse solutions through feature selection. It is in many applications desirable to forego a small drop in performance for a substantial gain in sparseness, leading to a natural treatment of feature selection as a multi-objective optimization task. Despite the practical relevance of both hyperparameter optimization and feature selection, they are often carried out separately from each other, which is neither efficient, nor does it take possible interactions between hyperparameters and selected features into account. We present, discuss and compare two algorithmically different approaches for joint and multi-objective hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization to tune a feature filter ensemble. The second is an evolutionary NSGA-II-based wrapper-approach to feature selection which incorporates specialized sampling, mutation and recombination operators for the joint decision space of included features and hyperparameter settings. We compare and discuss the approaches on a variety of benchmark tasks. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs significant overhead compared to the NSGA-II-based approach. The preferred choice depends on the cost of training the ML model on the given data.
In an earlier post, I discussed a model agnostic feature selection technique called forward feature selection which basically extracted the most important features required for the optimal value of chosen KPI. It had one caveat though -- large time complexity. In order to circumvent that issue feature importance can directly be obtained from the model being trained. In this post, I will consider 2 classification and 1 regression algorithms to explain model-based feature importance in detail. An inherently binary classification algorithm, it tries to find the best hyperplane in k-dimensional space that separates the 2 classes, minimizing logistic loss.