Accuracy
Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent
Zhao, Xiaosa, Liu, Kunpeng, Fan, Wei, Jiang, Lu, Zhao, Xiaowei, Yin, Minghao, Fu, Yanjie
Feature selection aims to select a subset of features to optimize the performances of downstream predictive tasks. Recently, multi-agent reinforced feature selection (MARFS) has been introduced to automate feature selection, by creating agents for each feature to select or deselect corresponding features. Although MARFS enjoys the automation of the selection process, MARFS suffers from not just the data complexity in terms of contents and dimensionality, but also the exponentially-increasing computational costs with regard to the number of agents. The raised concern leads to a new research question: Can we simplify the selection process of agents under reinforcement learning context so as to improve the efficiency and costs of feature selection? To address the question, we develop a single-agent reinforced feature selection approach integrated with restructured choice strategy. Specifically, the restructured choice strategy includes: 1) we exploit only one single agent to handle the selection task of multiple features, instead of using multiple agents. 2) we develop a scanning method to empower the single agent to make multiple selection/deselection decisions in each round of scanning. 3) we exploit the relevance to predictive labels of features to prioritize the scanning orders of the agent for multiple features. 4) we propose a convolutional auto-encoder algorithm, integrated with the encoded index information of features, to improve state representation. 5) we design a reward scheme that take into account both prediction accuracy and feature redundancy to facilitate the exploration process. Finally, we present extensive experimental results to demonstrate the efficiency and effectiveness of the proposed method.
Group Fairness by Probabilistic Modeling with Latent Fair Decisions
Choi, YooJung, Dang, Meihua, Broeck, Guy Van den
Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved. Moreover, we show on real-world datasets that our approach not only is a better model than existing methods of how the data was generated but also achieves competitive accuracy.
Kernel Ridge Regression Using Importance Sampling with Application to Seismic Response Prediction
Pourkamali-Anaraki, Farhad, Hariri-Ardebili, Mohammad Amin, Morawiec, Lydia
Scalable kernel methods, including kernel ridge regression, often rely on low-rank matrix approximations using the Nystrom method, which involves selecting landmark points from large data sets. The existing approaches to selecting landmarks are typically computationally demanding as they require manipulating and performing computations with large matrices in the input or feature space. In this paper, our contribution is twofold. The first contribution is to propose a novel landmark selection method that promotes diversity using an efficient two-step approach. Our landmark selection technique follows a coarse to fine strategy, where the first step computes importance scores with a single pass over the whole data. The second step performs K-means clustering on the constructed coreset to use the obtained centroids as landmarks. Hence, the introduced method provides tunable trade-offs between accuracy and efficiency. Our second contribution is to investigate the performance of several landmark selection techniques using a novel application of kernel methods for predicting structural responses due to earthquake load and material uncertainties. Our experiments exhibit the merits of our proposed landmark selection scheme against baselines.
MStream: Fast Streaming Multi-Aspect Group Anomaly Detection
Bhatia, Siddharth, Jain, Arjit, Li, Pan, Kumar, Ritesh, Hooi, Bryan
Given a stream of entries in a multi-aspect data setting i.e., entries having multiple dimensions, how can we detect anomalous activities? For example, in the intrusion detection setting, existing work seeks to detect anomalous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MStream, which can detect unusual group anomalies as they occur, in a dynamic manner. MStream has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, thus processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MStream is evaluated over the KDDCUP99, CICIDS-DoS, UNSW-NB 15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines.
Algorithmic Fairness in Education
Kizilcec, René F., Lee, Hansol
Data-driven predictive models are increasingly used in education to support students, instructors, and administrators. However, there are concerns about the fairness of the predictions and uses of these algorithmic systems. In this introduction to algorithmic fairness in education, we draw parallels to prior literature on educational access, bias, and discrimination, and we examine core components of algorithmic systems (measurement, model learning, and action) to identify sources of bias and discrimination in the process of developing and deploying these systems. Statistical, similarity-based, and causal notions of fairness are reviewed and contrasted in the way they apply in educational contexts. Recommendations for policy makers and developers of educational technology offer guidance for how to promote algorithmic fairness in education.
Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes
Sclar, M., Bujia, G., Vita, S., Solovey, G., Kamienkowski, J. E.
Finding objects is essential for almost any daily-life visual task. Saliency models have been useful to predict fixation locations in natural images, but are static, i.e., they provide no information about the time-sequence of fixations. Nowadays, one of the biggest challenges in the field is to go beyond saliency maps to predict a sequence of fixations related to a visual task, such as searching for a given target. Bayesian observer models have been proposed for this task, as they represent visual search as an active sampling process. Nevertheless, they were mostly evaluated on artificial images, and how they adapt to natural images remains largely unexplored. Here, we propose a unified Bayesian model for visual search guided by saliency maps as prior information. We validated our model with a visual search experiment in natural scenes recording eye movements. We show that, although state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, their performance degrades to chance afterward. This suggests that saliency maps alone are good to model bottom-up first impressions, but are not enough to explain the scanpaths when top-down task information is critical. Thus, we propose to use them as priors of Bayesian searchers. This approach leads to a behavior very similar to humans for the whole scanpath, both in the percentage of target found as a function of the fixation rank and the scanpath similarity, reproducing the entire sequence of eye movements.
Real-Time Streaming Anomaly Detection in Dynamic Graphs
Bhatia, Siddharth, Liu, Rui, Hooi, Bryan, Yoon, Minji, Shin, Kijung, Faloutsos, Christos
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose MIDAS-F, to solve the problem by which anomalies are incorporated into the algorithm's internal states, creating a 'poisoning' effect which can allow future anomalies to slip through undetected. MIDAS-F introduces two modifications: 1) We modify the anomaly scoring function, aiming to reduce the 'poisoning' effect of newly arriving edges; 2) We introduce a conditional merge step, which updates the algorithm's data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the `poisoning' effect. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 130 to 929 times faster than state-of-the-art approaches; (c) it provides 41% to 55% higher accuracy (in terms of ROC-AUC) than state-of-the-art approaches.
Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection
Li, Wenhao, Chen, Ningyuan, Hong, L. Jeff
We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $\mathbf{x}$ and decision $\mathbf{y}$. The reward function to learn, $f(\mathbf{x},\mathbf{y})$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the dimensions of $\mathbf x$ and $\mathbf y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as \textit{sparsity} in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called \textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Our algorithm achieves the regret $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
Ridge Regression Revisited: Debiasing, Thresholding and Bootstrap
Zhang, Yunyi, Politis, Dimitris N.
In high dimensional setting, the facts that the classical ridge regression method cannot perform model selection on its own and it introduces large bias make this method an unsatisfactory tool for analyzing high dimensional linear models. In this paper, we propose the debiased and threshold ridge regression method which solves these drawbacks. Besides, focus on performing statistical inference and prediction of linear combinations of parameters, we provide a normal approximation theorem for the estimator and propose two bootstrap algorithms which provide joint confidence regions and prediction regions for the linear combinations. In statistical inference part, apart from the dimension of parameters, we allow the number of linear combinations to grow as sample size increases. From numerical experiments, we can see that the proposed regression method is robust with the fluctuation in ridge parameter and reduces estimation errors compared to classical and threshold ridge regression methods. Apart from theoretical interests, the proposed algorithms can be applied to disciplines such as econometrics, biology and etc.
StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics
Chatzimparmpas, Angelos, Martins, Rafael M., Kucher, Kostiantyn, Kerren, Andreas
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly-effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.