Performance Analysis
Selecting the Best in GANs Family: a Post Selection Inference Framework
Tsai, Yao-Hung Hubert, Yamada, Makoto, Wu, Denny, Salakhutdinov, Ruslan, Takeuchi, Ichiro, Fukumizu, Kenji
"Which Generative Adversarial Network (GAN) generates the most plausible images?" has been a frequently asked question among researchers. To address this problem, we first propose an \emph{incomplete} U-statistic estimate of the maximum mean discrepancy, $\mathrm{MMD}_{inc}$, to measure the distribution discrepancy between generated and real images. $\mathrm{MMD}_{inc}$ enjoys the advantages of asymptotic normality, computational efficiency, and model agnosticism. We then propose a GAN analysis framework that selects and tests the "best" member of the GAN family using Post Selection Inference (PSI) with $\mathrm{MMD}_{inc}$. In the experiments, we apply the proposed framework to 7 GAN variants and compare their $\mathrm{MMD}_{inc}$ scores.
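As an illustration of the idea (not the authors' implementation), the sketch below assumes a Gaussian RBF kernel, two samples of equal size, and a design that draws index pairs uniformly at random; the names `rbf_kernel` and `mmd_inc` are hypothetical. The complete U-statistic averages the kernel term $h$ over all $n(n-1)$ ordered index pairs; the incomplete version averages it over a random subset, so the cost grows with the number of sampled pairs rather than with $n^2$.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_inc(xs, ys, num_pairs, bandwidth=1.0, seed=0):
    """Incomplete U-statistic estimate of the squared MMD between xs and ys.

    Averages the U-statistic term h over `num_pairs` index pairs (i, j),
    i != j, drawn uniformly at random instead of over all n(n-1) pairs.
    Assumes len(xs) == len(ys) (an assumption of this sketch).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size, at least 2 each")
    rng = random.Random(seed)

    def k(a, b):
        return rbf_kernel(a, b, bandwidth)

    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += k(xs[i], xs[j]) + k(ys[i], ys[j]) - k(xs[i], ys[j]) - k(xs[j], ys[i])
    return total / num_pairs
```

With hypothetical toy features `real = [[0.1 * i] for i in range(20)]`, `mmd_inc(real, real, 100)` is exactly zero because every term cancels, while a copy of `real` shifted by 5 yields a clearly positive score.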
Automated software vulnerability detection with machine learning
Harer, Jacob A., Kim, Louis Y., Russell, Rebecca L., Ozdemir, Onur, Kosta, Leonard R., Rangamani, Akshay, Hamilton, Lei H., Centeno, Gabriel I., Key, Jonathan R., Ellingwood, Paul M., McConley, Marc W., Opper, Jeffrey M., Chin, Peter, Lazovich, Tomo
Thousands of security vulnerabilities are discovered in production software each year, either reported publicly to the Common Vulnerabilities and Exposures database or discovered internally in proprietary code. Vulnerabilities often manifest themselves in subtle ways that are not obvious to code reviewers or the developers themselves. With the wealth of open source code available for analysis, there is an opportunity to learn the patterns of bugs that can lead to security vulnerabilities directly from data. In this paper, we present a data-driven approach to vulnerability detection using machine learning, specifically applied to C and C++ programs. We first compile a large dataset of hundreds of thousands of open-source functions labeled with the outputs of a static analyzer. We then compare methods applied directly to source code with methods applied to artifacts extracted from the build process, finding that source-based models perform better. We also compare the application of deep neural network models with more traditional models such as random forests and find the best performance comes from combining features learned by deep models with tree-based models. Ultimately, our highest performing model achieves an area under the precision-recall curve of 0.49 and an area under the ROC curve of 0.87.
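The two reported metrics can be computed as in the sketch below (hypothetical helper names, plain Python). ROC AUC uses its pairwise interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The area under the precision-recall curve is approximated here by average precision, one common estimator; the abstract does not say which estimator was used.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC AUC; labels are 0/1, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / num_pos
```

Because precision depends on how rare positives are, AUPRC is typically lower than ROC AUC on imbalanced data such as vulnerability labels, which helps explain why the two figures reported above can differ so much.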
Sophos' Intercept X dives into deep learning for security
Next-generation endpoint security provider Sophos is taking advanced deep learning neural networks to the fight against malware through the release of a new detection tool called Intercept X. According to the company, deep learning takes machine learning to the next level by being able to learn the entire observable threat landscape. It is also able to process many millions of samples for a faster prediction rate and fewer false positives. According to Enterprise Strategy Group senior validation analyst Tony Palmer, traditional machine learning models still depend on expert threat analysts for training; they also get more complex and slower as more data is added. "These models may also have significant false positive rates which reduce IT productivity as admins try to determine what is malware and what is legitimate software," Palmer explains.
Unsupervised Evaluation and Weighted Aggregation of Ranked Predictions
Ahsen, Mehmet Eren, Vogel, Robert, Stolovitzky, Gustavo
Learning algorithms that aggregate predictions from an ensemble of diverse base classifiers consistently outperform individual methods. Many of these strategies have been developed in a supervised setting, where the accuracy of each base classifier can be empirically measured and this information is incorporated into the training process. However, the reliance on labeled data precludes the application of ensemble methods to many real-world problems where labeled data has not been curated. To this end, we developed a new theoretical framework for binary classification, the Strategy for Unsupervised Multiple Method Aggregation (SUMMA), to estimate the performance of base classifiers and an optimal strategy for ensemble learning from unlabeled data.
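The abstract does not spell out SUMMA's estimator, so the sketch below only illustrates the general principle of judging base classifiers without labels, under an assumption we add: the classifiers are conditionally independent given the class. If $\mathbb{E}[r_i \mid y] = a_i + b_i y$ for the rank $r_i$ assigned by classifier $i$, then for $i \neq j$ the rank covariance is $C_{ij} = b_i b_j \operatorname{Var}(y)$, a rank-one matrix off the diagonal. The code recovers a vector $v$ with $C_{ij} \approx v_i v_j$ through the classic triad identity $v_i^2 = C_{ij} C_{ik} / C_{jk}$ and uses $v$ as aggregation weights, so a classifier worse than random receives a negative weight. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def to_ranks(scores):
    """Ranks 1..n; the highest score gets rank n (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def triad_weights(rank_lists):
    """Fit C_ij ~ v_i * v_j (i != j) and return v as aggregation weights."""
    m = len(rank_lists)
    if m < 3:
        raise ValueError("need at least three base classifiers")
    c = [[covariance(rank_lists[i], rank_lists[j]) for j in range(m)] for i in range(m)]
    magnitudes = []
    for i in range(m):
        others = [k for k in range(m) if k != i]
        # Triad identity: v_i^2 = C_ij * C_ik / C_jk for distinct j, k.
        est = [c[i][j] * c[i][k] / c[j][k] for j, k in combinations(others, 2) if c[j][k] != 0]
        magnitudes.append(math.sqrt(max(mean(est), 0.0)) if est else 0.0)
    # Signs relative to the classifier with the largest |v|.
    ref = max(range(m), key=lambda i: magnitudes[i])
    signs = [1.0 if i == ref or c[ref][i] >= 0 else -1.0 for i in range(m)]
    # Global sign: assume most base classifiers beat random guessing.
    if sum(signs) < 0:
        signs = [-s for s in signs]
    return [s * mag for s, mag in zip(signs, magnitudes)]


def aggregate(rank_lists, weights):
    """Weighted sum of ranks per sample; higher means more likely positive."""
    n = len(rank_lists[0])
    return [sum(w * r[t] for w, r in zip(weights, rank_lists)) for t in range(n)]
```

Scores from each base classifier would first be converted with `to_ranks`; no labels are used anywhere in the weighting.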
Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
Imamura, Hideaki, Sato, Issei, Sugiyama, Masashi
While crowdsourcing has become an important means to label data, crowdworkers are not always experts; some may even be adversarial. Therefore, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the best-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on, e.g., class priors, confusion matrices, and the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models that includes the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model in real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to investigate the behavior of this proposed model immediately. Experimental results showed a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.
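For reference, the classic way to fit the DS model is expectation-maximization (EM); the sketch below implements that standard procedure in plain Python, not the authors' minimax analysis or their worker clustering model. Each worker $w$ gets a confusion matrix whose entry $[c][l]$ estimates the probability that $w$ answers $l$ when the true class is $c$; the E-step turns these into a posterior over each item's true class. The function name and the smoothing constant are our own choices.

```python
def dawid_skene(annotations, num_classes, iterations=50, smoothing=0.01):
    """Classic Dawid-Skene EM (sketch).

    annotations: dict mapping item -> list of (worker, label) pairs with
    labels in range(num_classes). Returns (posteriors, confusion), where
    confusion[w][c][l] estimates P(worker w answers l | true class c).
    """
    if iterations < 1:
        raise ValueError("need at least one EM iteration")
    k = num_classes
    items = list(annotations)
    workers = sorted({w for votes in annotations.values() for w, _ in votes})

    # Initialise each item's class posterior with a smoothed majority vote.
    post = {}
    for i in items:
        counts = [smoothing] * k
        for _, label in annotations[i]:
            counts[label] += 1.0
        total = sum(counts)
        post[i] = [v / total for v in counts]

    for _ in range(iterations):
        # M-step: class priors and per-worker confusion matrices.
        prior = [sum(post[i][c] for i in items) / len(items) for c in range(k)]
        conf = {w: [[smoothing] * k for _ in range(k)] for w in workers}
        for i in items:
            for w, label in annotations[i]:
                for c in range(k):
                    conf[w][c][label] += post[i][c]
        for w in workers:
            for c in range(k):
                row_total = sum(conf[w][c])
                conf[w][c] = [v / row_total for v in conf[w][c]]

        # E-step: posterior over each item's true class (plain products of
        # probabilities; a production version would work in log space).
        for i in items:
            scores = list(prior)
            for w, label in annotations[i]:
                for c in range(k):
                    scores[c] *= conf[w][c][label]
            total = sum(scores)
            post[i] = [v / total for v in scores]
    return post, conf
```

With three reliable workers and one who always flips the label, EM learns a confusion matrix for the adversarial worker that makes its answers informative once inverted, something plain majority voting cannot exploit.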
A comparative study of fairness-enhancing interventions in machine learning
Friedler, Sorelle A., Scheidegger, Carlos, Venkatasubramanian, Suresh, Choudhary, Sonam, Hamilton, Evan P., Roth, Derek
Computers are increasingly used to make decisions that have a significant impact on people's lives. Often, these decisions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers and predictors have appeared in the literature. This paper studies the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we seek to bring attention to many under-appreciated aspects of such fairness-enhancing interventions. Concretely, we present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures and on a large number of existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservation, many of these measures strongly correlate with one another. In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits), indicating that fairness interventions might be more brittle than previously thought.
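Two widely used group-fairness measures can be computed from predictions alone, as in the sketch below (hypothetical function and group names; binary predictions assumed). Both depend only on the positive-prediction rates of the two groups, which is one simple reason why different fairness measures can correlate strongly.

```python
def positive_rate(preds, groups, group):
    """Fraction of members of `group` that receive the positive outcome (1)."""
    chosen = [p for p, g in zip(preds, groups) if g == group]
    if not chosen:
        raise ValueError(f"no samples for group {group!r}")
    return sum(chosen) / len(chosen)


def disparate_impact(preds, groups, unprivileged, privileged):
    """Ratio of positive rates; 1.0 means parity (undefined if the privileged rate is 0)."""
    return positive_rate(preds, groups, unprivileged) / positive_rate(preds, groups, privileged)


def demographic_parity_difference(preds, groups, unprivileged, privileged):
    """Difference of positive rates; 0.0 means parity."""
    return positive_rate(preds, groups, unprivileged) - positive_rate(preds, groups, privileged)
```

Recomputing these on several random training-test splits is a cheap way to observe the kind of split sensitivity described above.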
Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning
Although aviation accidents are rare, safety incidents occur more frequently and require careful analysis to detect and mitigate risks in a timely manner. Analyzing safety incidents using operational data and producing event-based explanations is invaluable to airline companies as well as to governing organizations such as the Federal Aviation Administration (FAA) in the United States. However, this task is challenging because of the complexity involved in mining multidimensional heterogeneous time series data, the lack of time-step-wise annotation of events in a flight, and the lack of scalable tools to perform analysis over a large number of events. In this work, we propose a precursor mining algorithm that identifies events in the multidimensional time series that are correlated with the safety incident. Precursors are valuable for systems health and safety monitoring and for explaining and forecasting safety incidents. Current methods scale poorly to high-dimensional time series data and are inefficient at capturing temporal behavior. We propose an approach that combines multiple-instance learning (MIL) and deep recurrent neural networks (DRNN) to take advantage of MIL's ability to learn from weakly supervised data and DRNN's ability to model temporal behavior. We describe the algorithm, the data, the intuition behind taking a MIL approach, and a comparative analysis of the proposed algorithm with baseline models. We also discuss the application to a real-world aviation safety problem using data from a commercial airline company, along with the model's abilities and shortcomings, and close with remarks about possible deployment directions.
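The multiple-instance view can be sketched as follows: each flight is a bag of time steps, and only the bag carries a label (incident or not). The sketch assumes max pooling as the bag-level aggregation, one common MIL choice that may differ from the paper's; the per-time-step probabilities would come from the recurrent network and are simply supplied as a list here. Time steps whose probability clears a threshold are reported as candidate precursors. Function names and the threshold are hypothetical.

```python
import math


def bag_probability(instance_probs):
    """MIL max pooling: a flight is an incident if any time step is."""
    return max(instance_probs)


def bag_loss(instance_probs, bag_label, eps=1e-7):
    """Binary cross-entropy on the bag-level probability (weak supervision)."""
    p = min(max(bag_probability(instance_probs), eps), 1.0 - eps)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1.0 - p))


def precursor_steps(instance_probs, threshold=0.5):
    """Indices of time steps flagged as candidate precursors."""
    return [t for t, p in enumerate(instance_probs) if p >= threshold]
```

Only `bag_label` enters the loss, which is what lets training proceed without time-step-wise annotation.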
Hybrid Decision Making: When Interpretable Models Collaborate With Black-Box Models
Interpretable machine learning models have received increasing interest in recent years, especially in domains where humans are involved in the decision-making process. However, some loss of task performance in exchange for interpretability is often inevitable. This performance downgrade puts practitioners in the dilemma of choosing between a top-performing black-box model with no explanations and an interpretable model with unsatisfying task performance. In this work, we propose a novel framework for building a Hybrid Decision Model that integrates an interpretable model with any black-box model to introduce explanations into the decision-making process while preserving or possibly improving the predictive accuracy. We propose a novel metric, explainability, to measure the percentage of data that are sent to the interpretable model for decision. We also design a principled objective function that considers predictive accuracy, model interpretability, and data explainability. Under this framework, we develop the Collaborative Black-box and RUle Set Hybrid (CoBRUSH) model, which combines logic rules and any black-box model into a joint decision model. An input instance is first sent to the rules for decision. If a rule is satisfied, a decision is generated directly. Otherwise, the black-box model is activated to decide on the instance. To train a hybrid model, we design an efficient search algorithm that exploits theoretically grounded strategies to reduce computation. Experiments show that CoBRUSH models achieve the same or better accuracy than their black-box collaborators working alone while gaining explainability. They also have lower model complexity than interpretable baselines.
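The decision flow described above (rules first, black box as fallback) and the explainability metric can be sketched as below. The rules, feature names, and black-box model used in the test are hypothetical placeholders; the sketch does not include CoBRUSH's training objective or its search algorithm.

```python
def hybrid_predict(x, rules, black_box):
    """Return (decision, explained).

    `rules` is a list of (condition, decision) pairs tried in order; the
    black box is consulted only when no rule fires.
    """
    for condition, decision in rules:
        if condition(x):
            return decision, True  # decided by an interpretable rule
    return black_box(x), False


def explainability(xs, rules):
    """Fraction of inputs decided by the interpretable rules."""
    covered = sum(1 for x in xs if any(condition(x) for condition, _ in rules))
    return covered / len(xs)
```

Keeping `explainability` independent of the black box means rule coverage can be measured without running the more expensive model.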
Post-Regularization Inference for Time-Varying Nonparanormal Graphical Models
Lu, Junwei, Kolar, Mladen, Liu, Han
We propose a novel class of time-varying nonparanormal graphical models, which allows us to model high-dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model, we develop statistical tests for the presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes, and do not require the commonly assumed minimum signal strength. The testing procedures are based on a high-dimensional, debiasing-free moment estimator, which uses a novel kernel-smoothed Kendall's tau correlation matrix as an input statistic. The estimator consistently estimates the latent inverse Pearson correlation matrix uniformly in both the index variable and the kernel bandwidth. Its rate of convergence is shown to be minimax optimal. Our method is supported by thorough numerical simulations and an application to a neural imaging data set.
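The input statistic can be illustrated with a hedged sketch. Two facts are standard: Kendall's tau is invariant under strictly increasing transformations of each variable, and for a Gaussian copula (the nonparanormal family) the latent Pearson correlation satisfies $\Sigma_{jk} = \sin(\pi \tau_{jk} / 2)$. The kernel weighting by the index variable $z$ below is our assumption about what "kernel-smoothed" means here; the paper's exact kernel and normalization may differ, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smoothed_kendall_tau(x, y, z, z0, bandwidth):
    """Kendall's tau for (x, y) where pair (i, j) is weighted by
    K_h(z_i - z0) * K_h(z_j - z0), so samples whose index value z lies
    near z0 dominate the estimate."""
    w = [gaussian_kernel((zi - z0) / bandwidth) / bandwidth for zi in z]
    num = den = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            sign = (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
            wij = w[i] * w[j]
            num += wij * sign
            den += wij
    return num / den


def latent_correlation(tau):
    """Gaussian-copula identity: Sigma_jk = sin(pi / 2 * tau_jk)."""
    return math.sin(math.pi / 2.0 * tau)
```

Applying `latent_correlation` entrywise to a matrix of such taus gives a correlation-matrix estimate that is unaffected by monotone marginal transformations, which is what makes the approach suitable for heavy-tailed data.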
Maturation Trajectories of Cortical Resting-State Networks Depend on the Mediating Frequency Band
Khan, Sheraz, Hashmi, Javeria, Mamashli, Fahimeh, Michmizos, Konstantinos, Kitzbichler, Manfred, Bharadwaj, Hari, Bekhti, Yousra, Ganesan, Santosh, Garel, Keri A, Whitfield-Gabrieli, Susan, Gollub, Randy, Kong, Jian, Vaina, Lucia M, Rana, Kunjan, Stufflebeam, Steven, Hamalainen, Matti, Kenet, Tal
The functional significance of resting state networks and their abnormal manifestations in psychiatric disorders are firmly established, as is the importance of the cortical rhythms in mediating these networks. Resting state networks are known to undergo substantial reorganization from childhood to adulthood, but whether distinct cortical rhythms, which are generated by separable neural mechanisms and are often manifested abnormally in psychiatric conditions, mediate maturation differentially remains unknown. Using magnetoencephalography (MEG) to map frequency-band-specific maturation of resting state networks from ages 7 to 29 in 162 participants (31 independent), we found significant changes with age in networks mediated by the beta (13–30 Hz) and gamma (31–80 Hz) bands. More specifically, gamma band mediated networks followed an expected asymptotic trajectory, but beta band mediated networks followed a linear trajectory. Network integration increased with age in gamma band mediated networks, while local segregation increased with age in beta band mediated networks. Spatially, the hubs that changed in importance with age in the beta band mediated networks had relatively little overlap with those that showed the greatest changes in the gamma band mediated networks. These findings are relevant for our understanding of the neural mechanisms of cortical maturation, in both typical and atypical development.