Sricharan, Kumar
Building robust classifiers through generation of confident out of distribution examples
Sricharan, Kumar, Srivastava, Ashok
Deep learning models are known to be overconfident in their predictions on out of distribution inputs. There have been several pieces of work to address this issue, including a number of approaches for building Bayesian neural networks, as well as closely related work on detection of out of distribution samples. Recently, there has been work on building classifiers that are robust to out of distribution samples by adding a regularization term that maximizes the entropy of the classifier output on out of distribution data. To approximate out of distribution samples (which are not known a priori), a GAN was used to generate samples at the edges of the training distribution. In this paper, we introduce an alternative GAN-based approach for building a robust classifier, where the idea is to use the GAN to explicitly generate out of distribution samples that the classifier is confident on (low entropy), and have the classifier maximize the entropy for these samples.
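The entropy regularization idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weight `lam`, and the toy logits are all hypothetical, and the GAN that would supply the out of distribution samples is left out entirely.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of a discrete distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

def regularized_loss(in_logits, in_labels, ood_logits, lam=0.5):
    # Assumed form: mean cross-entropy on in-distribution samples
    # minus lam times the mean predictive entropy on OOD samples,
    # so minimizing the loss pushes OOD predictions toward uniform.
    ce = sum(cross_entropy(softmax(z), y)
             for z, y in zip(in_logits, in_labels)) / len(in_logits)
    ent = sum(entropy(softmax(z)) for z in ood_logits) / len(ood_logits)
    return ce - lam * ent

# Toy usage: the same in-distribution batch, with a confident versus
# a uniform prediction on a single (hypothetical) OOD sample.
in_logits = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
in_labels = [0, 1]
loss_confident_ood = regularized_loss(in_logits, in_labels, [[8.0, 0.0, 0.0]])
loss_uniform_ood = regularized_loss(in_logits, in_labels, [[0.0, 0.0, 0.0]])
print(loss_confident_ood, loss_uniform_ood)
```

In this toy setup the loss is lower when the classifier is uncertain on the out of distribution sample, which is the behavior the regularization term is meant to reward.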
Improving robustness of classifiers by training against live traffic
Sricharan, Kumar, Kallurupalli, Kumar, Srivastava, Ashok
Deep learning models are known to be overconfident in their predictions on out of distribution inputs. This is a challenge when a model is trained on a particular input dataset, but receives out of sample data when deployed in practice. Recently, there has been work on building classifiers that are robust to out of distribution samples by adding a regularization term that maximizes the entropy of the classifier output on out of distribution data. However, given the challenge that it is not always possible to obtain out of distribution samples, the authors suggest a GAN-based alternative that is independent of specific knowledge of out of distribution samples. From this existing work, we also know that having access to the true out of sample distribution for regularization works significantly better than using samples from the GAN. In this paper, we make the following observation: in practice, the out of distribution samples are contained in the traffic that hits a deployed classifier. However, the traffic will also contain an unknown proportion of in-distribution samples. If the entropy over all of the traffic data were to be naively maximized, this would hurt the classifier performance on in-distribution data. To effectively leverage this traffic data, we propose an adaptive regularization technique (based on the maximum predictive probability score of a sample) which penalizes out of distribution samples more heavily than in-distribution samples in the incoming traffic. This ensures that the overall performance of the classifier does not degrade on in-distribution data, while detection of out-of-distribution samples is significantly improved by leveraging the unlabeled traffic data. We show the effectiveness of our method via experiments on natural image datasets.
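One way to picture the adaptive regularization is as a per-sample weight on the entropy term, computed from the maximum predictive probability. The abstract does not give the weighting function, so the choice below (`1 - max probability`) is purely an assumption for illustration, as are the function names and the toy traffic batch.

```python
import math

def entropy(probs):
    # Shannon entropy (in nats) of a discrete distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def adaptive_weight(probs):
    # ASSUMED weighting, not taken from the paper: samples with a low
    # maximum predictive probability are treated as more likely to be
    # out of distribution and receive a larger weight.
    return 1.0 - max(probs)

def traffic_regularizer(traffic_probs):
    # Weighted mean entropy over unlabeled traffic. A training loop
    # would subtract a multiple of this from the supervised loss so
    # that entropy is maximized mainly on likely-OOD samples.
    total = sum(adaptive_weight(p) * entropy(p) for p in traffic_probs)
    return total / len(traffic_probs)

# Toy traffic batch: one confident prediction (likely in-distribution)
# and one diffuse prediction (likely out of distribution).
confident = [0.98, 0.01, 0.01]
diffuse = [0.40, 0.30, 0.30]
print(adaptive_weight(confident), adaptive_weight(diffuse))
print(traffic_regularizer([confident, diffuse]))
```

With this weighting the confident sample contributes very little to the entropy term, so naively maximizing entropy over all traffic is avoided and in-distribution predictions are largely left alone.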
ExprGAN: Facial Expression Editing With Controllable Expression Intensity
Ding, Hui (University of Maryland, College Park) | Sricharan, Kumar (PARC, Palo Alto) | Chellappa, Rama (University of Maryland, College Park)
Facial expression editing is a challenging task as it needs a high-level semantic understanding of the input face image. In conventional methods, either paired training data is required or the synthetic face's resolution is low. Moreover, only the categories of facial expression can be changed. To address these limitations, we propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity. An expression controller module is specially designed to learn an expressive and compact expression code in addition to the encoder-decoder network. This novel architecture enables the expression intensity to be continuously adjusted from low to high. We further show that our ExprGAN can be applied for other tasks, such as expression transfer, image retrieval, and data augmentation for training improved face expression recognition models. To tackle the small size of the training database, an effective incremental learning scheme is proposed. Quantitative and qualitative evaluations on the widely used Oulu-CASIA dataset demonstrate the effectiveness of ExprGAN.
Latent Laplacian Maximum Entropy Discrimination for Detection of High-Utility Anomalies
Hou, Elizabeth, Sricharan, Kumar, Hero, Alfred O.
Data-driven anomaly detection methods suffer from the drawback of detecting all instances that are statistically rare, irrespective of whether the detected instances have real-world significance or not. In this paper, we are interested in the problem of specifically detecting anomalous instances that are known to have high real-world utility, while ignoring the low-utility statistically anomalous instances. To this end, we propose a novel method called Latent Laplacian Maximum Entropy Discrimination (LatLapMED) as a potential solution. This method uses the EM algorithm to simultaneously incorporate the Geometric Entropy Minimization principle for identifying statistical anomalies, and the Maximum Entropy Discrimination principle to incorporate utility labels, in order to detect high-utility anomalies. We apply our method in both simulated and real datasets to demonstrate that it has superior performance over existing alternatives that independently pre-process with unsupervised anomaly detection algorithms before classifying.
Reports of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence
Anderson, Monica (University of Alabama) | Barták, Roman (Charles University) | Brownstein, John S. (Boston Children's Hospital, Harvard University) | Buckeridge, David L. (McGill University) | Eldardiry, Hoda (Palo Alto Research Center) | Geib, Christopher (Drexel University) | Gini, Maria (University of Minnesota) | Isaksen, Aaron (New York University) | Keren, Sarah (Technion University) | Laddaga, Robert (Vanderbilt University) | Lisy, Viliam (Czech Technical University) | Martin, Rodney (NASA Ames Research Center) | Martinez, David R. (MIT Lincoln Laboratory) | Michalowski, Martin (University of Ottawa) | Michael, Loizos (Open University of Cyprus) | Mirsky, Reuth (Ben-Gurion University) | Nguyen, Thanh (University of Michigan) | Paul, Michael J. (University of Colorado Boulder) | Pontelli, Enrico (New Mexico State University) | Sanner, Scott (University of Toronto) | Shaban-Nejad, Arash (University of Tennessee) | Sinha, Arunesh (University of Michigan) | Sohrabi, Shirin (IBM T. J. Watson Research Center) | Sricharan, Kumar (Palo Alto Research Center) | Srivastava, Biplav (IBM T. J. Watson Research Center) | Stefik, Mark (Palo Alto Research Center) | Streilein, William W. (MIT Lincoln Laboratory) | Sturtevant, Nathan (University of Denver) | Talamadupula, Kartik (IBM T. J. Watson Research Center) | Thielscher, Michael (University of New South Wales) | Togelius, Julian (New York University) | Tran, So Cao (New Mexico State University) | Tran-Thanh, Long (University of Southampton) | Wagner, Neal (MIT Lincoln Laboratory) | Wallace, Byron C. (Northeastern University) | Wilk, Szymon (Poznan University of Technology) | Zhu, Jichen (Drexel University)
The AAAI-17 workshop program included 17 workshops covering a wide range of topics in AI. Workshops were held Sunday and Monday, February 4-5, 2017 at the Hilton San Francisco Union Square in San Francisco, California, USA. This report contains summaries of 12 of the workshops, and brief abstracts of the remaining 5.
Semi-supervised Conditional GANs
Sricharan, Kumar, Bala, Raja, Shreve, Matthew, Ding, Hui, Saketh, Kumar, Sun, Jin
We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.
Graph Analysis for Detecting Fraud, Waste, and Abuse in Healthcare Data
Liu, Juan (Medallia) | Bier, Eric (Palo Alto Research Center) | Wilson, Aaron (Palo Alto Research Center) | Guerra-Gomez, John Alexis (Yahoo Labs) | Honda, Tomonori (Inflection.com) | Sricharan, Kumar (Palo Alto Research Center) | Gilpin, Leilani (Massachusetts Institute of Technology) | Davies, Daniel (Palo Alto Research Center)
Detection of fraud, waste, and abuse (FWA) is an important yet challenging problem. In this article, we describe a system to detect suspicious activities in large healthcare datasets. Each healthcare dataset is viewed as a heterogeneous network consisting of millions of patients, hundreds of thousands of doctors, tens of thousands of pharmacies, and other entities. Graph analysis techniques are developed to find suspicious individuals, suspicious relationships between individuals, unusual changes over time, unusual geospatial dispersion, and anomalous network structure.
Healthcare-related programs include federal and state government programs such as Medicaid, Medicare Advantage (Part C), Medicare FFS, and Medicare Prescription Drug Benefit (Part D). Non-healthcare programs include Earned Income Tax Credit (EITC), Pell Grants, Public Housing/Rental Assistance, Retirement, Survivors and Disability Insurance (RSDI), School Lunch, Supplemental Nutrition Assistance Program (SNAP), Supplemental Security Income (SSI), Unemployment Insurance (UI), and ...
While healthcare data (insurance claims, health records, clinical data, provider information, and others) offers tantalizing opportunities, it also poses a series of technical challenges. From a data representation view, healthcare data sets are often large and diverse. It is common to see a state's Medicaid program or a private healthcare insurance program having hundreds of millions of claims per year, involving millions of patients and hundreds of thousands of providers of various types, for example, physicians, pharmacies, clinics and hospitals, and laboratories. Any fraud-detection system needs to be able to handle the large data volume and data diversity.
Healthcare financials are complex, involving a multitude of providers (physicians, pharmacies, clinics and hospitals, and laboratories), payers (insurance plans), and patients. To design a good fraud-detection system, one must have a deep understanding of the financial incentives of all parties.
... Data patterns from both sides are dynamic. The complexity of the problem calls for a rich set of techniques to examine healthcare data.
... from a suspicious individual or activity (as singled out by the automated screening components) and interacts with the system to navigate through data items and collect evidence to build an investigation case. The two categories have quite different technical ... database indexing/caching for fast data retrieval and user interface design for intuitive user-system interaction.
Starting from domain knowledge, auditors and investigators have ...
Ensemble weighted kernel estimators for multivariate entropy estimation
Sricharan, Kumar, Hero, Alfred O.
The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, kernel plug-in estimators suffer from the curse of dimensionality, wherein the MSE rate of convergence is glacially slow, of order $O(T^{-{\gamma}/{d}})$, where $T$ is the number of samples, $d$ is the dimension of the data, and $\gamma>0$ is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order $O(T^{-1})$. Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffers from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates.
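The bias-cancellation intuition behind ensemble weighting can be shown with a two-estimator toy. This is a sketch under strong assumptions and not the paper's estimator: each base estimate is assumed to equal the true value plus a bias that is exactly linear in a known scale parameter, and the weights are chosen to sum to one while cancelling that term. In this toy the weights are not restricted to be nonnegative, and the function name and numbers are hypothetical.

```python
def cancel_first_order_bias(estimates, scales):
    # Solve w1 + w2 = 1 and w1*s1 + w2*s2 = 0 for two estimators whose
    # bias is assumed proportional to the scale s. The weights depend
    # only on the scales, not on the data or the unknown density.
    (e1, e2), (s1, s2) = estimates, scales
    w1 = s2 / (s2 - s1)
    w2 = -s1 / (s2 - s1)
    return w1 * e1 + w2 * e2, (w1, w2)

# Toy usage: true value 1.5, bias coefficient 0.8 (unknown to the
# combiner), and two base estimators with scales 0.1 and 0.2.
truth, bias_coeff = 1.5, 0.8
scales = (0.1, 0.2)
estimates = tuple(truth + bias_coeff * s for s in scales)
combined, weights = cancel_first_order_bias(estimates, scales)
print(estimates, weights, combined)
```

Both base estimates are biased, yet the weighted combination recovers the true value up to floating-point error, and the weights could be computed offline because they never touch the samples.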