Accuracy
Minimax Estimation of Conditional Moment Models
Dikkala, Nishanth, Lewis, Greg, Mackey, Lester, Syrgkanis, Vasilis
We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, then the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically gives tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice such as: reproducing kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem (e.g. stochastic first-order heuristics for neural networks). In several applications, we show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, lead to mean squared error rates. We conclude with an extensive experimental analysis of the proposed methods.
Smartphone Transportation Mode Recognition Using a Hierarchical Machine Learning Classifier and Pooled Features From Time and Frequency Domains
Ashqar, Huthaifa I., Almannaa, Mohammed H., Elhenawy, Mohammed, Rakha, Hesham A., House, Leanna
This paper develops a novel two-layer hierarchical classifier that increases the accuracy of traditional transportation mode classification algorithms. This paper also enhances classification accuracy by extracting new frequency domain features. Many researchers have obtained these features from global positioning system data; however, this data was excluded in this paper, as the system use might deplete the smartphone's battery and signals may be lost in some areas. Our proposed two-layer framework differs from previous classification attempts in three distinct ways: 1) the outputs of the two layers are combined using Bayes' rule to choose the transportation mode with the largest posterior probability; 2) the proposed framework combines the new extracted features with traditionally used time domain features to create a pool of features; and 3) a different subset of extracted features is used in each layer based on the classified modes. Several machine learning techniques were used, including k-nearest neighbor, classification and regression tree, support vector machine, random forest, and a heterogeneous framework of random forest and support vector machine. Results show that the classification accuracy of the proposed framework outperforms traditional approaches. Transforming the time domain features to the frequency domain also adds new features in a new space and provides more control on the loss of information. Consequently, combining the time domain and the frequency domain features in a large pool and then choosing the best subset results in higher accuracy than using either domain alone. The proposed two-layer classifier obtained a maximum classification accuracy of 97.02%.
Parametric Programming Approach for More Powerful and General Lasso Selective Inference
Duy, Vo Nguyen Le, Takeuchi, Ichiro
Selective Inference (SI) has been actively studied in the past few years for conducting inference on the features of linear models that are adaptively selected by feature selection methods such as Lasso. The basic idea of SI is to make inference conditional on the selection event. Unfortunately, the main limitation of the original SI approach for Lasso, proposed in the seminal work by Lee et al. \cite{lee2016exact}, is that the inference is conducted not only conditional on the selected features but also on their signs---this leads to loss of power because of over-conditioning. Although this limitation can be circumvented by considering the union of such selection events for all possible combinations of signs, this is only feasible when the number of selected features is sufficiently small. To address this computational bottleneck, we propose a parametric programming-based method that can conduct SI without conditioning on signs even when we have thousands of active features. The main idea is to compute the continuum path of Lasso solutions in the direction of a test statistic, and identify the subset of the data space corresponding to the feature selection event by following the solution path. The proposed parametric programming-based method not only avoids the aforementioned computational bottleneck but also improves the performance and practicality of SI for Lasso in various respects. We conduct several experiments to demonstrate the effectiveness and efficiency of our proposed method.
Hybrid Attentional Memory Network for Computational drug repositioning
He, Jieyue, Yang, Xinxing, Gong, Zhuo, Zamit, lbrahim
Drug repositioning is designed to discover new uses of known drugs, which is an important and efficient method of drug discovery. Researchers only use one certain type of Collaborative Filtering (CF) models for drug repositioning currently, like the neighborhood based approaches which are good at mining the local information contained in few strong drug-disease associations, or the latent factor based models which are effectively capture the global information shared by a majority of drug-disease associations. Few researchers have combined these two types of CF models to derive a hybrid model with the advantages of both of them. Besides, the cold start problem has always been a major challenge in the field of computational drug repositioning, which restricts the inference ability of relevant models. Inspired by the memory network, we propose the Hybrid Attentional Memory Network (HAMN) model, a deep architecture combines two classes of CF model in a nonlinear manner. Firstly, the memory unit and the attention mechanism are combined to generate the neighborhood contribution representation to capture the local structure of few strong drug-disease associations. Then a variant version of the autoencoder is used to extract the latent factor of drugs and diseases to capture the overall information shared by a majority of drug-disease associations. In that process, ancillary information of drugs and diseases can help to alleviate the cold start problem. Finally, in the prediction stage, the neighborhood contribution representation is combined with the drug latent factor and disease latent factor to produce the predicted value. Comprehensive experimental results on two real data sets show that our proposed HAMN model is superior to other comparison models according to the AUC, AUPR and HR indicators.
Adaptive Sampling to Reduce Disparate Performance
Abernethy, Jacob, Awasthi, Pranjal, Kleindessner, Matthรคus, Morgenstern, Jamie, Zhang, Jie
Existing methods for reducing disparate performance of a classifier across different demographic groups assume that one has access to a large data set, thereby focusing on the algorithmic aspect of optimizing overall performance subject to additional constraints. However, poor data collection and imbalanced data sets can severely affect the quality of these methods. In this work, we consider a setting where data collection and optimization are performed simultaneously. In such a scenario, a natural strategy to mitigate the performance difference of the classifier is to provide additional training data drawn from the demographic groups that are worse off. In this paper, we propose to consistently follow this strategy throughout the whole training process and to guide the resulting classifier towards equal performance on the different groups by adaptively sampling each data point from the group that is currently disadvantaged. We provide a rigorous theoretical analysis of our approach in a simplified one-dimensional setting and an extensive experimental evaluation on numerous real-world data sets, including a case study on the data collected during the Flint water crisis.
NADS: Neural Architecture Distribution Search for Uncertainty Awareness
Ardywibowo, Randy, Boluki, Shahin, Gong, Xinyu, Wang, Zhangyang, Qian, Xiaoning
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data. It becomes important for ML systems in critical applications to accurately quantify its predictive uncertainty and screen out these anomalous inputs. However, existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples. Unlike standard learning tasks, there is currently no well established guiding principle for designing OoD detection architectures that can accurately quantify uncertainty. To address these problems, we first seek to identify guiding principles for designing uncertainty-aware architectures, by proposing Neural Architecture Distribution Search (NADS). NADS searches for a distribution of architectures that perform well on a given task, allowing us to identify common building blocks among all uncertainty-aware architectures. With this formulation, we are able to optimize a stochastic OoD detection objective and construct an ensemble of models to perform OoD detection. We perform multiple OoD detection experiments and observe that our NADS performs favorably, with up to 57% improvement in accuracy compared to state-of-the-art methods among 15 different testing configurations.
On mistakes we made in prior Computational Psychiatry Data driven approach projects and how they jeopardize translation of those findings in clinical practice
Radenkoviฤ, Milena ฤukiฤ, Pokrajac, David, Lopez, Victoria
In this work we aimed at comparing our findings in depression detection task with methodologies applied in present literature. Previously we showed that when electrophysiological signal (in this case electroencephalogram, EEG) is characterized by nonlinear measures, any of seven most popular classifiers yields high accuracy on the task. Following every step we done in this process we compare it with other researchers' practice and comment on other findings mainly from analysis of electrical signals or nonlinear analysis showing what would be optimal for further research. We focused on discussing various mistakes and differences that could potentially lead to unwarranted optimism and other misinterpretation of results. In Conclusion we summarize recommendation for future research in order to be applicable in clinical practice. Introduction Current clinical psychiatry is lacking objective biochemical or electrophysiological tests used for diagnosis unlike other medical disciplines. To diagnose depression, clinician will typically rely on the self-report from the patient and his experience in applying DSM manual, which is standardized list of symptoms to be checked in every case (in order to be qualified as a certain disorder). It is perfectly possible that two persons diagnosed with the same disorder have not overlapping symptoms, and that one person can have two distinct diagnosis. If someone has more than three episodes of depression, that is considered to be recurrent depression (after every episode the probability of the next one is doubling). This is particularly heard to treat and manage therapy which is ongoing through person's whole life. Apart from obsolete diagnostic, all antidepressants have serious side-effects, the waiting lists are very long (in Nederland they are between 6 and 9 months long) and the therapy can last for years or even decades. It is reported than only 11 - 30% of patients are improving in the first year of therapy (Rush et al., 2008).
A Variational Approach to Privacy and Fairness
Rodrรญguez-Gรกlvez, Borja, Thobaben, Ragnar, Skoglund, Mikael
In this article, we propose a new variational approach to learn private and/or fair representations. This approach is based on the Lagrangians of a new formulation of the privacy and fairness optimization problems that we propose. In this formulation, we aim at generating representations of the data that keep a prescribed level of the relevant information that is not shared by the private or sensitive data, while minimizing the remaining information they keep. The proposed approach (i) exhibits the similarities of the privacy and fairness problems, (ii) allows us to control the trade-off between utility and privacy or fairness through the Lagrange multiplier parameter, and (iii) can be comfortably incorporated to common representation learning algorithms such as the VAE, the $\beta$-VAE, the VIB, or the nonlinear IB.
ClarQ: A large-scale and diverse dataset for Clarification Question Generation
Kumar, Vaibhav, black, Alan W.
Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification questions. In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from stackexchange. The framework utilises a neural network based architecture for classifying clarification questions. It is a two-step method where the first aims to increase the precision of the classifier and second aims to increase its recall. We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering. The final dataset, ClarQ, consists of ~2M examples distributed across 173 domains of stackexchange. We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
Coronavirus: How air passengers can stay safe
Thermal-imaging cameras and swab tests for coronavirus are not "clinically valuable" in airports, according to a panel of aviation health experts. About one in every three infectious people would be missed, they say. Air systems and low humidity on planes already reduces virus spread through the cabin. But passengers should wear face coverings at all times, board and disembark one row at a time and be seated apart from others if possible. And those seated at the back should be the first on and last off.