Accuracy
Analyzing Wikipedia Membership Dataset and Predicting Unconnected Nodes in the Signed Networks
Wu, Zhihao, Li, Taoran, Roman, Ray
In the age of digital interaction, person-to-person relationships existing on social media may be different from the very same interactions that exist offline. Examining potential or spurious relationships between members in a social network is a fertile area of research for computer scientists -- here we examine how relationships can be predicted between two unconnected people in a social network by using the area under the Precision-Recall curve and the ROC curve. Modeling the social network as a signed graph, we compare the Triadic model, the Latent Information model, and the Sentiment model and use them to predict peer-to-peer interactions, first using a plain signed network, and second using a signed network with comments as context. We see that our models are much better than a random model and could complement each other in different cases.
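The two evaluation measures named in this abstract can be illustrated with a minimal sketch. The labels and scores below are made up for illustration (1 for one edge sign, 0 for the other); they are not data from the paper, and the function names are hypothetical. The ROC area uses the pairwise-ranking formulation, and the Precision-Recall area uses the step-wise "average precision" summary, which is one common convention among several.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are not treated specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


# Hypothetical edge-sign labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

A random scorer would give a ROC area near 0.5 and an average precision near the positive-class rate, which is the baseline that "much better than a random model" is implicitly measured against.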
The $f$-divergence and Loss Functions in ROC Curve
Given two data distributions and a test score function, the Receiver Operating Characteristic (ROC) curve shows how well such a score separates two distributions. However, can the ROC curve be used as a measure of discrepancy between two distributions? This paper shows that when the data likelihood ratio is used as the test score, the arc length of the ROC curve gives rise to a novel $f$-divergence measuring the differences between two data distributions. Approximating this arc length using a variational objective and empirical samples leads to empirical risk minimization with previously unknown loss functions. We provide a Lagrangian dual objective and introduce kernel models into the estimation problem. We study the non-parametric convergence rate of this estimator and show under mild smoothness conditions of the real arctangent density ratio function, the rate of convergence is $O_p(n^{-\beta/4})$ ($\beta \in (0,1]$ depends on the smoothness).
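As a small numerical illustration of the arc-length idea, consider two discrete distributions over the same outcomes. This is a sketch under stated assumptions, not the estimator studied in the paper: with the likelihood ratio as the score (and randomized thresholds allowed), the ROC curve is piecewise linear, and outcome i contributes a segment with horizontal extent q_i and vertical extent p_i, so the arc length is the sum of sqrt(p_i^2 + q_i^2). The function name `roc_arc_length` is hypothetical.

```python
import math


def roc_arc_length(p, q):
    """Arc length of the likelihood-ratio ROC curve for discrete p and q.

    Each outcome i adds a straight segment of width q[i] and height p[i],
    so the total length is sum_i sqrt(p[i]**2 + q[i]**2).
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same outcomes")
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("p and q must each sum to 1")
    return sum(math.hypot(pi, qi) for pi, qi in zip(p, q))


# Identical distributions: the ROC curve is the diagonal, length sqrt(2).
print(roc_arc_length([0.5, 0.5], [0.5, 0.5]))
# Disjoint supports: the ROC curve hugs the axes, length 2.
print(roc_arc_length([1.0, 0.0], [0.0, 1.0]))
```

The length ranges from sqrt(2) for indistinguishable distributions to 2 for perfectly separable ones, which is what makes it usable as a discrepancy measure.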
Model evaluation, model selection, and algorithm selection in machine learning
A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808 Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify. These tuning knobs, the so-called hyperparameters, help us control the behavior of machine learning algorithms when optimizing for performance, finding the right balance between bias and variance. Hyperparameter tuning for performance optimization is an art in itself, and there are no hard-and-fast rules that guarantee best performance on a given dataset. In Part I and Part II, we saw different holdout and bootstrap techniques for estimating the generalization performance of a model. We learned about the bias-variance trade-off, and we computed the uncertainty of our estimates. In this third part, we will focus on different methods of cross-validation for model evaluation and model selection. We will use these cross-validation techniques to rank models from several hyperparameter configurations and estimate how well they generalize to independent datasets.
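The cross-validation procedure described here can be sketched in plain Python. This is a minimal illustration, not the article's code: `k_fold_indices` and `cross_val_score` are hypothetical helper names, and `fit` and `score` are caller-supplied functions standing in for whatever model and metric are being tuned.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_score(fit, score, X, y, k=5, seed=0):
    """Average held-out score over k folds for one model configuration."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)
```

To rank hyperparameter configurations, one would call `cross_val_score` once per configuration with the same `seed`, so every configuration is evaluated on identical folds, and then compare the averaged scores.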
Pedestrian Behavior Prediction for Automated Driving: Requirements, Metrics, and Relevant Features
Herman, Michael, Wagner, Jörg, Prabhakaran, Vishnu, Möser, Nicolas, Ziesche, Hanna, Ahmed, Waleed, Bürkle, Lutz, Kloppenburg, Ernst, Gläser, Claudius
Automated vehicles require a comprehensive understanding of traffic situations to ensure safe and anticipatory driving. In this context, the prediction of pedestrians is particularly challenging as pedestrian behavior can be influenced by multiple factors. In this paper, we thoroughly analyze the requirements on pedestrian behavior prediction for automated driving via a system-level approach. To this end we investigate real-world pedestrian-vehicle interactions with human drivers. Based on human driving behavior we then derive appropriate reaction patterns of an automated vehicle and determine requirements for the prediction of pedestrians. This includes a novel metric tailored to measure prediction performance from a system-level perspective. The proposed metric is evaluated on a large-scale dataset comprising thousands of real-world pedestrian-vehicle interactions. We furthermore conduct an ablation study to evaluate the importance of different contextual cues and compare these results to ones obtained using established performance metrics for pedestrian prediction. Our results highlight the importance of a system-level approach to pedestrian behavior prediction.
Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling
Empirical risk minimization (ERM) is a principle in statistical learning. Through ERM, we can measure the performance of a family of learning algorithms based on a set of observed training data empirically without knowing the true distribution of the data and derive theoretical bounds on the performance. ERM is routinely applied in a wide range of learning problems such as regression, classification, and clustering. In recent years, with the increasing popularity of privacy-preserving machine learning that satisfies formal privacy guarantees such as differential privacy (DP) [10], the topic of privacy-preserving ERM has also been investigated. Generally speaking, differentially private empirical risk minimization (DP-ERM) can be realized by perturbing the output (estimation or prediction), the objective function (input), or iteratively during the algorithmic optimization, given an ERM problem. For output perturbation, randomization mechanisms need to be applied every time a new output is released; for iterative algorithmic perturbation, each iteration incurs a privacy loss, so careful planning and implementation of privacy accounting methods to minimize the overall privacy loss is critical. In this paper, we focus on differentially private perturbation of objective functions. Once an objective function is perturbed, the subsequent optimization does not incur additional privacy loss and all outputs generated from the optimization are also differentially private.
Population modeling with machine learning can enhance measures of mental health
Figure 1 – Figure supplement 1: Learning curves on the random split-half validation used for model building. To facilitate comparisons, we evaluated predictions of age, fluid intelligence and neuroticism from a complete set of socio-demographic variables without brain imaging using the coefficient of determination R2 metric (y-axis) to compare results obtained from 100 to 3000 training samples (x-axis). The cross-validation (CV) distribution was obtained from 100 Monte Carlo splits. Across targets, performance started to plateau after around 1000 training samples with scores virtually identical to the final model used in subsequent analyses. These benchmarks suggest that inclusion of additional training samples would not have led to substantial improvements in performance.
Covid-19: Warning over false negatives and rapid tests allowed for half-term break
New rules allowing travellers returning to England to take lateral flow tests instead of more expensive PCR tests will come into force on 24 October, in time for many families returning from half-term breaks. The government says NHS tests cannot be used for overseas travel but fully vaccinated passengers arriving in England from that date will be able to order tests from approved providers and upload photos of results for verification. Scotland, Wales and Northern Ireland have previously aligned with policy in England.
Adversarial Attacks on ML Defense Models Competition
Dong, Yinpeng, Fu, Qi-An, Yang, Xiao, Xiang, Wenzhao, Pang, Tianyu, Su, Hang, Zhu, Jun, Tang, Jiayu, Chen, Yuefeng, Mao, XiaoFeng, He, Yuan, Xue, Hui, Li, Chao, Liu, Ye, Zhang, Qilong, Gao, Lianli, Yu, Yunrui, Gao, Xitong, Zhao, Zhe, Lin, Daquan, Lin, Jiadong, Song, Chuanbiao, Wang, Zihao, Wu, Zhennan, Guo, Yang, Cui, Jiequan, Xu, Xiaogang, Chen, Pengguang
Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, the progress of building more robust models is usually hampered by incomplete or incorrect robustness evaluation. To accelerate the research on reliable evaluation of adversarial robustness of the current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecure-workshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms to evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation platform -- ARES (https://github.com/thu-ml/ares), and was held on the TianChi platform (https://tianchi.aliyun.com/competition/entrance/531847/introduction) as one of the series of AI Security Challengers Program. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/ares-bench/, which allows users to upload adversarial attack algorithms and defense models for evaluation.
Online Control of the False Discovery Rate under "Decision Deadlines"
Scientific discoveries form an ongoing, ever-evolving process. Each new experiment offers an opportunity to suggest new hypotheses based on results that have come before. Traditionally, the hypotheses researchers plan to test in an experiment are prespecified before any data from the experiment is visible, as this facilitates control of either the false discovery rate (FDR; Benjamini and Hochberg, 1995) or the probability of producing any false positives (the familywise error rate, or FWER; see, for example Efron and Hastie, 2016) within that experiment. In contrast to fully prespecified procedures, online procedures test hypotheses sequentially, and allow the results of preliminary tests to inform choices about which hypotheses to focus on in future tests (Foster and Stine, 2008). These procedures typically require that error rates be controlled at every stage of the sequence (e.g., Javanmard and Montanari, 2015; Ramdas et al., 2017).
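For reference, the classical offline procedure cited above (Benjamini and Hochberg, 1995) can be sketched as follows. This is the fully prespecified baseline that online procedures are contrasted with, not the online method the paper develops; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted in
    ascending order) with p_(k) <= k * alpha / m, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from four tests.
print(benjamini_hochberg([0.5, 0.035, 0.02, 0.03], alpha=0.05))
```

In this example the two smallest p-values fail their own thresholds, but the third passes, so all three are rejected. The rule needs the full batch of p-values up front, which is exactly what an online setting does not provide.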
NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models
Silverman, Greg M. | Sahoo, Himanshu S. (NLP/IE Program, Department of Electrical and Computer Engineering, University of Minnesota) | Ingraham, Nicholas E. (Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, University of Minnesota) | Lupei, Monica (Division of Critical Care, Department of Anesthesiology, University of Minnesota) | Puskarich, Michael A. (Department of Emergency Medicine, University of Minnesota) | Usher, Michael (Department of Medicine, University of Minnesota) | Dries, James (University of Minnesota) | Finzel, Raymond L. (NLP/IE Program, College of Pharmacy, University of Minnesota) | Murray, Eric (Information Technology, M Health Fairview) | Sartori, John (Department of Electrical and Computer Engineering, University of Minnesota) | Simon, Gyorgy (Institute for Health Informatics, University of Minnesota) | Zhang, Rui | Melton, Genevieve B. (NLP/IE Program, Department of Surgery, and Institute for Health Informatics, University of Minnesota, Fairview Health Services, Information Technology) | Tignanelli, Christopher J. (NLP/IE Program, Department of Surgery, University of Minnesota) | Pakhomov, Serguei VS (NLP/IE Program, College of Pharmacy, University of Minnesota)
Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high quality care and allocate essential resources, which is especially important during the COVID-19 pandemic. Patient symptoms are typically found in unstructured notes, and thus not readily available for clinical decision making. In an attempt to fill this gap, this study compared two methods for symptom extraction from Emergency Department (ED) admission notes. Both methods utilized a lexicon derived by expanding the Centers for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first method utilized a word2vec model to expand the lexicon using a dictionary mapping to the Unified Medical Language System (UMLS). The second method utilized the expanded lexicon as a rule-based gazetteer and the UMLS. These methods were evaluated against a manually annotated reference (F1-score of 0.87 for UMLS-based ensemble; and 0.85 for rule-based gazetteer with UMLS). Through analyses of associations of extracted symptoms used as features against various outcomes, salient risks among the population of COVID-19 patients, including increased risk of in-hospital mortality (OR 1.85, p-value < 0.001), were identified for patients presenting with dyspnea. Disparities between English and non-English speaking patients were also identified, the most salient being a concerning finding of opposing risk signals between fatigue and in-hospital mortality (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01). While use of symptomatology for modeling of outcomes is not unique, unlike previous studies this study showed that models built using symptoms with the outcome of in-hospital mortality were not significantly different from models using data collected during an in-patient encounter (AUC of 0.9 with 95% CI of [0.88, 0.91] using only vital signs; AUC of 0.87 with 95% CI of [0.85, 0.88] using only symptoms).
These findings indicate that prognostic models based on symptomatology could aid in extending COVID-19 patient care through telemedicine, replacing the need for in-person options. The methods presented in this study have potential for use in development of symptomatology-based models for other diseases, including for the study of Post-Acute Sequelae of COVID-19 (PASC).
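The F1-scores and odds ratios (OR) quoted in this abstract are standard quantities, and a short sketch shows how each is computed from raw counts. The counts below are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are not from any library.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from extraction counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def odds_ratio(exposed_events, exposed_nonevents,
               unexposed_events, unexposed_nonevents):
    """Odds ratio from a 2x2 table (e.g. symptom present vs. absent)."""
    if exposed_nonevents == 0 or unexposed_events == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_events * unexposed_nonevents) / (
        exposed_nonevents * unexposed_events
    )


# Hypothetical counts: 87 correct symptom mentions, 13 spurious, 13 missed.
print(f1_score(tp=87, fp=13, fn=13))
# Hypothetical 2x2 table: outcome counts with and without a symptom.
print(odds_ratio(30, 70, 20, 80))
```

An odds ratio above 1 indicates higher odds of the outcome when the symptom is present, and below 1 indicates lower odds, which is why the opposing values reported for fatigue (1.95 versus 0.63) point in opposite directions.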