Goto

Collaborating Authors

 Performance Analysis


MarkovGNN: Graph Neural Networks on Markov Diffusion

arXiv.org Artificial Intelligence

Most real-world networks contain well-defined community structures where nodes are densely connected internally within communities. To learn from these networks, we develop MarkovGNN that captures the formation and evolution of communities directly in different convolutional layers. Unlike most Graph Neural Networks (GNNs) that consider a static graph at every layer, MarkovGNN generates different stochastic matrices using a Markov process and then uses these community-capturing matrices in different layers. MarkovGNN is a general approach that could be used with most existing GNNs. We experimentally show that MarkovGNN outperforms other GNNs for clustering, node classification, and visualization tasks. The source code of MarkovGNN is publicly available at \url{https://github.com/HipGraph/MarkovGNN}.


The impact of feature importance methods on the interpretation of defect classifiers

arXiv.org Artificial Intelligence

Abstract--Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The computed feature importance ranks by CA and CS methods do not always strongly agree with each other. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation and directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on computed feature importance ranks of different CS methods. We note, however, that a CS method is not always readily available for Defect classifiers are widely used by many large software corporations a given classifier. Defect classifiers are commonly and deep neural networks do not have a widely accepted CS interpreted to uncover insights to improve software quality. Therefore it is the feature importance ranks of different classifiers is pivotal that these generated insights are reliable. Such CA methods measure the contribution of each feature a feature importance method to compute a ranking of feature towards a classifier's predictions. These measure the contribution of each feature by effecting changes to feature importance ranks reflect the order in which the studied that particular feature in the dataset and observing its impact on features contribute to the predictive capability of the studied the outcome. The primary advantage of CA methods is that they classifier [14].


Automatic Identification of Self-Admitted Technical Debt from Different Sources

arXiv.org Artificial Intelligence

Technical debt is a metaphor describing the situation that long-term benefits (e.g., maintainability and evolvability of software) are traded for short-term goals. When technical debt is admitted explicitly by developers in software artifacts (e.g., code comments or issue tracking systems), it is termed as Self-Admitted Technical Debt or SATD. Technical debt could be admitted in different sources, such as source code comments, issue tracking systems, pull requests, and commit messages. However, there is no approach proposed for identifying SATD from different sources. Thus, in this paper, we propose an approach for automatically identifying SATD from different sources (i.e., source code comments, issue trackers, commit messages, and pull requests).


Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

arXiv.org Artificial Intelligence

Technical debt is a metaphor indicating sub-optimal solutions implemented for short-term benefits by sacrificing the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g. using a TODO comment); this is called Self-Admitted Technical Debt or SATD. Most work on automatically identifying SATD focuses on source code comments. In addition to source code comments, issue tracking systems have shown to be another rich source of SATD, but there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (that break down to 23,180 sections of issues) from seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and Google Monorail). We then propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin with regard to the F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) extracted SATD keywords are intuitive and potentially indicating types and indicators of SATD; 4) projects using different issue tracking systems have less common SATD keywords compared to projects using the same issue tracking system; 5) a small amount of training data is needed to achieve good accuracy.


Net benefit, calibration, threshold selection, and training objectives for algorithmic fairness in healthcare

arXiv.org Machine Learning

A growing body of work uses the paradigm of algorithmic fairness to frame the development of techniques to anticipate and proactively mitigate the introduction or exacerbation of health inequities that may follow from the use of model-guided decision-making. We evaluate the interplay between measures of model performance, fairness, and the expected utility of decision-making to offer practical recommendations for the operationalization of algorithmic fairness principles for the development and evaluation of predictive models in healthcare. We conduct an empirical case-study via development of models to estimate the ten-year risk of atherosclerotic cardiovascular disease to inform statin initiation in accordance with clinical practice guidelines. We demonstrate that approaches that incorporate fairness considerations into the model training objective typically do not improve model performance or confer greater net benefit for any of the studied patient populations compared to the use of standard learning paradigms followed by threshold selection concordant with patient preferences, evidence of intervention effectiveness, and model calibration. These results hold when the measured outcomes are not subject to differential measurement error across patient populations and threshold selection is unconstrained, regardless of whether differences in model performance metrics, such as in true and false positive error rates, are present. In closing, we argue for focusing model development efforts on developing calibrated models that predict outcomes well for all patient populations while emphasizing that such efforts are complementary to transparent reporting, participatory design, and reasoning about the impact of model-informed interventions in context.


Here's Why Your Rapid Test Is Negative Even If You Have COVID-19

International Business Times

Rapid COVID-19 tests can generate false-negative results because they aren't that sensitive, according to a medical expert. Rapid COVID-19 tests, or antigen tests, appear positive if they detect a certain amount of coronavirus -- also known as viral load -- from a sample taken from a person's body, according to BuzzFeed News. Dr. Emily Landon, an infectious disease expert, said that the window when viral load is at its peak can vary from person to person and can range from three days to more than a week as people's systems clear the virus at their own pace. Due to this, it may either take some time for an infected person's result to turn positive or never appear positive if they miss this window or collect their test sample incorrectly, among other things, according to Landon, who is also an associate professor of medicine at the University of Chicago Medicine. "Rapid tests are definitely not like a pregnancy test where it's going to be positive as long as it's been a few weeks after someone missed a period. It's only going to pick it up when you're at peak infectiousness, and they're almost never false positive," the doctor explained.


Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

arXiv.org Machine Learning

A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (c.f., deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models are more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model with Gaussian data that the membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we study different methods for mitigating such attacks in the overparameterized regime, such as noise addition and regularization, and conclude that simply reducing the parameters of an overparameterized model is an effective strategy to protect it from membership inference without greatly decreasing its generalization error.


VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

arXiv.org Machine Learning

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieve satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.


COVID-19 Rapid Test Recall: These Brands Give False Positives

International Business Times

The Food and Drug Administration issued a warning Friday to stop using the COVID-19 rapid antigen test Empowered Diagnostics CovClear and the neutralizing antibody rapid test ImmunoPass. These tests are being distributed in the U.S. with labels that show the FDA authorized them, which they did not. The FDA has concerns about the potential risk of false positives when using COVID-19 tests are not approved. They have classified the two tests as a Class I recall, which is the most serious type of recall. "These tests were distributed with labeling showing the FDA authorized them, but neither test has been authorized, cleared, or approved by the FDA for distribution or use in the United States. The FDA is concerned about the potentially higher risk of false results when using unauthorized tests," read an FDA press release.


Giving an AI control of nuclear weapons: What could possibly go wrong? - Bulletin of the Atomic Scientists

#artificialintelligence

If artificial intelligences controlled nuclear weapons, all of us could be dead. In 1983, Soviet Air Defense Forces Lieutenant Colonel Stanislav Petrov was monitoring nuclear early warning systems, when the computer concluded with the highest confidence that the United States had launched a nuclear war. But Petrov was doubtful: The computer estimated only a handful of nuclear weapons were incoming, when such a surprise attack would more plausibly entail an overwhelming first strike. He also didn't trust the new launch detection system, and the radar system didn't have corroborative evidence. Petrov decided the message was a false positive and did nothing. The computer was wrong; Petrov was right.