AITopics | Accuracy

Accuracy

Ivy: Instrumental Variable Synthesis for Causal Inference

Kuang, Zhaobin, Sala, Frederic, Sohoni, Nimit, Wu, Sen, Córdova-Palomera, Aldo, Dunnmon, Jared, Priest, James, Ré, Christopher

arXiv.org Machine Learning | Apr-11-2020

A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates (which are not necessarily strong, or even valid, IVs) into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. Allele scores require strong assumptions (independence and validity of all IV candidates) for the resulting estimate to be reliable. To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner. Theoretically, we characterize this robustness, its limits, and its impact on the resulting causal estimates. Empirically, Ivy can correctly identify the directionality of known relationships and is robust against false discovery (median effect size <= 0.025) on three real-world datasets with no causal effects, while allele scores return more biased estimates (median effect size >= 0.118).
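
The idea of plugging a summary of IV candidates into a causal effect estimator can be illustrated with a minimal sketch. It is not the Ivy method: it builds an unweighted allele score (the sum of the candidates) and uses the standard ratio (Wald) estimator. The synthetic data, the function names `simulate` and `covariance`, and all coefficients are assumptions made for illustration, and every candidate here is valid and independent, which is exactly the setting the abstract calls restrictive.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=0.5, n_candidates=5, seed=0):
    """Hypothetical data: binary IV candidates, hidden confounder u, exposure x, outcome y."""
    rng = random.Random(seed)
    scores, xs, ys = [], [], []
    for _ in range(n):
        candidates = [rng.randint(0, 1) for _ in range(n_candidates)]
        u = rng.gauss(0, 1)              # unobserved confounder of x and y
        score = sum(candidates)          # unweighted summary of the candidates
        x = 0.5 * score + u + rng.gauss(0, 1)
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)
        scores.append(score)
        xs.append(x)
        ys.append(y)
    return scores, xs, ys


scores, xs, ys = simulate()

# Regressing y on x directly is biased because u drives both variables.
naive_estimate = covariance(xs, ys) / covariance(xs, xs)

# Ratio (Wald) estimator with the summary score standing in for the IV.
iv_estimate = covariance(scores, ys) / covariance(scores, xs)

print(f"naive: {naive_estimate:.3f}  IV with summary score: {iv_estimate:.3f}")
```

In this toy setting the summary-based estimate should land near the simulated effect of 0.5 while the naive slope is pulled upward by the confounder. Once some candidates are invalid or correlated, an unweighted score no longer has this guarantee.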

dependency, iv candidate, ivy, (16 more...)

arXiv.org Machine Learning

2004.05316

Country:

North America > Canada > Quebec > Montreal (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Jordan (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (0.93)
Health & Medicine > Epidemiology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Training Data Set Assessment for Decision-Making in a Multiagent Landmine Detection Platform

Florez-Lozano, Johana, Caraffini, Fabio, Parra, Carlos, Gongora, Mario

arXiv.org Artificial Intelligence | Apr-11-2020

Real-world problems such as landmine detection require multiple sources of information to reduce the uncertainty of decision-making. A novel approach to these problems uses distributed systems; the one presented in this work is based on hardware and software multi-agent systems. To achieve a high rate of landmine detection, we evaluate the performance of a trained system over the distribution of samples between training and validation sets. Additionally, a general explanation of the data set is provided, presenting the samples gathered by a cooperative multi-agent system developed for detecting improvised explosive devices. The results show that input samples affect the performance of the output decisions, and a decision-making system can be less sensitive to sensor noise with intelligent systems obtained from a diverse and suitably organised training set.

agent, case 1, fpr fpr fpr, (13 more...)

arXiv.org Artificial Intelligence

2004.0538

Country:

South America > Colombia > Bogotá D.C. > Bogotá (0.04)
Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Colorado (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Government > Military (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

A Modified Bayesian Optimization based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting

Putatunda, Sayan, Rama, Kiran

arXiv.org Machine Learning | Apr-10-2020

It has been reported in the literature that proper hyperparameter optimization greatly affects the performance of a machine learning algorithm. One way to perform hyperparameter optimization is manual search, but that is time-consuming. Common approaches include grid search, random search, and Bayesian optimization using Hyperopt. In this paper, we propose a new approach to hyperparameter optimization, Randomized-Hyperopt, and then tune the hyperparameters of XGBoost (the Extreme Gradient Boosting algorithm) on ten datasets by applying random search, Randomized-Hyperopt, Hyperopt, and grid search. The performance of these four techniques was compared in terms of both prediction accuracy and execution time. We find that Randomized-Hyperopt performs better than the other three conventional methods for hyperparameter optimization of XGBoost.
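
To make the difference between two of the baselines concrete, here is a minimal sketch of grid search and random search over two hyperparameters. It does not implement Randomized-Hyperopt or call XGBoost or Hyperopt: `validation_loss` is a hypothetical stand-in for training and scoring a model, and the parameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for a model's validation loss (lower is better)."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def grid_search(learning_rates, max_depths):
    """Evaluate every grid combination; return (best loss, (learning_rate, max_depth))."""
    return min(
        (validation_loss(lr, depth), (lr, depth))
        for lr, depth in itertools.product(learning_rates, max_depths)
    )


def random_search(n_trials, seed=0):
    """Sample configurations at random; return (best loss, (learning_rate, max_depth))."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-3, 0)   # log-uniform over [0.001, 1]
        depth = rng.randint(2, 10)      # inclusive integer range
        trials.append((validation_loss(lr, depth), (lr, depth)))
    return min(trials)


grid_best = grid_search([0.001, 0.01, 0.1, 1.0], [2, 4, 6, 8, 10])
random_best = random_search(n_trials=20)

print("grid search best:  ", grid_best)
print("random search best:", random_best)
```

Both searches evaluate 20 configurations here, so the comparison is at an equal budget. Grid search can only return points on its grid, while random search can land between grid points but offers no coverage guarantee.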

dataset, hyperparameter, optimization, (13 more...)

arXiv.org Machine Learning

doi: 10.1109/ICInPro47689.2019.9092025

2004.05041

Country:

Asia > India > Karnataka > Bengaluru (0.06)
Europe > France (0.04)
Asia > Taiwan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Visual Spoofing in content based spam detection

Sokolov, Mark, Olufowobi, Kehinde, Herndon, Nic

arXiv.org Machine Learning | Apr-10-2020

"Subject: Please send money Body: I am so distraught. I thought i could reach out to you to help me out. I came down to United Kingdom for a short vacation unfortunately i was mugged at the park of the hotel i stayed, all cash, credit card and cell phone was stolen from me but luckily for me i still have my passport with me. I've been to the embassy and to the police here but they're not helping issues at all and, my flight leaves in few hours time from now but. I am having problems settling the hotel bills and the hotel manager won't let me leave until i settle my hotel bills. I'm freaked out at the moment." As expected, this email, which definitely seems to be spam, ends up in the junk email folder. However, in this paper we show that visual spoofing achieved by substituting some confusables (characters that look similar) into the above email text will enable the same email to bypass the spam filter. We also propose ways to address this loophole.

algorithm, classifier, email, (16 more...)
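
The confusable substitution described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's attack or defense: the `CONFUSABLES` table is a small hand-picked subset of Latin-to-Cyrillic look-alikes, and `keyword_filter` is a hypothetical toy filter standing in for a content-based spam classifier.

```python
# Latin letters mapped to visually similar Cyrillic code points (illustrative subset).
CONFUSABLES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
}
# Inverse table used to map look-alikes back to Latin before filtering.
REVERSE = {cyrillic: latin for latin, cyrillic in CONFUSABLES.items()}


def spoof(text):
    """Replace Latin letters with look-alike characters where a mapping exists."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def normalize(text):
    """Undo the substitution by mapping known look-alikes back to Latin."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def keyword_filter(text, keywords=("send money",)):
    """Toy content-based filter: flag text that contains any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


subject = "Please send money"
spoofed = spoof(subject)

print(keyword_filter(subject))             # the plain text is flagged
print(keyword_filter(spoofed))             # the spoofed text slips past
print(keyword_filter(normalize(spoofed)))  # normalizing first restores detection
```

The spoofed string renders almost identically to the original but shares few code points with it, so any filter that matches on raw characters or tokens sees different input. Mapping confusables back to a canonical form before classification is one possible mitigation. A hand-written table like `REVERSE` would need to be replaced by a complete confusables list in practice.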

arXiv.org Machine Learning

2004.05265

Country:

Europe > United Kingdom (0.24)
Asia > Middle East > Iran (0.15)
North America > United States > North Carolina > Pitt County > Greenville (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(5 more...)

False negative coronavirus tests could be due to how healthcare workers are collecting samples

Daily Mail - Science & tech | Apr-9-2020, 09:35:10 GMT

The US has tested more than 1.2 million Americans for coronavirus, but some have received negative results despite being infected. The coronavirus is a disease that forms in the lungs, but it sometimes sits in a cavity between the nose and throat where a swab is unable to reach. Although RT-polymerase chain reaction (rRT-PCR) detection is the 'gold standard' for testing, it can produce a false negative if the sample is not taken properly. Experts also believe that because hospitals and drive-thru testing sites are being flooded by people, healthcare workers are rushing to tend to as many individuals as possible and are not collecting the samples properly.

false negative, false negative coronavirus test, healthcare worker, (11 more...)

Daily Mail - Science & tech

Country:

North America > United States > California > Yolo County > Davis (0.05)
Asia > China (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Latent regularization for feature selection using kernel methods in tumor classification

Palazzo, Martin, Yankilevich, Patricio, Beauseroy, Pierre

arXiv.org Machine Learning | Apr-9-2020

The transcriptomics of cancer tumors are characterized by tens of thousands of gene expression features. Patient prognosis or tumor stage can be assessed by machine learning techniques such as supervised classification, given a gene expression profile. Feature selection is a useful approach for selecting the key genes that help to classify tumors. In this work we propose a feature selection method based on Multiple Kernel Learning that results in a reduced subset of genes and a custom kernel that improves the classification performance when used in support vector classification. During the feature selection process, this method performs a novel latent regularization: it relaxes the supervised target problem by introducing unsupervised structure obtained from the latent space learned by a nonlinear dimensionality reduction model. The resulting improvement in generalization capacity is assessed by the tumor classification performance on unseen test samples when the classifier is trained with the features selected by the proposed method, in comparison with other supervised feature selection approaches.

feature selection, kernel, selection, (16 more...)

arXiv.org Machine Learning

2004.04866

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Multiclass Classification via Class-Weighted Nearest Neighbors

Khim, Justin, Xu, Ziyu, Singh, Shashank

arXiv.org Machine Learning | Apr-9-2020

Classification is a fundamental problem in statistics and machine learning that arises in many scientific and engineering problems. Scientific applications include identifying plant and animal species from body measurements, determining cancer types based on gene expression, and satellite image processing (Fisher, 1936, 1938; Khan et al., 2001; Lee et al., 2004); in modern engineering contexts, credit card fraud detection, handwritten digit recognition, word sense disambiguation, and object detection in images are all examples of classification tasks. These applications have brought two new challenges: multiclass classification with a potentially large number of classes and imbalanced data. For example, in online retailing, websites have hundreds of thousands or millions of products, and they may like to categorize these products within a preexisting taxonomy based on product descriptions (Lin et al., 2018). While the number of classes alone makes the problem difficult, an added difficulty with text data is that it is usually highly imbalanced, meaning that a few classes may constitute a large fraction of the data while many classes have only a few examples. In fact, Feldman (2019) notes that if the data follows the classical Zipf distribution for text data (Zipf, 1936), i.e., the class probabilities satisfy a power-law distribution, then up to 35% of seen examples may appear only once in the training data. Additionally, natural image data also seems to have the problems of many classes and imbalanced data (Salakhutdinov et al., 2011; Zhu et al., 2014). Focusing on the problem of imbalanced data, researchers have found that a few heuristics help "do better," and the most principled and studied of these is weighting. There are a number of forms of weighting; we consider the most basic, in which each class has its own weight that is incurred as loss whenever an example of that class is misclassified, and refer to this method as class-weighting.
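
Class-weighting can be illustrated with a small nearest-neighbor sketch. This is not the paper's estimator or analysis: it is a plain k-nearest-neighbor vote in which each class's vote count is multiplied by a per-class weight. The one-dimensional toy data, the function name `knn_predict`, and the inverse-frequency weights are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k, class_weights=None):
    """Predict a label by a class-weighted vote among the k nearest training points.

    train is a list of (feature, label) pairs with scalar features.
    Classes missing from class_weights get weight 1.0 (a plain majority vote).
    """
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    weights = class_weights or {}
    return max(votes, key=lambda label: votes[label] * weights.get(label, 1.0))


# Imbalanced toy data: four "common" points and two "rare" points.
train = [(0.0, "common"), (0.2, "common"), (0.4, "common"), (0.6, "common"),
         (0.5, "rare"), (0.7, "rare")]

# Inverse-frequency weights: n / (number of classes * class count).
counts = Counter(label for _, label in train)
inverse_frequency = {
    label: len(train) / (len(counts) * count) for label, count in counts.items()
}

query = 0.55
unweighted = knn_predict(train, query, k=5)
weighted = knn_predict(train, query, k=5, class_weights=inverse_frequency)

print("unweighted vote:", unweighted)
print("class-weighted vote:", weighted)
```

Among the five nearest neighbors of the query, three are "common" and two are "rare", so the plain vote picks the majority class. With weights of 0.75 and 1.5, the rare class wins the weighted vote, which shows how weighting trades errors on frequent classes for errors on infrequent ones.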

classification, prec, probability, (15 more...)

arXiv.org Machine Learning

2004.04715

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Law Enforcement & Public Safety > Fraud (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Diagnosing COVID-19 from X-Ray and Images using Deep Learning Algorithms Learn Neural Networks

#artificialintelligence | Apr-8-2020, 12:20:10 GMT

Throughout history, epidemics and chronic diseases have claimed the lives of many people and caused major crises that have taken a long time to overcome. The 2019 novel coronavirus (COVID-19) pandemic appeared in Wuhan, China in December 2019 and has become a serious public health problem worldwide. It is an acute resolved disease, but it can also be deadly, with a 2% case fatality rate. The early and automatic diagnosis of COVID-19 may be beneficial for timely referral of the patient to quarantine and for monitoring the spread of the disease. With some tests requiring significant time (days) to produce results and a projected false positive rate of up to 30%, other timely approaches to diagnosis are worthy of investigation.

covid-19, covid-19 case, neural network, (13 more...)

#artificialintelligence

Country: Asia > China > Hubei Province > Wuhan (0.25)

Genre: Research Report (0.36)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bayesian Interpolants as Explanations for Neural Inferences

McMillan, Kenneth L.

arXiv.org Machine Learning | Apr-8-2020

The notion of Craig interpolant, used as a form of explanation in automated reasoning, is adapted from logical inference to statistical inference and used to explain inferences made by neural networks. The method produces explanations that are at the same time concise, understandable and precise.

explanation, interpolant, precision, (15 more...)

arXiv.org Machine Learning

2004.04198

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Model-Agnostic Characterization of Fairness Trade-offs

Kim, Joon Sik, Chen, Jiahao, Talwalkar, Ameet

arXiv.org Machine Learning | Apr-8-2020

There exist several inherent trade-offs in designing a fair model, such as those between the model's predictive performance and fairness, or even among different notions of fairness. In practice, exploring these trade-offs requires significant human and computational resources. We propose a diagnostic that enables practitioners to explore these trade-offs without training a single model. Our work hinges on the observation that many widely-used fairness definitions can be expressed via the fairness-confusion tensor, an object obtained by splitting the traditional confusion matrix according to protected data attributes. Optimizing accuracy and fairness objectives directly over the elements in this tensor yields a data-dependent yet model-agnostic way of understanding several types of trade-offs. We further leverage this tensor-based perspective to generalize existing theoretical impossibility results to a wider range of fairness definitions. Finally, we demonstrate the usefulness of the proposed diagnostic on synthetic and real datasets.
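
A minimal sketch of the fairness-confusion tensor follows, taking the abstract's description literally: the confusion matrix is split by a protected attribute, and both accuracy and a fairness quantity are read off the resulting counts. The paper optimizes over the tensor's elements without training a model. This sketch only builds the tensor from example labels and predictions, which are invented for illustration, and the function names and the choice of demographic parity gap as the fairness metric are assumptions.

```python
from collections import defaultdict


def fairness_confusion_tensor(y_true, y_pred, groups):
    """Split binary confusion-matrix counts by protected group.

    Returns {group: {"tp": int, "fp": int, "fn": int, "tn": int}}.
    """
    tensor = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            cell = "tp" if truth == 1 else "fp"
        else:
            cell = "fn" if truth == 1 else "tn"
        tensor[group][cell] += 1
    return dict(tensor)


def accuracy(tensor):
    """Overall accuracy computed from the tensor's diagonal cells."""
    correct = sum(cells["tp"] + cells["tn"] for cells in tensor.values())
    total = sum(sum(cells.values()) for cells in tensor.values())
    return correct / total


def positive_rate(cells):
    """Fraction of a group's examples that were predicted positive."""
    return (cells["tp"] + cells["fp"]) / sum(cells.values())


def demographic_parity_gap(tensor, group_a, group_b):
    """Absolute difference in positive prediction rates between two groups."""
    return abs(positive_rate(tensor[group_a]) - positive_rate(tensor[group_b]))


# Invented example data: eight individuals in two protected groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

tensor = fairness_confusion_tensor(y_true, y_pred, groups)
print(tensor)
print("accuracy:", accuracy(tensor))
print("demographic parity gap:", demographic_parity_gap(tensor, "a", "b"))
```

Because both quantities depend only on the cell counts, one can vary the counts directly, subject to the data's fixed group sizes and label totals, to see which combinations of accuracy and fairness are attainable. That is the model-agnostic perspective the abstract describes.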

accuracy, fairness, fairness constraint, (15 more...)

arXiv.org Machine Learning

2004.03424

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science (0.93)
