AITopics | Accuracy

Accuracy

Missing Data Imputation for Supervised Learning

arXiv.org Machine Learning, Aug-6-2018

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show imputation methods can increase predictive accuracy in the presence of missing-data perturbation, which can actually improve prediction accuracy by regularizing the classifier. We achieve the state-of-the-art on the Adult dataset with missing-data perturbation and k-nearest-neighbors (k-NN) imputation.
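The two ingredients named in the abstract, missing-data perturbation and k-NN imputation of categorical features, can be illustrated with a minimal sketch. The abstract does not give the distance metric, the value of k, or the tie-breaking rule, so the Hamming-style distance over jointly observed features, the majority vote, and the helper names `perturb` and `knn_impute` are all assumptions made for illustration, not the paper's implementation.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing categorical value (assumption)


def perturb(rows, rate, seed=0):
    """Randomly blank out entries with probability `rate` (missing-data perturbation)."""
    rng = random.Random(seed)
    return [[MISSING if rng.random() < rate else v for v in row] for row in rows]


def hamming(a, b):
    """Fraction of mismatches over features observed in both rows; inf if none overlap."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not pairs:
        return float("inf")
    return sum(x != y for x, y in pairs) / len(pairs)


def knn_impute(rows, k=3):
    """Fill each missing entry with the most common value among the k nearest donor rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Donors are other rows that have feature j observed.
            donors = sorted(
                (hamming(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not MISSING
            )
            votes = Counter(rows[idx][j] for _, idx in donors[:k])
            if votes:  # leave the entry missing if no donor exists
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


rows = [
    ["private", "bachelors", "married"],
    ["private", "bachelors", MISSING],
    ["private", "bachelors", "married"],
    ["gov", "masters", "single"],
]
filled = knn_impute(rows, k=2)
```

In this toy table the second row has two exact neighbors on its observed features, so its missing value is filled from them. A classifier would then be trained on `knn_impute(perturb(train_rows, rate))` for several values of `rate`.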

artificial intelligence, data quality, machine learning, (17 more...)

arXiv.org Machine Learning

1610.09075

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Africa > South Africa (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.47)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Machine Learning of Toxicological Big Data Enables Read-Across Structure Activity Relationships (RASAR) Outperforming Animal Test Reproducibility | Toxicological Sciences | Oxford Academic

#artificialintelligence, Aug-5-2018, 23:47:44 GMT

Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemical Agency with approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish reproducibility of acute oral and dermal toxicity, eye and skin irritation, mutagenicity and skin sensitization. Based on 350–700 chemicals each, the probability that an OECD guideline animal test would output the same result in a repeat test was 78%–96% (sensitivity 50%–87%). An expanded database with more than 866 000 chemical properties/hazards was used as training data and to model health hazards and chemical properties. The constructed models automate and extend the read-across method of chemical classification. The novel models called RASARs (read-across structure activity relationship) use binary fingerprints and Jaccard distance to define chemical similarity. A large chemical similarity adjacency matrix is constructed from this similarity metric and is used ...
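The similarity measure described above (binary fingerprints compared by Jaccard distance, collected into a pairwise matrix) can be sketched as follows. Representing a fingerprint as the set of indices of its "on" bits, and the function names, are assumptions for illustration; the summary does not describe how the RASAR models store or threshold the matrix.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary fingerprints given as sets of 'on' bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def similarity_matrix(fingerprints):
    """Dense pairwise similarity matrix, where similarity = 1 - Jaccard distance."""
    n = len(fingerprints)
    return [
        [1.0 - jaccard_distance(fingerprints[i], fingerprints[j]) for j in range(n)]
        for i in range(n)
    ]


# Three toy fingerprints (hypothetical bit positions).
fps = [{0, 2, 5}, {0, 2, 7}, {1, 3}]
sim = similarity_matrix(fps)
```

A dense list of lists only works for toy sizes; for a database of the scale mentioned above, a sparse representation would be the more plausible choice.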

artificial intelligence, data mining, machine learning, (20 more...)

#artificialintelligence

Country:

Europe (0.28)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Multi-Objective Cognitive Model: a supervised approach for multi-subject fMRI analysis

Yousefnezhad, Muhammad, Zhang, Daoqiang

arXiv.org Machine Learning, Aug-5-2018

To decode the human brain, Multivariate Pattern (MVP) classification generates cognitive models from functional Magnetic Resonance Imaging (fMRI) datasets. In the standard MVP pipeline, brain patterns in a multi-subject fMRI dataset must be mapped to a shared space, and a classification model is then generated from the mapped patterns. However, MVP models may not provide stable performance on a new fMRI dataset because the standard pipeline uses disjoint steps to generate them. Indeed, each step in the pipeline has its own objective function and an independent optimization approach, so the best solution of one step may not be optimal for the next. To tackle this issue, this paper introduces the Multi-Objective Cognitive Model (MOCM), which uses an integrated objective function for MVP analysis rather than those disjoint steps. To solve the integrated problem, we propose a customized multi-objective optimization approach in which all possible solutions are first generated, and our method then ranks and selects the robust solutions as the final results. Empirical studies confirm that the proposed method can generate superior performance in comparison with other techniques. Keywords: Multi-Objective Cognitive Model · fMRI Analysis · Multivariate Pattern · Multi-Objective Optimization. One of the primary goals in neuroscience is to understand how neural activities in the human brain can be mapped to different cognitive tasks. The authors are with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1808.01642

Country:

Asia > China > Jiangsu Province > Nanjing (0.44)
North America > United States > California (0.28)

Genre:

Research Report (1.00)
Workflow (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Active Learning for Wireless IoT Intrusion Detection

Yang, Kai, Ren, Jie, Zhu, Yanqiao, Zhang, Weiyi

arXiv.org Artificial Intelligence, Aug-3-2018

The Internet of Things (IoT) is becoming truly ubiquitous in our everyday life, but it also faces unique security challenges. Intrusion detection is critical for the security and safety of a wireless IoT network. This paper discusses the human-in-the-loop active learning approach for wireless intrusion detection. We first present the fundamental challenges in designing a successful Intrusion Detection System (IDS) for a wireless IoT network. We then briefly review the rudimentary concepts of active learning and propose its employment in the diverse applications of wireless intrusion detection. An experimental example is also presented to show the significant performance improvement of the active learning method over the traditional supervised learning approach. While machine learning techniques have been widely employed for intrusion detection, the application of human-in-the-loop machine learning, which leverages both machine and human intelligence, to intrusion detection in IoT is still in its infancy. We hope this article can assist readers in understanding the key concepts of active learning and spur further research in this area.
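The core step of human-in-the-loop active learning is choosing which unlabeled samples to send to a human analyst. The abstract does not say which query strategy the authors use, so the sketch below shows uncertainty sampling, one common strategy, purely as an assumption; `most_uncertain` and the probability values are hypothetical.

```python
def most_uncertain(probabilities, n_queries=1):
    """Indices of unlabeled samples whose predicted attack probability is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:n_queries]


# Hypothetical model outputs for four unlabeled traffic records.
probs = [0.02, 0.48, 0.97, 0.60]
to_label = most_uncertain(probs, n_queries=2)
```

The selected records would be labeled by the analyst, added to the training set, and the model retrained before the next round of queries.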

artificial intelligence, intrusion detection, machine learning, (9 more...)

arXiv.org Artificial Intelligence

1808.01412

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.84)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Wang, Fan, Mukherjee, Sach, Richardson, Sylvia, Hill, Steven M.

arXiv.org Machine Learning, Aug-2-2018

Penalized likelihood methods are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different methods in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users of these methods. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 1,800 data-generating scenarios, allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely-used methods (Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector as well as Stability Selection). We find considerable variation in performance between methods, with results dependent on details of the data-generating scenario and the specific goal. Our results support a 'no panacea' view, with no unambiguous winner across all scenarios, even in this restricted setting where all data align well with the assumptions underlying the methods. Lasso is well-behaved, performing competitively in many scenarios, while SCAD is highly variable. Substantial benefits from a Ridge penalty are only seen in the most challenging scenarios with strong multicollinearity. The results are supported by semi-synthetic analyses using gene expression data from cancer samples. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
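For readers unfamiliar with the methods being compared, the three simplest penalties can be written in a common form for a linear model with response $y$, design matrix $X$ and $n$ samples. The $\tfrac{1}{2n}$ scaling of the loss and the $(\lambda, \alpha)$ parameterization of the Elastic Net are one widespread convention and an assumption here; the abstract does not state which parameterization the study uses.

```latex
\begin{align*}
\hat\beta_{\text{lasso}} &= \arg\min_{\beta}\; \tfrac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1 \\
\hat\beta_{\text{ridge}} &= \arg\min_{\beta}\; \tfrac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_2^2 \\
\hat\beta_{\text{enet}}  &= \arg\min_{\beta}\; \tfrac{1}{2n}\lVert y - X\beta\rVert_2^2
  + \lambda\Bigl(\alpha\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2\Bigr),
  \qquad 0 \le \alpha \le 1
\end{align*}
```

The $\ell_1$ term can set coefficients exactly to zero, which is what makes Lasso and the Elastic Net usable for variable selection, whereas the Ridge penalty only shrinks coefficients.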

artificial intelligence, machine learning, scenario, (16 more...)

arXiv.org Machine Learning

1808.00723

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

The impact of imbalanced training data on machine learning for author name disambiguation

Kim, Jinseok, Kim, Jenna

arXiv.org Machine Learning, Aug-2-2018

In supervised machine learning for author name disambiguation, negative training data often greatly outnumber positive training data. This paper examines how the ratio of negative to positive training data affects the performance of machine learning algorithms that disambiguate author names in bibliographic records. On multiple labeled datasets, three classifiers (Logistic Regression, Naïve Bayes, and Random Forest) are trained on representative features, such as coauthor names and title words, extracted from the same training data but with various positive-negative training data ratios. Results show that increasing negative training data can improve disambiguation performance, but only by a few percent, and can sometimes degrade it. Logistic Regression and Naïve Bayes learn optimal disambiguation models even with a base ratio (1:1) of positive and negative training data. Also, the performance improvement by Random Forest tends to saturate quickly, roughly after 1:10 ~ 1:15. These findings imply that, contrary to the common practice of using all training data, name disambiguation algorithms can be trained on part of the negative training data without much loss in disambiguation performance while increasing computational efficiency. This study calls for more attention from author name disambiguation scholars to methods for machine learning from imbalanced data.
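Building a training set at a chosen positive-to-negative ratio amounts to keeping every positive pair and drawing a random subset of the negatives. The sketch below assumes each training instance is a `(features, label)` tuple with label 1 for "same author" and 0 otherwise; the function name and the random-sampling scheme are illustrative assumptions, since the abstract does not describe how the subsets were drawn.

```python
import random


def subsample_negatives(pairs, ratio, seed=0):
    """Keep all positive pairs and at most `ratio` negatives per positive (a 1:ratio set)."""
    rng = random.Random(seed)
    positives = [p for p in pairs if p[1] == 1]
    negatives = [p for p in pairs if p[1] == 0]
    keep = min(len(negatives), ratio * len(positives))
    return positives + rng.sample(negatives, keep)


# Toy data: 5 positive and 100 negative record pairs (hypothetical features).
pairs = [(("pos", i), 1) for i in range(5)] + [(("neg", i), 0) for i in range(100)]
train_1_to_10 = subsample_negatives(pairs, ratio=10)
```

Repeating this for ratios such as 1, 5, 10 and 15 gives the kind of series over which classifier performance can be compared.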

artificial intelligence, machine learning, training data, (13 more...)

arXiv.org Machine Learning

doi: 10.1007/s11192-018-2865-9

1808.00525

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > Onondaga County > Syracuse (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Supervised classification for object identification in urban areas using satellite imagery

Ali, Hazrat, Awan, Adnan Ali, Khan, Sanaullah, Shafique, Omer, Rahman, Atiq ur, Khan, Shahid

arXiv.org Machine Learning, Aug-2-2018

This paper presents a useful method to achieve classification in satellite imagery. The approach is based on pixel level study employing various features such as correlation, homogeneity, energy and contrast. In this study gray-scale images are used for training the classification model. For supervised classification, two classification techniques are employed namely the Support Vector Machine (SVM) and the Naive Bayes. With textural features used for gray-scale images, Naive Bayes performs better with an overall accuracy of 76% compared to 68% achieved by SVM. The computational time is evaluated while performing the experiment with two different window sizes i.e., 50x50 and 70x70. The required computational time on a single image is found to be 27 seconds for a window size of 70x70 and 45 seconds for a window size of 50x50.
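Correlation, homogeneity, energy and contrast are the names of classic texture statistics usually derived from a gray-level co-occurrence matrix (GLCM). The abstract does not say that a GLCM is used, nor which pixel offset or formula variants apply, so the sketch below is an assumption throughout: it uses a one-pixel horizontal offset, homogeneity as the sum of P / (1 + |i - j|), and energy as the sum of squared entries.

```python
def glcm(window, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix of a 2D window for offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("window too small for the chosen offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, homogeneity, energy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    denom = (var_i * var_j) ** 0.5
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        # Correlation is undefined for a constant window.
        "correlation": covariance / denom if denom > 0 else float("nan"),
    }


# A 2x2 checkerboard with two gray levels: maximally contrasting neighbors.
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
```

In a pipeline like the one described, such a feature vector would be computed for each 50x50 or 70x70 window and passed to the SVM or Naive Bayes classifier.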

artificial intelligence, machine learning, window size, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/ICOMET.2018.8346383

1808.00878

Country:

Europe > France (0.05)
Asia > Pakistan (0.05)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.05)

Genre: Research Report > New Finding (0.35)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.74)

Make "Fairness by Design" Part of Machine Learning

#artificialintelligence, Aug-1-2018, 20:00:56 GMT

Machine learning is increasingly being used to predict individuals' attitudes, behaviors, and preferences across an array of applications -- from personalized marketing to precision medicine. Unsurprisingly, given the speed of change and ever-increasing complexity, there have been several recent high-profile examples of "machine learning gone wrong." A chatbot trained using Twitter was shut down after only a single day because of its obscene and inflammatory tweets. Machine learning models used in a popular search engine struggle to differentiate human images from those of gorillas, and show female searchers ads for lower paying jobs relative to male users. More recently, a study compared the commonly used crime risk analysis tool COMPAS against recidivism predictions from 400 untrained workers recruited via Amazon Mechanical Turk.

artificial intelligence, fairness measure, machine learning, (14 more...)

#artificialintelligence

Country: North America > United States (0.05)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Anomaly Detection via Minimum Likelihood Generative Adversarial Networks

Wang, Chu, Zhang, Yan-Ming, Liu, Cheng-Lin

arXiv.org Machine Learning, Aug-1-2018

Anomaly detection aims to detect abnormal events using a model of normality. It plays an important role in many domains, such as network intrusion detection and criminal activity identification. With the rapidly growing size of accessible training data and high computation capacities, deep learning based anomaly detection has become increasingly popular. In this paper, a new domain-based anomaly detection method based on generative adversarial networks (GAN) is proposed. Minimum likelihood regularization is proposed to make the generator produce more anomalies and to prevent it from converging to the normal data distribution. A proper ensemble of anomaly scores is shown to effectively improve the stability of the discriminator. The proposed method achieves significant improvement over other anomaly detection methods on the CIFAR-10 and UCI datasets.

data mining, detection, machine learning, (19 more...)

arXiv.org Machine Learning

1808.002

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Open Category Detection with PAC Guarantees

Liu, Si, Garrepalli, Risheek, Dietterich, Thomas G., Fern, Alan, Hendrycks, Dan

arXiv.org Machine Learning, Aug-1-2018

Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated" training set that contains a fraction $\alpha$ of alien examples. Under the assumption that we know an upper bound on $\alpha$, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.

artificial intelligence, machine learning, open category detection, (7 more...)

arXiv.org Machine Learning

1808.00529

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)
