AITopics

The evolution of drug-resistant microbial species is one of the major challenges to global health. The development of new antimicrobial treatments such as antimicrobial peptides needs to be accelerated to combat this threat. However, the discovery of novel antimicrobial peptides is hampered by low-throughput biochemical assays. Computational techniques can be used for rapid screening of promising antimicrobial peptide candidates prior to testing in the wet lab. The vast majority of existing antimicrobial peptide predictors are non-targeted in nature, i.e., they can predict whether a given peptide sequence is antimicrobial, but they are unable to predict whether the sequence can target a particular microbial species. In this work, we have developed a targeted antimicrobial peptide activity predictor that can predict whether a peptide is effective against a given microbial species or not. This has been made possible through zero-shot and few-shot machine learning. The proposed predictor called AMP0 takes in the peptide amino acid sequence and any N/C-termini modifications together with the genomic sequence of a target microbial species to generate targeted predictions. It is important to note that the proposed method can generate predictions for species that are not part of its training set. The accuracy of predictions for novel test species can be further improved by providing a few example peptides for that species. Our computational cross-validation results show that the pro-posed scheme is particularly effective for targeted antimicrobial prediction in comparison to existing approaches and can be used for screening potential antimicrobial peptides in a targeted manner especially for cases in which the number of training examples is small. The webserver of the method is available at http://ampzero.pythonanywhere.com.

microbial species, peptide, sequence, (15 more...)

1911.06106

Country:

Europe > United Kingdom > England > West Midlands > Coventry (0.04)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Mediratta, Deepshi, Oswal, Nikhil

Detect Toxic Content to Improve Online Conversations

arXiv.org Artificial IntelligenceOct-28-2019

Social media is filled with toxic content. The aim of this paper is to build a model that can detect insincere questions. We use the 'Quora Insincere Questions Classification' dataset for our analysis. The dataset is composed of sincere and insincere questions, with the majority of sincere questions. The dataset is processed and analyzed using Python and its libraries such as sklearn, numpy, pandas, keras etc. The dataset is converted to vector form using word embeddings such as GloVe, Wiki-news and TF-IDF. The imbalance in the dataset is handled by resampling techniques. We train and compare various machine learning and deep learning models to come up with the best results. Models discussed include SVM, Naive Bayes, GRU and LSTM.

dataset, insincere question, undersampling, (14 more...)

arXiv.org Artificial Intelligence

1911.01217

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Greece > West Greece > Patra (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Orooji, Marmar, Chen, Jianhua

Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques

-- This study is motivated by the magnitude of the problem of Louisiana high school dropout and its negative impacts on individual and public wellbeing. Our goal is to predict students who are at risk of high school dropout, by examining Louisiana administrative dataset. Due to the imbalanced nature of the dataset, imbalanced learning techniques including resampling, case weighting, and cost-sensitive learning have been applied to enhance the prediction performance on the rare class. Performance metrics used in this study are F-measure, recall and precision of the rare class. We compare the performance of several machine learning algorithms such as neural networks, decision trees and bagging trees in combination with the imbalanced learning approaches using an administrative dataset of size of 366k from Louisiana Department of Education. Experiments show that application of imbalanced learning methods produces good results on recall but decreases precision, whereas base classifiers without regard of imbalanced data handling gives better precision but poor recall. Overall application of imbalanced learning techniques is beneficial, yet more studies are desired to improve precision. Louisiana has maintained one of the highest school dropout rates in the US for many years. The Public Affairs Research Council of Louisiana (PAR, October 2011) estimates that one in six of every public high school students in the state drops out of school.

classifier, dataset, student, (16 more...)

1910.13018

Country: North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)

Genre: Research Report > New Finding (0.52)

Industry: Education > Educational Setting > K-12 Education > Secondary School (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lai, Yuanhao, McLeod, Ian

Ensemble Quantile Classifier

Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. But these methods are restricted as they assign equal weight to each variable in an unregularized way. The ensemble quantile classifier is a more flexible regularized classifier that provides better performance with high-dimensional data, asymmetric data or when there are many irrelevant extraneous inputs. The improved performance is demonstrated by a simulation study as well as an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.

classifier, equation, error rate, (15 more...)

doi: 10.1016/j.csda.2019.106849

1910.1296

Country:

North America > United States > New York (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

He, Yuzi, Burghardt, Keith, Lerman, Kristina

Learning Fair and Interpretable Representations via Linear Orthogonalization

To reduce human error and prejudice, many high-stakes decisions have been turned over to machine algorithms. However, recent research suggests that this does not remove discrimination, and can perpetuate harmful stereotypes. While algorithms have been developed to improve fairness, they typically face at least one of three shortcomings: they are not interpretable, they lose significant accuracy compared to unbiased equivalents, or they are not transferable across models. To address these issues, we propose a geometric method that removes correlations between data and any number of protected variables. Further, we can control the strength of debi-asing through an adjustable parameter to address the tradeoff between model accuracy and fairness. The resulting features are interpretable and can be used with many popular models, such as linear regression, random forest and multilayer perceptrons. The resulting predictions are found to be more accurate and fair than several comparable fair AI algorithms across a variety of benchmark datasets. Our work shows that debiasing data is a simple and effective solution toward improving fairness.

accuracy, fairness, representation, (15 more...)

1910.12854

Country:

North America > United States > California (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

An Ensemble Approach toward Automated Variable Selection for Network Anomaly Detection

Nakashima, Makiya, Sim, Alex, Kim, Youngsoo, Kim, Jonghyun, Kim, Jinoh

While variable selection is essential to optimize the learning complexity by prioritizing features, automating the selection process is preferred since it requires laborious efforts with intensive analysis otherwise. However, it is not an easy task to enable the automation due to several reasons. First, selection techniques often need a condition to terminate the reduction process, for example, by using a threshold or the number of features to stop, and searching an adequate stopping condition is highly challenging. Second, it is uncertain that the reduced variable set would work well; our preliminary experimental result shows that well-known selection techniques produce different sets of variables as a result of reduction (even with the same termination condition), and it is hard to estimate which of them would work the best in future testing. In this paper, we demonstrate the potential power of our approach to the automation of selection process that incorporates well-known selection methods identifying important variables. Our experimental results with two public network traffic data (UNSW-NB15 and IDS2017) show that our proposed method identifies a small number of core variables, with which it is possible to approximate the performance to the one with the entire variables.

classifier, selection, selection method, (17 more...)

1910.12806

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Lim, Jen Ning, Yamada, Makoto, Schölkopf, Bernhard, Jitkrittum, Wittawat

Kernel Stein Tests for Multiple Model Comparison

arXiv.org Machine LearningOct-27-2019

We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it. We propose two statistical tests, each controlling a different notion of decision errors. The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declared worse (false positive rate). The second test is based on multiple correction, and controls the proportion of the models declared worse but are in fact as good as the best (false discovery rate). We prove that under appropriate conditions the first test can yield a higher true positive rate than the second. Experimental results on toy and real (CelebA, Chicago Crime data) problems show that the two tests have high true positive rates with well-controlled error rates. By contrast, the naive approach of choosing the model with the lowest score without correction leads to more false positives.

candidate model, experiment, relmulti, (17 more...)

1910.12252

Country:

North America > United States > Illinois > Cook County > Chicago (0.24)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Lu, Jialin, Ester, Martin

An Active Approach for Model Interpretation

arXiv.org Artificial IntelligenceOct-27-2019

Model interpretation, or explanation of a machine learning classifier, aims to extract generalizable knowledge from a trained classifier into a human-understandable format, for various purposes such as model assessment, debugging and trust. From a computaional viewpoint, it is formulated as approximating the target classifier using a simpler interpretable model, such as rule models like a decision set/list/tree. Often, this approximation is handled as standard supervised learning and the only difference is that the labels are provided by the target classifier instead of ground truth. This paradigm is particularly popular because there exists a variety of well-studied supervised algorithms for learning an interpretable classifier. However, we argue that this paradigm is suboptimal for it does not utilize the unique property of the model interpretation problem, that is, the ability to generate synthetic instances and query the target classifier for their labels. We call this the active-query property, suggesting that we should consider model interpretation from an active learning perspective. Following this insight, we argue that the active-query property should be employed when designing a model interpretation algorithm, and that the generation of synthetic instances should be integrated seamlessly with the algorithm that learns the model interpretation. In this paper, we demonstrate that by doing so, it is possible to achieve more faithful interpretation with simpler model complexity. As a technical contribution, we present an active algorithm Active Decision Set Induction (ADS) to learn a decision set, a set of if-else rules, for model interpretation. ADS performs a local search over the space of all decision sets. In every iteration, ADS computes confidence intervals for the value of the objective function of all local actions and utilizes active-query to determine the best one.

algorithm, interpretation, model interpretation, (14 more...)

arXiv.org Artificial Intelligence

1910.12207

Country:

North America > United States > Texas (0.04)
North America > United States > California (0.04)
North America > Canada (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligenceOct-26-2019, 18:28:29 GMT

Making Fairness an Intrinsic Part of Machine Learning Open Data Science Conference

Editor's Note: At ODSC Europe 2019, Sray Agarwal will conduct a workshop on fairness and accountability demonstrating how to detect bias and remove bias from ML models. The suitability of Machine Learning models is traditionally measured on its accuracy. A highly accurate model based on metrics like RMSE, MAPE, AUC, ROC, Gini, etc are considered to be high performing models. While such accuracy metrics important, are there other metrics that the data science community has been ignoring so far? The answer is yes--in the pursuit of accuracy, most models sacrifice "fairness" and "interpretability."

fairness, fairness metric, threshold, (12 more...)

#artificialintelligence

Country: Europe (0.26)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

#artificialintelligenceOct-26-2019, 06:16:02 GMT

Convolutional Neural Network for Breast Cancer Classification - KDnuggets

Breast cancer is the second most common cancer in women and men worldwide. In 2012, it represented about 12 percent of all new cancer cases and 25 percent of all cancers in women. Breast cancer starts when cells in the breast begin to grow out of control. These cells usually form a tumor that can often be seen on an x-ray or felt as a lump. The tumor is malignant (cancer) if the cells can grow into (invade) surrounding tissues or spread (metastasize) to distant areas of the body.

batch size, classification, convolutional neural network, (12 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)