Accuracy
How AI could help screen for autism in children
For children with autism spectrum disorder (ASD), receiving an early diagnosis can make a huge difference in improving behavior, skills and language development. There is no lab test and no single identified genetic cause; instead, clinicians look at the child's behavior and conduct structured interviews with the child's caregivers based on questionnaires. But these questionnaires are extensive, complicated and not foolproof. "In trying to discern and stratify a complex condition such as autism spectrum disorder, knowing what questions to ask and in what order becomes challenging," said USC University Professor Shrikanth Narayanan, Niki and Max Nikias Chair in Engineering and professor of electrical and computer engineering, computer science, linguistics, psychology, pediatrics and otolaryngology. "As such, this system is difficult to administer and can produce false positives, or confound ASD as other comorbid conditions, such as attention deficit hyperactivity disorder (ADHD)."
A survey on multi-objective hyperparameter optimization algorithms for Machine Learning
Morales-Hernández, Alejandro, Van Nieuwenhuyse, Inneke, Gonzalez, Sebastian Rojas
Hyperparameter optimization (HPO) is a necessary step to ensure the best possible performance of Machine Learning (ML) algorithms. Several methods have been developed to perform HPO; most of these are focused on optimizing one performance measure (usually an error-based measure), and the literature on such single-objective HPO problems is vast. Recently, though, algorithms have appeared which focus on optimizing multiple conflicting objectives simultaneously. This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms, distinguishing between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both. We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
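Multi-objective HPO returns a set of trade-off configurations rather than a single best one. As a minimal sketch of that idea, the code below extracts the Pareto-optimal (non-dominated) configurations from a list of already-evaluated ones. It assumes two objectives that are both minimized, validation error and training time. The result values and function names are hypothetical, and this is only the dominance check that all such methods rely on, not any of the surveyed algorithms.

```python
from typing import List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of the non-dominated points (O(n^2) pairwise check)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical HPO results: (validation error, training time in seconds)
results = [
    (0.12, 30.0),  # config 0
    (0.10, 55.0),  # config 1
    (0.15, 20.0),  # config 2
    (0.13, 40.0),  # config 3: dominated by config 0
    (0.10, 70.0),  # config 4: dominated by config 1
]
print(pareto_front(results))  # [0, 1, 2]
```

The pairwise check is quadratic in the number of configurations, which is adequate for small result sets; larger sets usually call for a non-dominated sorting procedure.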
A Comparative Analysis of Machine Learning Techniques for IoT Intrusion Detection
Vitorino, João, Andrade, Rui, Praça, Isabel, Sousa, Orlando, Maia, Eva
The digital transformation faces tremendous security challenges. In particular, the growing number of cyber-attacks targeting Internet of Things (IoT) systems reinforces the need for reliable detection of malicious network activity. This paper presents a comparative analysis of supervised, unsupervised and reinforcement learning techniques on nine malware captures of the IoT-23 dataset, considering both binary and multi-class classification scenarios. The developed models consisted of Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Isolation Forest (iForest), Local Outlier Factor (LOF) and a Deep Reinforcement Learning (DRL) model based on a Double Deep Q-Network (DDQN), adapted to the intrusion detection context. The most reliable performance was achieved by LightGBM. Nonetheless, iForest displayed good anomaly detection results and the DRL model demonstrated the possible benefits of employing this methodology to continuously improve the detection. Overall, the obtained results indicate that the analyzed techniques are well suited for IoT intrusion detection.
Differentially Private Ensemble Classifiers for Data Streams
Gondara, Lovedeep, Wang, Ke, Carvalho, Ricardo Silva
Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows an unbounded number of ensemble updates to deal with the potentially never-ending data streams under a fixed privacy budget, and it is model agnostic, in that it treats any pre-trained …
To further add to the challenge, data streams from many domains involve sensitive, personal information about contributing users, such as patients' records and user data in mobile applications, protection of which is of paramount interest. While concept drift and privacy have been extensively studied in isolation, works considering both are in their infancy. See more discussion in Section 2. In this work, our goal is to allow machine learning models to deal with concept drift when training on potentially never-ending data streams involving sensitive data, where the model(s) learned can be published without disclosing sensitive information.
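The abstract does not describe the ensemble mechanism itself. As background only, the sketch below shows two standard building blocks of differential privacy: the Laplace mechanism for a count query, and basic sequential composition, under which k releases that share a total budget ε get ε/k each. The latter illustrates why supporting an unbounded number of updates under a fixed budget is hard: with naive splitting, the per-update budget shrinks toward zero and the noise grows without limit. This is not the authors' method, and all function names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # A count has L1 sensitivity 1 (one user changes it by at most 1),
    # so adding Laplace noise with scale 1/epsilon gives epsilon-DP.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def per_release_epsilon(total_budget: float, num_releases: int) -> float:
    # Basic sequential composition: num_releases releases at this epsilon
    # each spend total_budget overall. As num_releases grows, this -> 0.
    return total_budget / num_releases


# Usage: 10 releases sharing a total budget of 1.0
rng = random.Random(42)
eps = per_release_epsilon(total_budget=1.0, num_releases=10)
noisy = private_count(100, eps, rng)  # noise scale is 1 / 0.1 = 10
```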
Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings
Myklebust, Erik B., Jiménez-Ruiz, Ernesto, Chen, Jiaoyan, Wolf, Raoul, Tollefsen, Knut Erik
We have created a knowledge graph based on major data sources used in ecotoxicological risk assessment. We have applied this knowledge graph to an important task in risk assessment, namely chemical effect prediction. We have evaluated nine knowledge graph embedding models from a selection of geometric, decomposition, and convolutional models on this prediction task. We show that using knowledge graph embeddings can increase the accuracy of effect prediction with neural networks. Furthermore, we have implemented a fine-tuning architecture which adapts the knowledge graph embeddings to the effect prediction task and leads to better performance. Finally, we evaluate certain characteristics of the knowledge graph embedding models to shed light on the individual model performance.
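The abstract does not list the nine embedding models. To make "geometric" concrete, the sketch below uses TransE, a well-known translational model chosen here as an assumption, which scores a triple (head, relation, tail) as the negative distance between head + relation and tail. The entity names and 2-D vectors are hypothetical and not learned from any dataset.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE plausibility: -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical toy embeddings
chemical = [0.2, 0.5]
has_effect = [0.3, -0.1]
effect_true = [0.5, 0.4]    # equals chemical + has_effect
effect_false = [-0.4, 0.9]  # far from chemical + has_effect

print(transe_score(chemical, has_effect, effect_true)
      > transe_score(chemical, has_effect, effect_false))  # True
```

In a pipeline like the one described, learned entity vectors of this kind would be passed as input features to the downstream neural network; that wiring is not shown here.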
Naive Bayes Classifier Spam Filter Example: 4 Easy Steps
In probability theory, Bayes' theorem describes conditional probability: it updates the probability of an event based on evidence that has already been observed. You can use Naive Bayes as a supervised machine learning method that predicts a class from the evidence (features) present in your dataset. In this tutorial, you will learn how to classify an email as spam or not spam using the Naive Bayes classifier. Before the coding demonstration, let's briefly review how Naive Bayes works.
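The tutorial's own four steps and libraries are not shown in this excerpt, so the following is a from-scratch sketch of a multinomial Naive Bayes spam filter with add-one (Laplace) smoothing, computed in log space to avoid numeric underflow. The training messages and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def train(docs: List[Tuple[str, str]]):
    """docs: (text, label) pairs. Returns log-priors, per-class word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    return log_prior, word_counts, vocab


def predict(text: str, log_prior, word_counts, vocab) -> str:
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    scores = {}
    for c in log_prior:
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting scheduled for monday", "ham"),
    ("see you at lunch tomorrow", "ham"),
]
model = train(training)
print(predict("free prize money", *model))  # spam
print(predict("lunch on monday", *model))   # ham
```

The "naive" part is the assumption that words are conditionally independent given the class, which is what lets the per-word log-probabilities simply be summed.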
Every Single Way You Can Tell Trump World Is Lying About Its Latest COVID Scandal
Donald Trump and his former White House chief of staff Mark Meadows are peddling a new story about the ex-president's coronavirus infection. Their first story was that Trump didn't test positive until Oct. 1, 2020, two days after he debated Joe Biden. Then Meadows admitted in his new book, The Chief's Chief, that Trump actually tested positive on Sept. 26, three days before the debate. That admission was problematic, since Trump never informed Biden (or hundreds of other unwitting people who interacted closely with the maskless president in the intervening five days) about the test result. So now Trump and Meadows have concocted yet another story: The Sept. 26 result was a "false positive."
Assessing Fairness in the Presence of Missing Data
Missing data are prevalent and present daunting challenges in real data analysis. While there is a growing body of literature on fairness in analysis of fully observed data, there has been little theoretical work on investigating fairness in analysis of incomplete data. In practice, a popular analytical approach for dealing with missing data is to use only the set of complete cases, i.e., observations with all features fully observed, to train a prediction algorithm. However, depending on the missing data mechanism, the distribution of complete cases and the distribution of the complete data may be substantially different. When the goal is to develop a fair algorithm in the complete data domain where there are no missing values, an algorithm that is fair in the complete case domain may show disproportionate bias towards some marginalized groups in the complete data domain. To fill this significant gap, we study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases. We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantee in analysis of incomplete data.
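The paper's bounds are not reproduced here. The toy simulation below only illustrates the phenomenon the abstract describes: a model that looks fair on complete cases can be unfair on the complete data. It assumes demographic parity difference as the fairness measure (the abstract does not name its metric) and uses hypothetical simulated records, so the full data are known and missingness can be imposed afterwards.

```python
from typing import List, Tuple

# Each record: (group, model_prediction, has_missing_feature)
Record = Tuple[str, int, bool]


def positive_rate(records: List[Record], group: str) -> float:
    preds = [pred for g, pred, _ in records if g == group]
    return sum(preds) / len(preds)


def parity_gap(records: List[Record]) -> float:
    """Demographic parity difference |P(yhat=1 | A=a) - P(yhat=1 | A=b)|."""
    return abs(positive_rate(records, "a") - positive_rate(records, "b"))


# Simulated complete data: group "a" has a 50% positive rate, group "b" 80%,
# but most positively predicted "b" records have a missing feature.
full = (
    [("a", 1, False)] * 5 + [("a", 0, False)] * 5
    + [("b", 1, True)] * 6 + [("b", 1, False)] * 2 + [("b", 0, False)] * 2
)
complete_cases = [r for r in full if not r[2]]

print(parity_gap(complete_cases))  # 0.0
print(round(parity_gap(full), 3))  # 0.3
```

Because missingness here depends on both group and prediction, the complete cases are not representative of group "b", which is the kind of distribution mismatch the abstract attributes to the missing data mechanism.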
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text
Shaaban, Mai A., Hassan, Yasser F., Guirguis, Shawkat K.
The increase in people's use of mobile messaging services has led to the spread of social engineering attacks like phishing, since spam text is one of the main factors in spreading phishing attacks that steal sensitive data such as credit cards and passwords. In addition, rumors and incorrect medical information regarding the COVID-19 pandemic are widely shared on social media, causing fear and confusion. Thus, filtering spam content is vital to reduce risks and threats. Previous studies relied on machine learning and deep learning approaches for spam classification, but these approaches have two limitations. Machine learning models require manual feature engineering, whereas deep neural networks require a high computational cost. This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. The proposed model utilizes convolutional and pooling layers for feature extraction along with base classifiers such as random forests and extremely randomized trees for classifying texts into spam or legitimate ones. Moreover, the model employs ensemble learning procedures like boosting and bagging. As a result, the model achieved high precision, recall, F1-score and accuracy of 98.38%.
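The abstract reports precision, recall, F1-score and accuracy. For reference, a minimal sketch of how these are computed for a binary spam/legitimate split, assuming spam is the positive class (the paper may average over classes differently). The labels are hypothetical, and the function name is illustrative.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics with spam = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels for 8 messages (1 = spam, 0 = legitimate)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'accuracy': 0.75}
```

As this toy case shows, the four metrics can coincide when false positives and false negatives are balanced, but in general they differ, and precision matters most when flagging legitimate messages is costly.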
Incentive Compatible Pareto Alignment for Multi-Source Large Graphs
Liang, Jian, Lv, Fangrui, Liu, Di, Dai, Zehui, Tian, Xu, Li, Shuang, Wang, Fei, Li, Han
In this paper, we focus on learning effective entity matching models over multi-source large-scale data. For real applications, we relax the typical assumptions that data distributions/spaces or entity identities are shared between sources, and propose a Relaxed Multi-source Large-scale Entity-matching (RMLE) problem. Challenges of the problem include 1) how to align large-scale entities between sources to share information and 2) how to mitigate negative transfer from jointly learning on multi-source data. Worse still, the two challenges are entangled in practice: incorrect alignments may increase negative transfer, while mitigating negative transfer for one source may result in poorly learned representations for other sources and thus decrease alignment accuracy. To handle these entangled challenges, we point out that the key is to optimize information sharing first based on Pareto front optimization, by showing that information sharing significantly influences the Pareto front, which depicts lower bounds of negative transfer. Consequently, we propose an Incentive Compatible Pareto Alignment (ICPA) method that first optimizes cross-source alignments based on Pareto front optimization and then mitigates negative transfer constrained on the optimized alignments. This mechanism lets each source learn based on its true preference without worrying about deteriorating the representations of other sources. Specifically, the Pareto front optimization encourages minimizing lower bounds of negative transfer, which optimizes whether to align and which entities to align. Comprehensive empirical evaluation results on four large-scale datasets are provided to demonstrate the effectiveness and superiority of ICPA. Online A/B test results at a search advertising platform also demonstrate the effectiveness of ICPA in production environments.