Performance Analysis
Online Learning Probabilistic Event Calculus Theories in Answer Set Programming
Katzouris, Nikos, Artikis, Alexander, Paliouras, Georgios
Complex Event Recognition (CER) systems detect event occurrences in streaming time-stamped input using predefined event patterns. Logic-based approaches are of special interest in CER, since, via Statistical Relational AI, they combine uncertainty-resilient reasoning with time and change, with machine learning, thus alleviating the cost of manual event pattern authoring. We present a system based on Answer Set Programming (ASP), capable of probabilistic reasoning with complex event patterns in the form of weighted rules in the Event Calculus, whose structure and weights are learnt online. We compare our ASP-based implementation with a Markov Logic-based one and with a number of state-of-the-art batch learning algorithms on CER datasets for activity recognition, maritime surveillance and fleet management. Our results demonstrate the superiority of our novel approach, both in terms of efficiency and predictive performance. This paper is under consideration for publication in Theory and Practice of Logic Programming (TPLP).
DropBlock: A New Regularization Technique
Regularization is a strategy used in deep neural networks to reduce the generalization error, but not the training error, so that the model performs well not just on the training data but also on new, unseen inputs. An effective regularizer reduces the variance significantly without overly increasing the bias, thus preventing overfitting. Techniques such as L1 and L2 regularization reduce overfitting by adding a penalty to the loss function, while techniques such as Dropout and Spatial Dropout discourage model complexity by randomly dropping units or feature maps during training. The principle behind these dropout-style methods is to inject noise into the network so that it does not overfit the training data. L2 regularization is also commonly known as weight decay, ridge regression, or Tikhonov regularization.
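The two families of regularizers mentioned above can be illustrated with a minimal sketch in plain Python. It is not the DropBlock method itself; the function names (`l1_penalized_loss`, `l2_penalized_loss`, `dropout`) and the penalty strength `lam` are hypothetical names chosen for illustration, and the dropout shown is the standard "inverted" variant that rescales surviving units.

```python
import random


def l1_penalized_loss(data_loss, weights, lam):
    """Data loss plus an L1 penalty: lam * sum(|w|). Names are illustrative."""
    return data_loss + lam * sum(abs(w) for w in weights)


def l2_penalized_loss(data_loss, weights, lam):
    """Data loss plus an L2 penalty: lam * sum(w^2)."""
    return data_loss + lam * sum(w * w for w in weights)


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob and
    rescale the survivors by 1 / (1 - drop_prob) so the expected
    activation is unchanged. Intended for training time only."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    weights = [1.0, -2.0]
    print(l2_penalized_loss(1.0, weights, lam=0.1))
    print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=random.Random(0)))
```

The penalties change the objective being minimized, whereas dropout leaves the loss untouched and perturbs the activations instead, which is the noise-injection idea described in the paragraph.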
Evan Fournier's debut was delayed by a false positive COVID test
It was nothing more than what Brad Stevens termed "a curveball," as it turned out. After an initial false positive COVID test, Evan Fournier turned in a string of negative tests, leading to his first-time availability for the Celtics Monday night against New Orleans. "He will play significant minutes, as he will all the rest of the year," Stevens said of how he planned to begin with the talented wing player, acquired from Orlando at the trade deadline for the since-waived Jeff Teague and two second-round draft picks. "We had an obvious need for another wing that can do what he does, and we're fortunate he's with us, and he's on our team," said the Celtics coach. "So I got a chance to go over to the gym (Sunday) while he was shooting around when we got back and then this morning we went through some stuff prior to our shootaround, we shot around as a team for 30 minutes, so he's gotten the crash course in a very short amount of time. He's been there, done that. He's played against us, you know, tons of times, probably knows our plays as well as anybody, and certainly we just want him to play to his strengths and not worry about anything else."
Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability
Jiménez-Mesa, Carmen, Ramírez, Javier, Suckling, John, Vöglein, Jonathan, Levin, Johannes, Górriz, Juan Manuel, ADNI, Alzheimer's Disease Neuroimaging Initiative, DIAN, Dominantly Inherited Alzheimer Network
Discriminative analysis in neuroimaging by means of deep/machine learning techniques is usually tested with validation techniques, whereas the assessment of the associated statistical significance remains largely under-developed due to its computational complexity. In this work, a non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures. In particular, a combination of autoencoders (AE) and support vector machines (SVM) is applied to: (i) one-condition, within-group designs, often of normal controls (NC); and (ii) two-condition, between-group designs that contrast, for example, Alzheimer's disease (AD) patients with NC (the extension to multi-class analyses is also included). A random-effects inference based on a label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods. This allows both false positives and classifier overfitting to be detected, and the statistical power of the test to be estimated. Several experiments were carried out using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the Dominantly Inherited Alzheimer Network (DIAN) dataset, and an MCI prediction dataset. We found in the permutation test that the CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power (although lower using cross-validation). A large separation between training and test accuracies using CV was observed, especially in one-condition designs. This implies a low generalization ability, as the model fitted in training is not informative with respect to the test set. As a solution, we propose applying RUB, whereby results similar to those of the CV test set are obtained, but considering the whole set and with a lower computational cost per iteration.
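The idea of a label permutation test for a classifier can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' framework: a 1-D nearest-class-mean classifier scored by plain resubstitution accuracy stands in for the AE plus SVM pipeline, no upper bound correction is applied, and the function names (`nearest_mean_accuracy`, `permutation_p_value`) and the p-value estimator (1 + k) / (1 + n) are choices made for this example.

```python
import random


def nearest_mean_accuracy(xs, ys):
    """Resubstitution accuracy of a 1-D nearest-class-mean classifier.
    A stand-in model for illustration; ys must be a list of labels."""
    classes = sorted(set(ys))
    means = {
        c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
        for c in classes
    }
    predictions = [min(classes, key=lambda c: abs(x - means[c])) for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


def permutation_p_value(xs, ys, score_fn, n_permutations=500, seed=0):
    """Estimate how often a score at least as high as the observed one
    arises when the labels are randomly shuffled (the null hypothesis of
    no association between features and labels)."""
    rng = random.Random(seed)
    observed = score_fn(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between features and labels
        if score_fn(xs, shuffled) >= observed:
            at_least_as_good += 1
    p_value = (1 + at_least_as_good) / (1 + n_permutations)
    return observed, p_value


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 0.25, 0.15, 0.9, 1.0, 1.1, 0.95, 1.05]
    ys = [0] * 5 + [1] * 5
    print(permutation_p_value(xs, ys, nearest_mean_accuracy))
```

Because the model is refitted on every shuffled labeling, the cost grows linearly with the number of permutations, which is one reason a validation scheme with a low cost per iteration matters in this setting.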
Text Classification Using Hybrid Machine Learning Algorithms on Big Data
Asogwa, D. C., Anigbogu, S. O., Onyenwe, I. E., Sani, F. A.
Recently, there has been unprecedented data growth originating from different online platforms, which contributes to big data in terms of volume, velocity, variety and veracity (4Vs). Given the unstructured nature of big data, performing analytics to extract meaningful information is currently a great challenge for big data analytics. Collecting and analyzing unstructured textual data allows decision makers to study the escalation of comments/posts on social media platforms. Hence, there is a need for automatic big data analysis to overcome the noise and unreliability of these unstructured datasets from digital media platforms. However, the machine learning algorithms currently in use are performance driven, focusing on classification/prediction accuracy based on known properties learned from the training samples. When learning from a large dataset, most machine learning models are known to incur a high computational cost, which eventually leads to computational complexity. In this work, two supervised machine learning algorithms are combined with text mining techniques to produce a hybrid model consisting of Naïve Bayes and support vector machines (SVM). The aim is to increase the efficiency and accuracy of the results obtained and to reduce the computational cost and complexity. The system also provides an open platform where a group of persons with a common interest can share their comments/messages, and these comments are classified automatically as legal or illegal. This improves the quality of conversation among users. The hybrid model was developed using WEKA tools and the Java programming language. The results show that the hybrid model achieved 96.76% accuracy, as against 61.45% and 69.21% for the Naïve Bayes and SVM models respectively.
Human Activity Analysis and Recognition from Smartphones using Machine Learning Techniques
Rabbi, Jakaria, Fuad, Md. Tahmid Hasan, Awal, Md. Abdul
Human Activity Recognition (HAR) has been considered a valuable research topic over the last few decades. Different types of machine learning models are used for this purpose, as part of analyzing human behavior through machines. Analyzing data from wearable sensors is not a trivial task, because the data are complex and high-dimensional. Nowadays, researchers mostly use smartphones or smart home sensors to capture these data. In our paper, we analyze these data using machine learning models to recognize human activities, which are now widely used for many purposes such as physical and mental health monitoring. We apply different machine learning models and compare their performance. We use Logistic Regression (LR) as the benchmark model for its simplicity and excellent performance on a dataset, and for comparison we take Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN). Additionally, we select the best set of parameters for each model by grid search. We use the HAR dataset from the UCI Machine Learning Repository as a standard dataset to train and test the models. Throughout the analysis, we see that the Support Vector Machine (average accuracy 96.33%) performed far better than the other methods. We also show that the results are statistically significant by employing statistical significance tests.
Artificial Intelligence and IoT: Naive Bayes
A project-based course to build an AIoT system from theory to prototype. Artificial Intelligence and Automation with Zang Cloud. Sample code is provided for every project in this course. You will receive a certificate of completion when you finish this course. There is also a Udemy 30-Day Money-Back Guarantee if you are not satisfied with this course. This course teaches you how to build an AIoT system from theory to prototype, particularly using the Naive Bayes algorithm.
Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding
Bellot, Alexis, van der Schaar, Mihaela
Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset, rendering the causal effect unidentifiable. In this paper we show that beyond conditional independencies, under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.
Face Recognition as a Method of Authentication in a Web-Based System
Mugalu, Ben Wycliff, Wamala, Rodrick Calvin, Serugunda, Jonathan, Katumba, Andrew
Online information systems currently rely heavily on the traditional username-and-password method for protecting information and controlling access. With the advancement of biometric technology and the popularity of fields like AI and Machine Learning, biometric security is becoming increasingly popular because of its usability advantage. This paper reports how machine learning based face recognition can be integrated into a web-based system as a method of authentication to reap the benefits of improved usability. It includes a comparison of combinations of detection and classification algorithms with FaceNet for face recognition. The results show that a combination of MTCNN for detection, FaceNet for generating embeddings, and LinearSVC for classification outperforms other combinations, with 95% accuracy. The resulting classifier is integrated into the web-based system and used for authenticating users.
SQAPlanner: Generating Data-Informed Software Quality Improvement Plans
Rajapaksha, Dilini, Tantithamthavorn, Chakkrit, Jiarpakdee, Jirayus, Bergmeir, Christoph, Grundy, John, Buntine, Wray
Software Quality Assurance (SQA) planning aims to define proactive plans, such as defining maximum file size, to prevent the occurrence of software defects in future releases. To aid this, defect prediction models have been proposed to generate insights such as the most important factors that are associated with software quality. Such insights derived from traditional defect models are far from actionable, i.e., practitioners still do not know what they should do or avoid to decrease the risk of having defects, or what the risk threshold is for each metric. A lack of actionable guidance and risk thresholds can lead to inefficient and ineffective SQA planning processes. In this paper, we investigate practitioners' perceptions of current SQA planning activities and the current challenges of such activities, and propose four types of guidance to support SQA planning. We then propose and evaluate our AI-Driven SQAPlanner approach, a novel approach for generating four types of guidance and their associated risk thresholds in the form of rule-based explanations for the predictions of defect prediction models. Finally, we develop and evaluate an information visualization for our SQAPlanner approach. Through the use of a qualitative survey and an empirical evaluation, our results lead us to conclude that SQAPlanner is needed, effective, stable, and practically applicable. We also find that 80% of our survey respondents perceived our visualization as more actionable. Thus, our SQAPlanner paves the way for novel research in actionable software analytics, i.e., generating actionable guidance on what practitioners should and should not do to decrease the risk of having defects, in order to support SQA planning.