Accuracy


Attack Solutions

#artificialintelligence

Human intelligence and intuition are vital to training artificial intelligence (AI) and machine learning (ML) models that give enterprises hybrid cybersecurity at scale. Paired with AI and ML models, human judgment catches nuances of attack patterns that numerical analysis alone misses. Experienced threat hunters, security analysts and data scientists help ensure that the data used to train these models lets them identify threats accurately and reduce false positives. The future of hybrid cybersecurity lies in combining this human expertise and these models with a real-time stream of telemetry data from an enterprise's many systems and apps. "Based on behaviors and insights, AI and ML allow us to predict [that] something will happen before it does," says Monique Shivanandan, CISO at HSBC, a global bank.


Integrating Transformer and Autoencoder Techniques with Spectral Graph Algorithms for the Prediction of Scarcely Labeled Molecular Data

arXiv.org Artificial Intelligence

In the molecular and biological sciences, experiments are expensive, time-consuming, and often subject to ethical constraints. Consequently, one often faces the challenging task of predicting desirable properties from small or scarcely labeled data sets. Although transfer learning can help, it requires a related large data set to exist. This work introduces three graph-based models that incorporate Merriman-Bence-Osher (MBO) techniques to tackle this challenge. Specifically, graph-based modifications of the MBO scheme are integrated with state-of-the-art techniques, including a custom-built transformer and an autoencoder, to handle scarcely labeled data sets. A consensus technique is also detailed. The proposed models are validated on five benchmark data sets. We also provide a thorough comparison with other competing methods known for their good performance on small data sets, such as support vector machines, random forests, and gradient boosting decision trees. The performance of each method is analyzed using residue-similarity (R-S) scores and R-S indices. Extensive computational experiments and theoretical analysis show that the new models perform very well even when as little as 1% of the data set is labeled.
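The abstract does not spell out the graph MBO scheme itself. As a rough illustration, here is a minimal Python sketch of the classic semi-supervised graph MBO iteration: it alternates a graph-Laplacian diffusion step (with a fidelity term pulling labeled nodes toward their known classes) and a thresholding step that snaps every node to a single class. It assumes a fully connected Gaussian similarity graph and is not the authors' transformer- or autoencoder-based model; the function name `graph_mbo` and the parameters `sigma`, `dt`, `mu`, `n_iter` and `n_inner` are hypothetical choices.

```python
import numpy as np


def graph_mbo(X, labels, n_classes, sigma=1.0, dt=0.1, mu=10.0, n_iter=30, n_inner=3):
    """Semi-supervised graph MBO sketch; labels holds -1 for unlabeled points.

    All parameter names and defaults are hypothetical illustration choices.
    """
    # Fully connected Gaussian similarity graph and unnormalized Laplacian L = D - W
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    n = X.shape[0]
    idx = np.flatnonzero(labels >= 0)
    # One-hot fidelity targets F and a mask chi marking the labeled nodes
    F = np.zeros((n, n_classes))
    F[idx, labels[idx]] = 1.0
    chi = np.zeros((n, 1))
    chi[idx] = 1.0

    # Start labeled nodes at their class and unlabeled nodes at the uniform mixture
    U = np.full((n, n_classes), 1.0 / n_classes)
    U[idx] = F[idx]

    # Implicit (backward Euler) diffusion operator, inverted once and reused
    tau = dt / n_inner
    A = np.linalg.inv(np.eye(n) + tau * L)
    for _ in range(n_iter):
        for _ in range(n_inner):
            # Diffuse, with a forcing term pulling labeled nodes toward their labels
            U = A @ (U + tau * mu * chi * (F - U))
        # Threshold: snap each row to the nearest one-hot vector (its argmax class)
        U = np.eye(n_classes)[U.argmax(axis=1)]
    return U.argmax(axis=1)
```

With one labeled point in each of two well-separated clusters, the diffusion spreads each label through its own cluster before thresholding, which is the mechanism that lets graph-based methods work with very few labels.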


Understanding Urban Water Consumption using Remotely Sensed Data

arXiv.org Artificial Intelligence

Urban metabolism is an active field of research concerned with estimating the emissions and resource consumption of urban regions. This analysis can be carried out either through manual surveys or by implementing machine learning algorithms. In this exploratory work, we estimate the water consumption of the buildings in a region captured by satellite imagery. To this end, we break the analysis into three parts: i) identifying the building pixels in a satellite image, ii) identifying the building type (residential/non-residential) from those building pixels, and iii) using the building pixels and their types to estimate water consumption from the average per-unit-area consumption of each building type, as obtained from municipal surveys.
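Step iii reduces to arithmetic once the masks from steps i and ii exist: the building area of each type multiplied by that type's average per-unit-area consumption, summed over types. A minimal Python sketch, assuming boolean NumPy masks, a known ground area per pixel, and survey-derived rates; the function name, its parameters and the numbers below are hypothetical.

```python
import numpy as np


def estimate_water_consumption(building_mask, residential_mask, pixel_area_m2, rate_per_m2):
    """Step iii sketch: sum over building types of (area x per-unit-area rate).

    building_mask: True where step i found a building pixel
    residential_mask: True where step ii labeled the pixel residential
    pixel_area_m2: ground area covered by one pixel (hypothetical input)
    rate_per_m2: dict of average consumption per m^2 for each building type,
                 e.g. from municipal surveys (hypothetical values)
    """
    building = np.asarray(building_mask, dtype=bool)
    residential = np.asarray(residential_mask, dtype=bool)
    # Only pixels already identified as buildings count toward either type
    areas = {
        "residential": np.count_nonzero(building & residential) * pixel_area_m2,
        "non_residential": np.count_nonzero(building & ~residential) * pixel_area_m2,
    }
    return sum(areas[kind] * rate_per_m2[kind] for kind in areas)
```

For example, three residential and three non-residential building pixels of 100 m² each, with hypothetical rates of 2.0 and 3.0 units per m², give 3·100·2 + 3·100·3 = 1500 units. Any error in steps i or ii propagates linearly into this estimate.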


Micro, Macro & Weighted Averages of F1 Score, Clearly Explained - KDnuggets

#artificialintelligence

The F1 score (also known as the F-measure) is a popular metric for evaluating the performance of a classification model. In multi-class classification, F1 is computed with an averaging method, which yields several different average scores (macro, weighted, micro) in the classification report. This article explains what these averages mean, how to calculate them, and which one to report. Note: skip this section if you are already familiar with precision, recall, and F1 score. Precision, in layman's terms: of all the positive predictions I made, how many were truly positive?
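To make the three averages concrete, here is a minimal from-scratch Python sketch for single-label multi-class predictions (the function name `f1_averages` is hypothetical). It uses F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. Macro averages the per-class F1 scores equally, weighted averages them by each class's support (its count in `y_true`), and micro pools TP, FP and FN across classes before computing a single F1.

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Return (macro, weighted, micro) F1 for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)  # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # pooled counts
    return macro, weighted, micro
```

For y_true = [cat, cat, dog, dog, dog, bird] and y_pred = [cat, dog, dog, dog, cat, bird], per-class F1 is 0.5 (cat), 2/3 (dog) and 1.0 (bird), so macro F1 ≈ 0.722 while weighted and micro F1 are both 2/3. In this single-label setting, micro F1 always equals accuracy (here 4/6).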


GitHub - leanderme/sytora: A sophisticated smart symptom search engine

#artificialintelligence

Sytora is a multilingual symptom-disease classification app. Translation is managed through the UMLS coding standard. A multinomial Naive Bayes classifier is trained on a handpicked dataset, which is freely available under CC4.0. Check out sytora.com for a demo. Finding the right diagnosis cannot be achieved by extracting symptoms and running a classification algorithm.
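As a rough illustration of the classifier type the README names, here is a minimal multinomial Naive Bayes sketch in Python with add-alpha (Laplace) smoothing. It is not Sytora's code: symptoms are plain strings rather than UMLS codes, and the class name `SymptomNaiveBayes`, the `alpha` parameter and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SymptomNaiveBayes:
    """Hypothetical multinomial Naive Bayes over symptom lists (not Sytora's API)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing strength

    def fit(self, symptom_lists, diseases):
        self.vocab = {s for symptoms in symptom_lists for s in symptoms}
        self.class_counts = Counter(diseases)
        self.symptom_counts = defaultdict(Counter)  # disease -> symptom -> count
        for symptoms, disease in zip(symptom_lists, diseases):
            self.symptom_counts[disease].update(symptoms)
        self.n_records = len(diseases)
        return self

    def predict(self, symptoms):
        best, best_log_prob = None, -math.inf
        vocab_size = len(self.vocab)
        for disease, count in self.class_counts.items():
            total = sum(self.symptom_counts[disease].values())
            log_prob = math.log(count / self.n_records)  # log prior of the disease
            for s in symptoms:
                if s not in self.vocab:
                    continue  # ignore symptoms never seen in training
                smoothed = (self.symptom_counts[disease][s] + self.alpha) / (total + self.alpha * vocab_size)
                log_prob += math.log(smoothed)
            if log_prob > best_log_prob:
                best, best_log_prob = disease, log_prob
        return best
```

Working in log space avoids underflow when many symptoms are multiplied together, and smoothing keeps a single unseen symptom-disease pair from zeroing out a diagnosis. As the README itself cautions, such a classifier cannot by itself produce a reliable diagnosis.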


The Department of Homeland Security says it developed a portable gunshot detection system

Engadget

The Department of Homeland Security (DHS) says its Science and Technology Directorate has created a portable gunshot detection system with the help of a company called Shooter Detection Systems (SDS). The agency notes that whereas other systems detect only audio, SDS Outdoor can also pinpoint the flashes of gunshots. DHS claims this approach can reduce false positive rates, though it has not disclosed details about the system's accuracy. SDS, which is owned by Alarm.com, says its indoor gunshot detection system has a near-100 percent detection rate with fewer than one false alert per 5 million hours of use.


Type I and Type II Errors: What's the Difference? - KDnuggets

#artificialintelligence

Let's illustrate Type I and Type II errors using a machine learning spam filter, which is a binary classifier. Assume that we have a labelled dataset of N = 315 emails, 244 of which are labelled as spam and 71 as not-spam. Suppose that we have built a machine learning classification algorithm that learns from this data, and we now want to evaluate the model's performance: how good is it at correctly distinguishing spam from not-spam emails? We will assume that whenever the model predicts an email to be spam, the email is removed from the inbox and moved to the spam folder.
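Treating spam as the positive class, a Type I error (false positive) is a legitimate email sent to the spam folder, and a Type II error (false negative) is a spam email left in the inbox. A minimal Python sketch that counts both error types and their rates; the function name and label strings are hypothetical, and no predictions from the article are reproduced here.

```python
def type_i_ii_errors(y_true, y_pred, positive="spam"):
    """Count Type I (false positive) and Type II (false negative) errors."""
    pairs = list(zip(y_true, y_pred))
    # Type I: predicted positive (spam) although the true label is negative
    type_i = sum(1 for t, p in pairs if p == positive and t != positive)
    # Type II: predicted negative although the true label is positive (spam)
    type_ii = sum(1 for t, p in pairs if p != positive and t == positive)
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    return {
        "type_i": type_i,
        "type_ii": type_ii,
        # Rates are normalized by the number of truly negative / truly positive emails
        "false_positive_rate": type_i / n_neg if n_neg else 0.0,
        "false_negative_rate": type_ii / n_pos if n_pos else 0.0,
    }
```

For the dataset described above, the two rates would be normalized by the 71 not-spam and 244 spam emails respectively. Because a flagged email leaves the inbox, Type I errors are usually the costlier mistake for this filter.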


Solving The Class Imbalance Problem

#artificialintelligence

Imbalanced classification is a common problem in machine learning, particularly in binary classification. It occurs when the training dataset has an unequal distribution of classes, which can bias the trained model toward the majority class. Examples of imbalanced classification problems include fraud detection, claim prediction, default prediction, churn prediction, spam detection, anomaly detection, and outlier detection. Notice that most, if not all, of these examples are likely binary classification problems. It is important to address the class imbalance to improve the model's performance, since a model can reach high overall accuracy while largely misclassifying the minority class.
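One common way to address the imbalance is random oversampling: duplicating randomly chosen minority-class examples until every class matches the majority count. A minimal standard-library Python sketch; the function name and `seed` parameter are hypothetical, and undersampling, class weights or synthetic sampling such as SMOTE are equally valid alternatives.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class examples until all classes are equally frequent."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [x for x, label in zip(X, y) if label == cls]
        # Draw (with replacement) the extra copies this class needs
        for _ in range(target - n):
            X_out.append(rng.choice(members))
            y_out.append(cls)
    return X_out, y_out
```

Resample only the training split: oversampling before the train/test split leaks copies of training examples into the evaluation data and inflates the reported scores.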


Multi-Task Learning with Prior Information

arXiv.org Artificial Intelligence

Multi-task learning aims to improve the generalization performance of multiple related tasks simultaneously by leveraging the information contained in those tasks. In this paper, we propose a multi-task learning framework that uses prior knowledge about the relations between features. We also penalize changes in each feature's coefficients across tasks, so that related tasks have similar coefficients on the common features they share. In addition, we capture a common set of features via group sparsity. The objective is formulated as a non-smooth convex optimization problem, which can be solved with various methods, including gradient descent with a fixed step size, the iterative shrinkage-thresholding algorithm (ISTA) with backtracking, and its variant, the fast iterative shrinkage-thresholding algorithm (FISTA). Because these methods converge only sub-linearly, we propose an asymptotically linearly convergent algorithm with a theoretical guarantee. Empirical experiments on both regression and classification tasks with real-world datasets demonstrate that the proposed algorithms improve the generalization performance of multiple related tasks.
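The group-sparsity part of the objective can be illustrated with FISTA, one of the solvers the abstract lists. A minimal NumPy sketch, assuming a least-squares loss per task and an ℓ2,1 penalty on the rows of the coefficient matrix W (one row per feature, one column per task), so that a feature is kept or dropped for all tasks at once. It deliberately omits the paper's prior-information penalty and its linearly convergent algorithm; the function names, the fixed step size 1/L and the iteration count are hypothetical choices.

```python
import numpy as np


def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j, :]||_2 (row-wise shrinkage)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def fista_multitask(Xs, ys, lam, n_iter=500):
    """FISTA for 0.5 * sum_t ||X_t w_t - y_t||^2 + lam * sum_j ||W[j, :]||_2."""
    d, T = Xs[0].shape[1], len(Xs)
    # Lipschitz constant of the smooth part's gradient: largest squared singular value
    step_L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    W = np.zeros((d, T))
    Z = W.copy()
    t = 1.0
    for _ in range(n_iter):
        # Gradient of each task's least-squares loss at the extrapolated point Z
        grad = np.column_stack([X.T @ (X @ Z[:, k] - y) for k, (X, y) in enumerate(zip(Xs, ys))])
        W_next = group_soft_threshold(Z - grad / step_L, lam / step_L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Nesterov-style momentum step that distinguishes FISTA from plain ISTA
        Z = W_next + ((t - 1.0) / t_next) * (W_next - W)
        W, t = W_next, t_next
    return W
```

Dropping the momentum line (setting Z = W_next) turns this into plain ISTA with a fixed step, whose O(1/k) objective convergence is the kind of sub-linear rate the abstract refers to; FISTA improves this to O(1/k²).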


Identifying Exoplanets with Deep Learning. V. Improved Light Curve Classification for TESS Full Frame Image Observations

arXiv.org Artificial Intelligence

The TESS mission produces a large amount of time series data, only a small fraction of which contain detectable exoplanetary transit signals. Deep learning techniques such as neural networks have proved effective at distinguishing promising astrophysical eclipsing candidates from other phenomena, such as stellar variability and systematic instrumental effects, in an efficient, unbiased and sustainable manner. This paper presents a high-quality dataset containing light curves from the Primary Mission and 1st Extended Mission full frame images, together with periodic signals detected via Box Least Squares (Kovács et al. 2002; Hartman 2012). The dataset was curated through a thorough manual review process and then used to train a neural network called Astronet-Triage-v2. On our test set, for transiting/eclipsing events we achieve 99.6% recall (true positives over all data with positive labels) at a precision of 75.7% (true positives over all predicted positives). Since 90% of our training data is from the Primary Mission, we also test our ability to generalize on held-out 1st Extended Mission data. Here, we find an area under the precision-recall curve of 0.965, a 4% improvement over Astronet-Triage (Yu et al. 2019). On the TESS Object of Interest (TOI) Catalog through April 2022, a shortlist of planets and planet candidates, Astronet-Triage-v2 recovers 3577 of 4140 TOIs, while Astronet-Triage recovers only 3349 at the same level of precision. In other words, upgrading to Astronet-Triage-v2 helps save at least 200 planet candidates from being lost. The new model is currently used for planet candidate triage in the Quick-Look Pipeline (Huang et al. 2020a,b; Kunimoto et al. 2021).
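The periodic signals in this dataset come from Box Least Squares. For readers unfamiliar with it, here is a minimal brute-force NumPy sketch of the BLS statistic of Kovács et al. (2002) with uniform weights. For each trial period, duration and phase it folds the light curve, marks the in-transit points, and scores the box by the squared signal residue s²/(r(1−r)), where r is the fraction of points in transit and s is the sum of the mean-subtracted in-transit flux divided by the number of points. It is far slower than production implementations and is not the pipeline used for this dataset; the function name, the grid parameters and the dip-only filter are hypothetical choices.

```python
import numpy as np


def bls_search(t, flux, periods, durations, n_phase=50):
    """Brute-force Box Least Squares sketch with uniform weights.

    Returns (power, period, duration, t0) for the best-scoring box, where power is
    the squared signal residue s**2 / (r * (1 - r)).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(flux, dtype=float)
    x = x - x.mean()  # BLS works with mean-subtracted flux
    n = x.size
    best = (0.0, None, None, None)
    for period in periods:
        phase = np.mod(t, period)
        for duration in durations:
            for t0 in np.linspace(0.0, period, n_phase, endpoint=False):
                # In-transit points: phase within [t0, t0 + duration), wrapping at the period
                in_transit = np.mod(phase - t0, period) < duration
                n_in = np.count_nonzero(in_transit)
                if n_in == 0 or n_in == n:
                    continue
                r = n_in / n                 # fraction of points inside the box
                s = x[in_transit].sum() / n  # weighted in-transit flux sum
                if s >= 0:
                    continue  # a transit is a dip, so only negative s is of interest
                power = s * s / (r * (1.0 - r))
                if power > best[0]:
                    best = (power, period, duration, t0)
    return best
```

Signals flagged this way still include eclipsing binaries, stellar variability and instrumental systematics, which is why a downstream classifier such as Astronet-Triage-v2 is needed to triage them.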