 Accuracy


Can we Estimate Truck Accident Risk from Telemetric Data using Machine Learning?

arXiv.org Machine Learning

Road accidents have a high societal cost that could be reduced through improved risk predictions using machine learning. This study investigates whether telemetric data collected on long-distance trucks can be used to predict the risk of accidents associated with a driver. We use a dataset provided by a truck transportation company containing the driving data of 1,141 drivers for 18 months. We evaluate two different machine learning approaches to perform this task. In the first approach, features are extracted from the time series data using the FRESH algorithm and then used to estimate the risk using Random Forests. In the second approach, we use a convolutional neural network to directly estimate the risk from the time-series data. We find that neither approach is able to successfully estimate the risk of accidents on this dataset, in spite of many methodological attempts. We discuss the difficulties of using telemetric data for the estimation of the risk of accidents that could explain this negative result.


Dealing with Nuisance Parameters using Machine Learning in High Energy Physics: a Review

arXiv.org Machine Learning

Of these, probably the most common is the use of supervised classification to construct low-dimensional event summaries that are informative for statistical inference on a given set of parameters of interest. The learned summary statistics (functions of the data that are informative about its relevant properties) can efficiently combine high-dimensional information from each event into one or a few variables, which can be used as the basis of statistical inference. The informational source for this compression is simulated observations produced by a complex generative model; the latter reproduces the chain of physical processes occurring in subatomic collisions and the subsequent interaction of the produced final-state particles with the detection elements.


Multi-Classifier selection-fusion framework: application to NDT of complex metallic parts

arXiv.org Machine Learning

Recent advances in computational methods, material science, and manufacturing technologies reveal promising potential for using geometrically complex parts to optimize the performance of structural systems. However, this potential has not yet been realized, partly due to the immaturity of nondestructive testing (NDT) of such complex parts. Process compensated resonance testing (PCRT) is one of the methods that researchers are focusing on for this purpose. The key to success for the PCRT approach is to use high-frequency vibration data in conjunction with statistical pattern recognition methods for supervised classification of parts in terms of their structural quality. In this paper, a multi-classifier selection-fusion framework based on the Dempster-Shafer theory is proposed. Two new weighting approaches are introduced to enhance the fusion performance and, in turn, the classification performance. The effectiveness of the proposed framework is validated by its application to six UCI machine learning datasets and one experimental dataset collected from polycrystalline Nickel alloy first-stage turbine blades with a variety of damage features. Comparison with four state-of-the-art fusion techniques shows the good performance of the introduced classifier selection-fusion framework.
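
For readers unfamiliar with Dempster-Shafer fusion, the following is a minimal sketch of Dempster's rule of combination applied to two classifiers. It is not the paper's framework: the abstract does not describe the two weighting approaches, so none are applied here. The function name `combine_dempster`, the class labels, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Fuse two mass functions (dict: frozenset of labels -> mass)
    with the basic, unweighted Dempster rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the common subset.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint subsets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    return {subset: mass / norm for subset, mass in combined.items()}


# Hypothetical outputs of two classifiers for one part (assumed values).
GOOD = frozenset({"good"})
BAD = frozenset({"bad"})
EITHER = GOOD | BAD  # mass left on the whole frame expresses uncertainty

clf_a = {GOOD: 0.6, BAD: 0.1, EITHER: 0.3}
clf_b = {GOOD: 0.5, BAD: 0.2, EITHER: 0.3}

fused = combine_dempster(clf_a, clf_b)
decision = max(fused, key=fused.get)
```

Mass assigned to `EITHER` lets a classifier express uncertainty rather than forcing it to commit to one class, which is the main reason to fuse with mass functions instead of averaging probabilities.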


Technologies for Trustworthy Machine Learning: A Survey in a Socio-Technical Context

arXiv.org Artificial Intelligence

Concerns about the societal impact of AI-based services and systems have encouraged governments and other organisations around the world to propose AI policy frameworks to address fairness, accountability, transparency and related topics. To achieve the objectives of these frameworks, the data and software engineers who build machine-learning systems require knowledge about a variety of relevant supporting tools and techniques. In this paper we provide an overview of technologies that support building trustworthy machine learning systems, i.e., systems whose properties justify that people place trust in them. We argue that four categories of system properties are instrumental in achieving the policy objectives, namely fairness, explainability, auditability and safety & security (FEAS). We discuss how these properties need to be considered across all stages of the machine learning life cycle, from data collection through run-time model inference. As a consequence, we survey in this paper the main technologies with respect to all four of the FEAS properties, for data-centric as well as model-centric stages of the machine learning system life cycle. We conclude with an identification of open research problems, with a particular focus on the connection between trustworthy machine learning technologies and their implications for individuals and society.


Human-Expert-Level Brain Tumor Detection Using Deep Learning with Data Distillation and Augmentation

arXiv.org Machine Learning

The application of Deep Learning (DL) for medical diagnosis is often hampered by two problems. First, the amount of training data may be scarce, as it is limited by the number of patients who have acquired the condition to be diagnosed. Second, the training data may be corrupted by various types of noise. Here, we study the problem of brain tumor detection from magnetic resonance spectroscopy (MRS) data, where both types of problems are prominent. To overcome these challenges, we propose a new method for training a deep neural network that distills particularly representative training examples and augments the training data by mixing these samples from one class with those from the same and other classes to create additional training samples. We demonstrate that this technique substantially improves performance, allowing our method to reach human-expert-level accuracy with just a few thousand training examples. Interestingly, the network learns to rely on features of the data that are usually ignored by human experts, suggesting new directions for future research.
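
The mixing step mentioned in the abstract can be illustrated with a generic convex-combination augmentation. This is only a sketch under stated assumptions: the abstract does not specify how samples or labels are combined, so the mixing weight range, the mixing of label vectors, and the names `mix_samples` and `augment` are all hypothetical. The distillation step that selects representative examples is not shown.

```python
import random


def mix_samples(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two samples and of their label vectors.
    Mixing the labels as well is an assumption, not taken from the abstract."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def augment(samples, labels, n_new, seed=0):
    """Create n_new synthetic samples by mixing random pairs, which may come
    from the same class or from different classes."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        j = rng.randrange(len(samples))
        lam = rng.uniform(0.5, 1.0)  # assumed range: first sample stays dominant
        synthetic.append(mix_samples(samples[i], labels[i], samples[j], labels[j], lam))
    return synthetic


# Toy spectra with one-hot labels (hypothetical values).
spectra = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
one_hot = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
extra = augment(spectra, one_hot, n_new=5)
```

Each synthetic sample keeps the feature dimension of the originals, and its label vector still sums to one, so it can be appended to the training set without changing the loss function.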


Modelling Credit Card Fraud Detection

#artificialintelligence

Credit card fraud is a still-growing problem worldwide. Fraud losses were estimated at more than US$27 billion in 2018 and are projected to keep growing significantly over the next few years, as this article shows. As more and more people use credit cards in their daily routine, criminals' interest in opportunities to make money from them has also increased. The development of new technologies puts both criminals and credit card companies in a constant race to improve their systems and techniques. With that amount of money at stake, Machine Learning is certainly not a new term for credit card companies, which were investing in it long before it became a trend in order to create and optimize risk and fraud management models.


AI Learns from Lung CT Scans to Diagnose COVID-19

#artificialintelligence

Although the initial wave of the SARS-CoV-2 pandemic has abated in many countries, healthcare providers are still looking to identify as many COVID-19 patients as possible and contain the disease. Fast and accurate diagnosis is especially important when unsuspecting patients with a coronavirus infection come to the hospital with health complaints but don't yet show symptoms of COVID-19. Nasal swab samples analyzed by RT-PCR are currently recommended for the diagnosis of COVID-19; however, supply shortages, a wait time of up to two days for results, and a false negative rate as high as 1 in 5 mean that alternative, large-scale COVID-19 screening tools are still being sought. SARS-CoV-2 is known to damage lung tissue in a distinct way that doctors are now seeking to exploit for new diagnostic approaches. Many COVID-19 patients develop pneumonia, which can progress to respiratory failure and sometimes death.


Prediction of Cancer Microarray and DNA Methylation Data using Non-negative Matrix Factorization

arXiv.org Machine Learning

Over the past few years, microarray technology has spread considerably across many areas of biology, particularly those pertaining to cancers such as leukemia, prostate cancer, and colon cancer. The primary bottleneck in properly understanding such datasets lies in their dimensionality, so studying them efficiently and effectively requires a substantial reduction in their dimension. This study suggests different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms. This technique gives an accuracy of 98%.
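
As an illustration of how NMF reduces dimensionality, here is a minimal sketch using the standard multiplicative update rules for the Frobenius-norm objective. The abstract does not say which NMF variant or which rank the study uses, so the function name `nmf`, the rank, the iteration count, and the toy matrix are assumptions. NumPy is assumed to be available.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (samples x features) as V ~ W @ H,
    with W (samples x rank) and H (rank x features) both non-negative."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = V.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy stand-in for an expression matrix: 6 samples x 8 features (made-up data).
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each row of `W` is a reduced, `rank`-dimensional representation of one sample and can be passed to a classifier in place of the original high-dimensional row.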


Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect

arXiv.org Machine Learning

Data Science and Machine Learning have become fundamental assets for companies and research institutions alike. As one of their fields, supervised classification allows for class prediction of new samples, learning from given training data. However, some properties can cause datasets to be problematic to classify. In order to evaluate a dataset a priori, data complexity metrics have been used extensively. They provide information regarding different intrinsic characteristics of the data, which serve to evaluate classifier compatibility and a course of action that improves performance. However, most complexity metrics focus on just one characteristic of the data, which can be insufficient to properly evaluate the dataset with respect to the classifiers' performance. In fact, class overlap, a very detrimental feature for the classification process (especially when imbalance among class labels is also present), is hard to assess. This research work focuses on revisiting complexity metrics based on data morphology. In accordance with their nature, the premise is that they provide both good estimates of class overlap and strong correlations with classification performance. For that purpose, a novel family of metrics has been developed. Being based on ball coverage by classes, they are named Overlap Number of Balls. Finally, some prospects for the adaptation of this family of metrics to singular (more complex) problems are discussed.


Traceable raises $20 million for AI system that shields cloud app APIs from cyberattacks

#artificialintelligence

Traceable, a startup developing an end-to-end cloud app security solution, today emerged from stealth with $20 million in venture equity financing. Newly flush with capital, CEO Jyoti Bansal intends to focus on acquiring customers globally while growing Traceable's team and accelerating R&D. Cloud-native apps are often built with hundreds or even thousands of API microservices (i.e., loosely coupled services), making them difficult to protect at scale. Gartner predicts that by 2022, API abuses will be the most frequent attack vector, which isn't surprising considering API calls represented 83% of web traffic as of 2018. Traceable ostensibly protects these APIs with machine learning algorithms that analyze app activity from the user and the session all the way down to the code.