AITopics

doi: 10.1117/12.2563843

2005.09787

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.94)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Velthoen, Jasper, Cai, Juan-Juan, Jongbloed, Geurt

Interpretable random forest models through forward variable selection

arXiv.org Machine Learning, May-11-2020

Random forest is a popular prediction approach for handling high-dimensional covariates. However, it often becomes infeasible to interpret the resulting high-dimensional, non-parametric model. To obtain an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure yields the smallest set of variables that optimizes the CRPS risk by performing, at each step, a hypothesis test for a significant decrease in CRPS risk. We provide mathematical motivation for our method by proving that, in the population sense, the method attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent. In a simulation study, we compare the performance of our method with an existing variable selection method for different sample sizes and different correlation strengths of the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
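To make the loss function in this abstract concrete, the sketch below computes an empirical CRPS for a forecast given as a finite sample of values, using the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. The function names `crps_ensemble` and `mean_crps` are illustrative assumptions, not taken from the paper, and the hypothesis test that drives the forward selection is not reproduced here.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS of an ensemble forecast for one observation.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
    where X and X' are independent draws from the forecast distribution F.
    """
    n = len(members)
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observation) for x in members) / n
    # Mean absolute difference over all ordered pairs of members.
    spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return accuracy - 0.5 * spread


def mean_crps(forecasts, observations):
    """Average CRPS over a test set, i.e. an empirical CRPS risk."""
    scores = [crps_ensemble(m, y) for m, y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


# A point forecast reduces to the absolute error.
print(crps_ensemble([2.0], 5.0))       # 3.0
# A two-member ensemble straddling the observation.
print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
```

Under these assumptions, a forward selection step would compare `mean_crps` for models fitted with and without a candidate variable; how significance is assessed is specific to the paper.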

artificial intelligence, decision tree learning, machine learning, (16 more...)

2005.05113

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

#artificialintelligence, May-8-2020, 02:54:49 GMT

Adversarial Robustness Toolbox v1.2 releases: crafting and analysis of attacks and defense methods for machine learning models • Penetration Testing

Adversarial Robustness 360 Toolbox (ART) is a Python library that supports developers and researchers in defending Machine Learning models (Deep Neural Networks, Gradient Boosted Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, Gaussian Processes, Decision Trees, Scikit-learn Pipelines, etc.) against adversarial threats and helps make AI systems more secure and trustworthy. Machine Learning models are vulnerable to adversarial examples, which are inputs (images, texts, tabular data, etc.) deliberately modified to produce a desired response from the Machine Learning model. ART provides the tools to build and deploy defenses and test them with adversarial attacks. Defending Machine Learning models involves certifying and verifying model robustness and hardening models with approaches such as pre-processing inputs, augmenting training data with adversarial samples, and leveraging runtime detection methods to flag any inputs that might have been modified by an adversary. The attacks implemented in ART allow creating adversarial examples against Machine Learning models, which are required to test defenses against state-of-the-art threat models.
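As a rough illustration of what "deliberately modified" inputs look like, the sketch below applies the fast gradient sign method to a tiny logistic-regression model written in plain Python. It does not use ART or its API; the model weights, the input, and all function names are hypothetical and chosen only so that the effect is visible.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss."""
    p = predict_proba(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input.
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


weights, bias = [2.0, -1.0], 0.0  # hypothetical trained model
x, label = [0.5, 0.2], 1          # hypothetical correctly classified input
x_adv = fgsm(weights, bias, x, label, epsilon=0.5)

print(predict_proba(weights, bias, x) > 0.5)      # True: classified as 1
print(predict_proba(weights, bias, x_adv) > 0.5)  # False: prediction flipped
```

A defense such as adversarial training would add inputs like `x_adv`, with their correct labels, back into the training data.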

artificial intelligence, decision tree learning, machine learning model, (5 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.61)

#artificialintelligence, May-8-2020, 02:49:45 GMT

Why White-Box Models in Enterprise Data Science Work More Efficiently

Data science is the current powerhouse for organizations, turning mountains of data into actionable business insights that impact every part of the business, including customer experience, revenue, operations, risk management and other functions. Data science has the potential to dramatically accelerate digital transformation initiatives, delivering greater performance and advantages over the competition. However, not all data science platforms and methodologies are created equal. Using data science to make predictions and decisions that optimize business outcomes requires transparency and accountability. Several underlying factors matter, such as trust, confidence in the prediction and an understanding of how the technology works, but fundamentally it comes down to whether the platform uses a black-box or white-box model approach.

data mining, machine learning, prediction, (14 more...)

Industry: Information Technology (0.56)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.31)

arXiv.org Machine Learning, May-8-2020

JigSaw: A tool for discovering explanatory high-order interactions from random forests

DiMucci, Demetrius

Machine learning is revolutionizing biology by facilitating the prediction of outcomes from complex patterns found in massive data sets. Large biological data sets, like those generated by transcriptome or microbiome studies, measure many relevant components that interact in vivo with one another in modular ways. Identifying the high-order interactions that machine learning models use to make predictions would facilitate the development of hypotheses linking combinations of measured components to outcome. By using the structure of random forests, a new algorithmic approach, termed JigSaw, was developed to aid in the discovery of patterns that could explain predictions made by the forest. By examining the patterns of individual decision trees, JigSaw identifies high-order interactions between measured features that are strongly associated with a particular outcome and identifies the relevant decision thresholds. JigSaw's effectiveness was tested in simulation studies, where it was able to recover multiple ground truth patterns, even in the presence of significant noise. It was then used to find patterns associated with outcomes in two real-world data sets. It was first used to identify patterns in clinical measurements associated with heart disease. It was then used to find patterns associated with breast cancer using metabolites measured in the blood. In heart disease, JigSaw identified several three-way interactions that combine to explain most of the heart disease records (66%) with high precision (93%). In breast cancer, three two-way interactions were recovered that can be combined to explain almost all records (92%) with good precision (79%). JigSaw is an efficient method for exploring high-dimensional feature spaces for rules that explain statistical associations with a given outcome and can inspire the generation of testable hypotheses.

artificial intelligence, jigsaw, machine learning, (20 more...)

2005.04342

Country:

Europe > Portugal > Coimbra > Coimbra (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.74)

Sánchez-Hernández, Fernando, Ballesteros-Herráez, Juan Carlos, Kraiem, Mohamed S., Sánchez-Barba, Mercedes, Moreno-García, María N.

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach

arXiv.org Machine Learning, May-7-2020

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems given the impact that such infections have on patient mortality and healthcare costs. This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making aimed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification. They revealed that the proposal is more efficient in comparison with other approaches.
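The abstract does not say how the clusters are used, so the following is only one plausible reading of clustering-based undersampling: cluster the majority class with k-means, then keep a share of each cluster proportional to its size so the reduced set still covers every region of the majority class. The functions `kmeans` and `cluster_undersample`, the choice of k, and the synthetic data are all assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority samples, drawn proportionally per cluster."""
    rng = random.Random(seed)
    _, clusters = kmeans(majority, k, seed=seed)
    kept = []
    for cluster in clusters:
        share = round(n_keep * len(cluster) / len(majority))
        kept.extend(rng.sample(cluster, min(share, len(cluster))))
    return kept


# Synthetic majority class: 60 points around three hypothetical centres.
data_rng = random.Random(1)
centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
majority = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(20)]
kept = cluster_undersample(majority, n_keep=12)
print(len(majority), "->", len(kept))
```

The reduced majority sample would then be combined with all minority samples before training an ensemble classifier. Because of per-cluster rounding, the number kept can differ slightly from `n_keep`.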

artificial intelligence, classifier, machine learning, (17 more...)

doi: 10.3390/app9245287

2005.03582

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > New Zealand > North Island > Waikato > Hamilton (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.96)

#artificialintelligence, May-4-2020, 09:58:32 GMT

Tree-based Machine Learning Models for Handling Imbalanced Datasets

Recently, I have been working on a binary classification problem with an imbalanced dataset, where the ratio of positive class to negative class is around 1:4. Imbalanced classification problems are so commonplace that data enthusiasts will encounter them sooner or later. In this post, I will share three tree-based Machine Learning models that can help handle imbalanced datasets. The dataset that I am going to use to illustrate the effectiveness of the algorithms is the credit card fraud dataset from Kaggle. This is an extremely imbalanced dataset: out of 284,807 transactions, there are only 492 frauds. Following convention, we label the fraud samples as the positive class and the normal transactions as the negative class.
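To put the imbalance quoted above into numbers, the short calculation below derives the fraud rate, the negative-to-positive ratio, and a pair of "balanced" class weights from the two counts given in the post. The weighting formula, n_samples / (n_classes * class_count), is a common heuristic and an assumption here; the post excerpt does not say which weighting, if any, the author uses.

```python
total = 284_807   # transactions in the dataset (from the post)
positives = 492   # fraudulent transactions (from the post)
negatives = total - positives

fraud_rate = positives / total
# Ratio of negatives to positives, often used as a positive-class weight.
neg_to_pos = negatives / positives
# "Balanced" weights: n_samples / (n_classes * class_count).
weight_pos = total / (2 * positives)
weight_neg = total / (2 * negatives)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"negatives per positive: {neg_to_pos:.1f}")
print(f"balanced weights: pos={weight_pos:.1f}, neg={weight_neg:.3f}")
```

With weights of this kind, misclassifying one fraud costs the model about as much as misclassifying several hundred normal transactions, which is one way tree-based learners are pushed to attend to the rare class.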

artificial intelligence, decision tree learning, machine learning, (16 more...)

Industry: Law Enforcement & Public Safety > Fraud (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.39)

Sokol, Kacper, Flach, Peter

LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees

arXiv.org Artificial Intelligence, May-4-2020

Systems based on artificial intelligence and machine learning models should be transparent, in the sense of being capable of explaining their decisions to gain humans' approval and trust. While there are a number of explainability techniques that can be used to this end, many of them are only capable of outputting a single one-size-fits-all explanation that simply cannot address all of the explainees' diverse needs. In this work we introduce a model-agnostic and post-hoc local explainability technique for black-box predictions called LIMEtree, which employs surrogate multi-output regression trees. We validate our algorithm on a deep neural network trained for object detection in images and compare it against Local Interpretable Model-agnostic Explanations (LIME). Our method comes with local fidelity guarantees and can produce a range of diverse explanation types, including contrastive and counterfactual explanations praised in the literature. Some of these explanations can be interactively personalised to create bespoke, meaningful and actionable insights into the model's behaviour. While other methods may give an illusion of customisability by wrapping, otherwise static, explanations in an interactive interface, our explanations are truly interactive, in the sense of allowing the user to "interrogate" a black-box model. LIMEtree can therefore produce consistent explanations on which an interactive exploratory process can be built.

explanation, machine learning, natural language, (21 more...)


2005.01427

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

#artificialintelligence, May-2-2020, 10:52:17 GMT

Machine Learning: An Introduction to Decision Trees

Machine Learning for trading is the new buzzword today, and some of the tech companies are doing wonderful, unimaginable things with it. Today, we're going to show you how you can predict stock movements (either up or down) with the help of 'Decision Trees', one of the most commonly used ML algorithms. Decision trees in Machine Learning are used for building classification and regression models to be used in data mining and trading. A decision tree algorithm performs a set of recursive actions before it arrives at the end result, and when you plot these actions on a screen, the visual looks like a big tree, hence the name 'Decision Tree'. Basically, a decision tree is a flowchart to help you make decisions.
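The single action a decision tree repeats recursively is choosing a split. The sketch below shows that step for one feature, scoring candidate thresholds by weighted Gini impurity. Gini is a common criterion but an assumption here, since the excerpt does not name one, and the returns and up/down labels are made-up illustrative data.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send samples to both branches
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical data: yesterday's return (%) and today's direction.
returns = [-1.2, -0.4, 0.1, 0.6, 1.5]
moves = ["down", "down", "down", "up", "up"]
print(best_threshold(returns, moves))  # (0.1, 0.0)
```

A full tree would apply `best_threshold` again to the samples on each side of the split until the branches are pure or a depth limit is reached.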

artificial intelligence, decision tree, machine learning, (12 more...)

Industry: Banking & Finance > Trading (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Chivers, Benedict Delahaye, Wallbank, John, Cole, Steven J., Sebek, Ondrej, Stanley, Simon, Fry, Matthew, Leontidis, Georgios

Imputation of missing sub-hourly precipitation data in a large sensor network: a machine learning approach

arXiv.org Machine Learning, May-2-2020

Precipitation data collected at sub-hourly resolution presents specific challenges for missing data recovery by being largely stochastic in nature and highly unbalanced in the duration of rain vs non-rain. Here we present a two-step analysis utilising current machine learning techniques for imputing precipitation data sampled at 30-minute intervals by devolving the task into (a) the classification of rain or non-rain samples, and (b) regressing the absolute values of predicted rain samples. Investigating 37 weather stations in the UK, this machine learning process produces more accurate predictions for recovering precipitation data than an established surface fitting technique utilising neighbouring rain gauges. Increasing available features for the training of machine learning algorithms increases performance with the integration of weather data at the target site with externally sourced rain gauges providing the highest performance. This method informs machine learning models by utilising information in concurrently collected environmental data to make accurate predictions of missing rain data. Capturing complex nonlinear relationships from weakly correlated variables is critical for data recovery at sub-hourly resolutions. Such pipelines for data recovery can be developed and deployed for highly automated and near instantaneous imputation of missing values in ongoing datasets at high temporal resolutions.

Keywords: machine learning, data imputation, gradient boosted trees, environmental sensor networks, precipitation, soil moisture

1. Introduction

Precipitation data is of critical importance across multiple lines of enquiry, informing statistical models and analysis relating to weather forecasting, extreme weather events, climate change, water-resource management, droughts, flooding, agricultural impact, and hydroelectric power. Historical rainfall data can reveal long term trends in environmental hydrological issues with real-time data input allowing for immediate forecasting of future conditions. Distributed networks of rain gauges are typically used to provide precipitation data at the earth's surface at varying temporal resolutions and can cover large geographical areas (Kidd, 2001). As is the case in many databases, particularly those utilising physical sensors, the problem of missing data arises. Missing data can be a result of sensor failure, data storage/transmission failure, or post-collection quality control procedures resulting in removal of identified problem data (Blenkinsop et al., 2017). Missing data in precipitation databases represents a serious limitation for the effective use of the data. Given the global scale and importance of precipitation and meteorological data (Sun et al., 2018), developing solutions to missing data is of paramount importance for maximising information gain.
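The two-step analysis described in the abstract, (a) classify rain vs non-rain and (b) regress the amount only for samples predicted as rain, can be sketched as a small pipeline. The classifier and regressor below are trivial stand-ins keyed on a single hypothetical feature, `neighbour_mm`; the paper's actual models and features are not reproduced here.

```python
def impute_precipitation(features, is_rain, rain_amount):
    """Two-step imputation: classify rain vs non-rain, then regress the amount.

    `is_rain` and `rain_amount` are any callables taking one feature row;
    in practice they would be trained models.
    """
    imputed = []
    for row in features:
        if is_rain(row):
            # Step (b): only samples classified as rain reach the regressor.
            imputed.append(max(0.0, rain_amount(row)))
        else:
            # Step (a) predicted a dry interval.
            imputed.append(0.0)
    return imputed


# Hypothetical stand-ins for trained models, keyed on a neighbouring gauge (mm).
def toy_classifier(row):
    return row["neighbour_mm"] > 0.1


def toy_regressor(row):
    return 0.5 * row["neighbour_mm"]


gaps = [{"neighbour_mm": 0.0}, {"neighbour_mm": 2.5}, {"neighbour_mm": 0.05}]
filled = impute_precipitation(gaps, toy_classifier, toy_regressor)
print(filled)  # [0.0, 1.25, 0.0]
```

Separating the steps means the regressor is trained and applied only on rainy intervals, so the long runs of zero rainfall do not dominate the regression.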

artificial intelligence, gauge, machine learning, (19 more...)

doi: 10.1016/j.jhydrol.2020.125126

2004.11123

Country:

Europe > Western Europe (0.04)
Europe > United Kingdom > Wales (0.04)
Europe > United Kingdom > Scotland (0.04)

Genre: Research Report (0.82)

Industry:

Energy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)