AITopics

In machine learning, the one-class classification problem occurs when training instances are only available from one class. It has been observed that making use of this class's structure, or its different contexts, may improve one-class classifier performance. Although this observation has been demonstrated for static data, a rigorous application of the idea within the data stream environment is lacking. To address this gap, we propose the use of context to guide one-class classifier learning in data streams, paying particular attention to the challenges presented by the dynamic learning environment. We present three frameworks that learn contexts and conduct experiments with synthetic and benchmark data streams. We conclude that the paradigm of contexts in data streams can be used to improve the performance of streaming one-class classifiers.

artificial intelligence, data stream, machine learning, (17 more...)

1907.04233

Country:

North America (0.46)
Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)

Wexler, James, Pushkarna, Mahima, Bolukbasi, Tolga, Wattenberg, Martin, Viegas, Fernanda, Wilson, Jimbo

The What-If Tool: Interactive Probing of Machine Learning Models

A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. We describe the design of the tool, and report on real-life usage at different organizations.

artificial intelligence, machine learning, visualization, (18 more...)

1907.04135

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Law (0.67)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Bellot, Alexis, van der Schaar, Mihaela

A Robust Two-Sample Test for Time Series data

We develop a general framework for hypothesis testing with time series data. The problem is to distinguish between the mean functions of the underlying temporal processes of populations of time series, which are often irregularly sampled and measured with error. Such an observation pattern can result in substantial uncertainty about the underlying trajectory, and quantifying it accurately is important to ensure robust tests. We propose a new test statistic that views each trajectory as a sample from a distribution on functions and considers the distributions themselves to encode the uncertainty between observations. We derive asymptotic null distributions and power functions for our test and put emphasis on computational considerations by giving an efficient kernel learning framework to prevent over-fitting in small samples and also showing how to scale our test to densely sampled time series. We conclude with performance evaluations on synthetic data and experiments on healthcare and climate change data.

artificial intelligence, machine learning, trajectory, (17 more...)

1907.04081

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Bellot, Alexis, van der Schaar, Mihaela

Conditional Independence Testing using Generative Adversarial Networks

We consider the hypothesis testing problem of detecting conditional dependence, with a focus on high-dimensional feature spaces. Our contribution is a new test statistic based on samples from a generative adversarial network designed to approximate directly a conditional distribution that encodes the null hypothesis, in a manner that maximizes power (the rate of true negatives). We show that such an approach requires only that density approximation be viable in order to ensure that we control type I error (the rate of false positives); in particular, no assumptions need to be made on the form of the distributions or feature dependencies. Using synthetic simulations with high-dimensional data we demonstrate significant gains in power over competing methods. In addition, we illustrate the use of our test to discover causal markers of disease in genetic data.

artificial intelligence, gcit, machine learning, (15 more...)

1907.04068

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Nonconvex Regularized Robust Regression with Oracle Properties in Polynomial Time

Pan, Xiaoou, Sun, Qiang, Zhou, Wen-Xin

This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed random errors for high-dimensional adaptive Huber regression with nonconvex regularization. When the additive errors in linear models have only bounded second moment, our results suggest that adaptive Huber regression with nonconvex regularization yields statistically optimal estimators that satisfy oracle properties as if the true underlying support set were known beforehand. Computationally, we need as many as O(log s + log log d) convex relaxations to reach such oracle estimators, where s and d denote the sparsity and ambient dimension, respectively. Numerical studies lend strong support to our methodology and theory.

artificial intelligence, machine learning, optimization problem, (19 more...)

1907.04027

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

BaniMustafa, Ahmed, Hardy, Nigel

Applications of a Novel Knowledge Discovery and Data Mining Process Model for Metabolomics

arXiv.org Machine Learning Jul-8-2019

This work demonstrates the execution of a novel process model for knowledge discovery and data mining for metabolomics (MeKDDaM). It aims to illustrate the applicability of the MeKDDaM process model using four different real-world applications and to highlight its strengths and unique features. The demonstrated applications provide coverage for metabolite profiling, target analysis, and metabolic fingerprinting. The data analysed in these applications were captured by chromatographic separation and mass spectrometry (LC-MS), Fourier transform infrared spectroscopy (FT-IR), and nuclear magnetic resonance spectroscopy (NMR), and involve the analysis of samples of plant, animal, and human origin. The process was executed using both data-driven and hypothesis-driven data mining approaches in order to accomplish various data mining goals and tasks by applying a number of data mining techniques. The applications were selected to address a range of analytical goals and research questions. The process was applied using an implementation environment which was created in order to provide a computer-aided realisation of the process model execution.

application, decision tree learning, upstream oil & gas, (22 more...)

1907.03755

Country:

North America > Canada > Alberta (0.28)
Europe > United Kingdom > Wales > Ceredigion (0.14)
North America > United States > New York (0.14)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Materials > Metals & Mining (1.00)
Energy > Oil & Gas > Upstream (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science > Data Mining > Knowledge Discovery (0.63)

Bertsimas, Dimitris, Delarue, Arthur, Jaillet, Patrick, Martin, Sebastien

Optimal Explanations of Linear Models

arXiv.org Machine Learning Jul-8-2019

When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.

artificial intelligence, interpretability, machine learning, (18 more...)

1907.04669

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.82)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Machine Learning Jul-8-2019

Non-technical Loss Detection with Statistical Profile Images Based on Semi-supervised Learning

Li, Jiangteng, Wang, Fei

To keep track of the operational state of the power grid, the world's largest sensor system, the smart grid, was built by deploying hundreds of millions of smart meters. Such a system makes it possible to discover and respond quickly to any hidden threat to the entire power grid. Non-technical losses (NTLs) have always been a major concern because of their security risks as well as immeasurable revenue loss. However, the various causes of NTL may have different characteristics reflected in the data. As a result, accurately capturing these anomalies in such a large volume of collected data records is rather tricky. In this paper, we propose a new methodology for detecting abnormal electricity consumption. We transform the collected time-series data into an image representation that reflects users' relatively long-term consumption behaviors. Inspired by neural network architectures used for object detection in the computer vision domain, we design a deep learning model that takes the transformed images as input and yields joint features inferred from the multiple aspects the input provides. Considering the limited number of labeled samples, especially abnormal ones, we use our model in a semi-supervised fashion, an approach developed in recent years. The model is tested on samples verified by on-field inspections, and our method shows significant improvement.

artificial intelligence, customer, machine learning, (19 more...)

1907.03925

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.88)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Devine, Sean M., Bastian, Nathaniel D.

Intelligent Systems Design for Malware Classification Under Adversarial Conditions

arXiv.org Machine Learning Jul-6-2019

The use of machine learning and intelligent systems has become an established practice in the realm of malware detection and cyber threat prevention. In an environment characterized by widespread accessibility and big data, the feasibility of malware classification without the use of artificial intelligence-based techniques has diminished sharply. Also characteristic of the contemporary realm of automated, intelligent malware detection is the threat of adversarial machine learning. Adversaries are looking to target the underlying data and/or algorithm responsible for the functionality of malware classification to map its behavior or corrupt its functionality. The aims of such adversaries are to bypass cyber security measures and increase malware effectiveness. The focus of this research is the design of an intelligent systems approach using machine learning that can accurately and robustly classify malware under adversarial conditions. Such an outcome ultimately relies on increased flexibility and adaptability to build a model robust enough to identify attacks on the underlying algorithm.

classifier, data mining, machine learning, (19 more...)

1907.03149

Country: North America > United States (0.29)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

arXiv.org Artificial Intelligence Jul-5-2019

Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets

Liu, Xiaofeng, Kumar, B. V. K Vijaya, Yang, Chao, Tang, Qingming, You, Jane

This paper targets the problem of image set-based face verification and identification. Unlike the traditional single-media (an image or video) setting, we encounter a set of heterogeneous contents containing orderless images and videos. The importance of each image is usually considered either equal or based on its independent quality assessment. How to model the relationship of orderless images within a set remains a challenge. We address this problem by formulating it as a Markov Decision Process (MDP) in the latent space. Specifically, we first present a dependency-aware attention control (DAC) network, which resorts to actor-critic reinforcement learning for the sequential attention decision of each image embedding to fully exploit the rich correlation cues among the unordered images. Moreover, we introduce its sample-efficient variant with off-policy experience replay to speed up the learning process. The pose-guided representation scheme can further boost the performance at the extremes of the pose variation.

machine learning, recognition, reinforcement learning, (13 more...)

1907.0303

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)