AITopics

Shigeto, Yutaro, Shimbo, Masashi, Matsumoto, Yuji

A Fast and Easy Regression Technique for k-NN Classification Without Using Negative Pairs

arXiv.org Machine LearningJun-11-2018

This paper proposes an inexpensive way to learn an effective dissimilarity function to be used for $k$-nearest neighbor ($k$-NN) classification. Unlike Mahalanobis metric learning methods that map both query (unlabeled) objects and labeled objects to new coordinates by a single transformation, our method learns a transformation of labeled objects to new points in the feature space whereas query objects are kept in their original coordinates. This method has several advantages over existing distance metric learning methods: (i) In experiments with large document and image datasets, it achieves $k$-NN classification accuracy better than or at least comparable to the state-of-the-art metric learning methods. (ii) The transformation can be learned efficiently by solving a standard ridge regression problem. For document and image datasets, training is often more than two orders of magnitude faster than the fastest metric learning methods tested. This speed-up is also due to the fact that the proposed method eliminates the optimization over "negative" object pairs, i.e., objects whose class labels are different. (iii) The formulation has a theoretical justification in terms of reducing hubness in data.

artificial intelligence, dataset, machine learning, (14 more...)

1806.03945

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Chiba Prefecture > Chiba (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Seibold, Clemens, Samek, Wojciech, Hilsmann, Anna, Eisert, Peter

Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks

arXiv.org Artificial IntelligenceJun-11-2018

Artificial neural networks tend to learn only what they need for a task. A manipulation of the training data can counter this phenomenon. In this paper, we study the effect of different alterations of the training data, which limit the amount and position of information that is available for the decision making. We analyze the accuracy and robustness against semantic and black box attacks on the networks that were trained on different training data modifications for the particular example of morphing attacks. A morphing attack is an attack on a biometric facial recognition system where the system is fooled to match two different individuals with the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals that should be matched with this image.

artificial intelligence, machine learning, partial morph, (19 more...)

arXiv.org Artificial Intelligence

1806.04265

Country:

Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > Germany > Berlin (0.04)
Asia (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-10-2018

ActiveRemediation: The Search for Lead Pipes in Flint, Michigan

Abernethy, Jacob, Chojnacki, Alex, Farahi, Arya, Schwartz, Eric, Webb, Jared

We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.

artificial intelligence, machine learning, replacement, (17 more...)

doi: 10.1145/3219819.3219896

1806.10692

Country: North America > United States > Michigan > Genesee County > Flint (0.70)

Genre: Research Report (0.82)

Industry:

Water & Waste Management > Water Management > Water Supplies & Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine LearningJun-9-2018

Reconstructing networks with unknown and heterogeneous errors

Peixoto, Tiago P.

The vast majority of network datasets contains errors and omissions, although this is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by not only allowing for heterogeneous errors, but also for individual edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable error estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, and yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1806.07956

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)
Leisure & Entertainment > Sports (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-9-2018

A Taxonomy and Survey of Intrusion Detection System Design Techniques, Network Threats and Datasets

Hindy, Hanan, Brosset, David, Bayne, Ethan, Seeam, Amar, Tachtatzis, Christos, Atkinson, Robert, Bellekens, Xavier

With the world moving towards being increasingly dependent on computers and automation, one of the main challenges in the current decade has been to build secure applications, systems and networks. Alongside these challenges, the number of threats is rising exponentially due to the attack surface increasing through numerous interfaces offered for each service. To alleviate the impact of these threats, researchers have proposed numerous solutions; however, current tools often fail to adapt to ever-changing architectures, associated threats and 0-days. This manuscript aims to provide researchers with a taxonomy and survey of current dataset composition and current Intrusion Detection Systems (IDS) capabilities and assets. These taxonomies and surveys aim to improve both the efficiency of IDS and the creation of datasets to build the next generation IDS as well as to reflect networks threats more accurately in future datasets. To this end, this manuscript also provides a taxonomy and survey or network threats and associated tools. The manuscript highlights that current IDS only cover 25% of our threat taxonomy, while current datasets demonstrate clear lack of real-network threats and attack representation, but rather include a large number of deprecated threats, hence limiting the accuracy of current machine learning IDS. Moreover, the taxonomies are open-sourced to allow public contributions through a Github repository.

data mining, evolutionary algorithm, machine learning, (11 more...)

arXiv.org Artificial Intelligence

1806.03517

Country:

Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
Europe > France (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(5 more...)

Genre:

Overview (0.66)
Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(8 more...)

Data-driven model for the identification of the rock type at a drilling bit

Klyuchnikov, Nikita, Zaytsev, Alexey, Gruzdev, Arseniy, Ovchinnikov, Georgiy, Antipova, Ksenia, Ismailova, Leyla, Muravleva, Ekaterina, Burnaev, Evgeny, Semenikhin, Artyom, Cherepanov, Alexey, Koryabkin, Vitaliy, Simon, Igor, Tsurgan, Alexey, Krasnov, Fedor, Koroteev, Dmitry

In order to bridge the gap of more than 15m between the drilling bit and high-fidelity rock type sensors during the directional drilling, we present a novel approach for identifying rock type at the drilling bit. The approach is based on application of machine learning techniques for Measurements While Drilling (MWD) data. We demonstrate capabilities of the developed approach for distinguishing between the rock types corresponding to (1) a target oil bearing interval of a reservoir and (2) a non-productive shale layer and compare it to more traditional physics-driven approaches. The dataset includes MWD data and lithology mapping along multiple wellbores obtained by processing of Logging While Drilling (LWD) measurements from a massive drilling effort on one of the major newly developed oilfield in the North of Western Siberia. We compare various machine-learning algorithms, examine extra features coming from physical modeling of drilling mechanics, and show that the classification error can be reduced from 13.5% to 9%.

classifier, deep learning, upstream oil & gas, (19 more...)

1806.03218

Genre:

Research Report > New Finding (0.47)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Kilbertus, Niki, Gascón, Adrià, Kusner, Matt J., Veale, Michael, Gummadi, Krishna P., Weller, Adrian

Blind Justice: Fairness with Encrypted Sensitive Attributes

Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or to check if a given model is fair. We introduce methods from secure multi-party computation which allow us to avoid both. By encrypting sensitive attributes, we show how an outcome-based fair model may be learned, checked, or have its outputs verified and held to account, without users revealing their sensitive attributes.

artificial intelligence, constraint, machine learning, (15 more...)

1806.03281

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Bull, Kieran, He, Yang-Hui, Jejjala, Vishnu, Mishra, Challenger

Machine Learning CICY Threefolds

The latest techniques from Neural Networks and Support Vector Machines (SVM) are used to investigate geometric properties of Complete Intersection Calabi-Yau (CICY) threefolds, a class of manifolds that facilitate string model building. An advanced neural network classifier and SVM are employed to (1) learn Hodge numbers and report a remarkable improvement over previous efforts, (2) query for favourability, and (3) predict discrete symmetries, a highly imbalanced problem to which the Synthetic Minority Oversampling Technique (SMOTE) is applied to boost performance. In each case study, we employ a genetic algorithm to optimise the hyperparameters of the neural network. We demonstrate that our approach provides quick diagnostic tools capable of shortlisting quasi-realistic string models based on compactification over smooth CICYs and further supports the paradigm that classes of problems in algebraic geometry can be machine learned.

artificial intelligence, machine learning, neural network, (17 more...)

1806.03121

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Vasilev, Aleksei, Golkov, Vladimir, Lipp, Ilona, Sgarlata, Eleonora, Tomassini, Valentina, Jones, Derek K., Cremers, Daniel

q-Space Novelty Detection with Variational Autoencoders

In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples well; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g.~sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (\mbox{q-space}) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e.~to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods.

artificial intelligence, data mining, machine learning, (16 more...)

1806.02997

Country:

Europe > Italy > Lazio > Rome (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.90)
Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)