 Decision Tree Learning


Stealing Machine Learning Models via Prediction APIs

arXiv.org Machine Learning

Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.
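To make the role of confidence values concrete, here is a minimal sketch of an equation-solving extraction against a logistic regression model. It is an illustration under stated assumptions, not the authors' implementation: `make_target` is a hypothetical stand-in for a prediction API, the API is assumed to return exact probabilities for arbitrary real-valued inputs, and the secret weights are invented for the demo.

```python
import math

def make_target(weights, bias):
    """Hypothetical black-box API: returns P(y=1 | x) for a logistic regression."""
    def predict_proba(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba

def logit(p):
    """Inverse of the sigmoid: recovers w.x + b from a returned confidence."""
    return math.log(p / (1.0 - p))

def extract_logistic_regression(query, n_features):
    """Recover weights and bias using n_features + 1 queries."""
    # Querying the origin isolates the bias, since w.0 + b = b.
    bias = logit(query([0.0] * n_features))
    weights = []
    for i in range(n_features):
        # Querying the i-th unit vector yields w_i + b.
        x = [0.0] * n_features
        x[i] = 1.0
        weights.append(logit(query(x)) - bias)
    return weights, bias

# Invented "secret" parameters that the attacker never sees directly.
secret_w, secret_b = [0.8, -1.5, 0.3], 0.25
api = make_target(secret_w, secret_b)
stolen_w, stolen_b = extract_logistic_regression(api, 3)
```

With d features, d + 1 queries suffice in this idealized setting. Rounded or withheld confidence values would break the exact inversion, which is why the abstract treats omitting them as the natural countermeasure.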


Under the Decision Tree (#3)

#artificialintelligence

Welcome back for another edition of Under the Decision Tree. This week we had everything from machine learning being applied to cucumber farming to major tech companies joining up to tackle the ethics of machine learning. Please send any suggestions to: Decision Tree. We would love to hear from you.


Machine Learning with Talend - Getting Started

#artificialintelligence

Decision trees are used extensively in machine learning because they are easy to use, easy to interpret, and easy to operationalize. KDnuggets, one of the most respected sites for data science and machine learning, recently published an article that identified decision trees as a "top 10" algorithm for machine learning. If you are new to machine learning, some of these concepts may be unfamiliar. The goal of this blog is to provide you with the basics of decision trees using Talend and Apache Spark. If you want to learn more about advanced analytics, please see the references section below (2).


Decision Trees and Political Party Classification

#artificialintelligence

Last time we investigated the k-nearest-neighbors algorithm and the underlying idea that one can learn a classification rule by copying the known classification of nearby data points. This required that we view our data as sitting inside a metric space; that is, we imposed a kind of geometric structure on our data. One glaring problem is that there may be no reasonable way to do this. While we mentioned scaling issues and provided a number of possible metrics in our primer, a more common problem is that the data simply isn't numeric. For instance, a poll of US citizens might ask the respondent to select which of a number of issues he cares most about. There could be 50 choices, and there is no reasonable way to assign these numerical values so that all are equidistant in the resulting metric space. Another issue is that the quality of the data could be bad. For instance, there may be missing values for some attributes (e.g., a respondent may neglect to answer one or more questions).


Data Mining Tutorial

#artificialintelligence

Data mining is defined as the procedure of extracting information from huge sets of data. In other words, data mining is mining knowledge from data. The tutorial starts off with a basic overview and the terminology involved in data mining and then gradually moves on to cover topics such as knowledge discovery, query language, classification and prediction, decision tree induction, cluster analysis, and how to mine the Web. This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. Before proceeding with this tutorial, you should have an understanding of basic database concepts such as schema, ER model, and Structured Query Language, and a basic knowledge of data warehousing concepts.


Understanding Decision Trees and Random Forests - ChalkStreet

#artificialintelligence

Decision Trees are a graphic and intuitive method of predicting the outcome for a given input. They attach a weight to each input variable and help you see clearly what really influences your outcome. Building a Decision Tree is a tedious procedure, because trees tend to overfit. That's where Random Forests come into the picture. Random Forests use an ensemble of Decision Trees, which reduces the complexities without compromising on the advantages.


Under the Decision Tree (#2)

#artificialintelligence

Welcome back for another edition of Under the Decision Tree. As usual there were quite a number of interesting stories focused on machine learning and AI. One particularly interesting topic this week was Microsoft and its efforts in cancer research. There are two conferences starting on Monday next week. Please send any suggestions to: Decision Tree. We would love to hear from you.


Random forest - Wikipedia, the free encyclopedia

#artificialintelligence

Random forests or random decision forests[1][2] are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.[3]:587–588 The first algorithm for random decision forests was created by Tin Kam Ho[1] using the random subspace method,[2] which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.[4][5][6] An extension of the algorithm was developed by Leo Breiman[7] and Adele Cutler,[8] and "Random Forests" is their trademark.[9] The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho[1] and later independently by Amit and Geman,[10] in order to construct a collection of decision trees with controlled variance.
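The two ingredients named above, bagging and random feature selection, followed by a mode vote, can be sketched in a few lines. This is a deliberately simplified illustration, not Breiman and Cutler's algorithm: each "tree" is a depth-one stump, the feature subset has size one, and the data points are invented for the demo. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a depth-one tree on one feature by minimizing misclassifications."""
    values = sorted(set(r[feature] for r in rows))
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, None, majority, majority)  # errors, threshold, left, right
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return feature, best[1], best[2], best[3]

def predict_stump(stump, row):
    feature, t, l_lab, r_lab = stump
    if t is None:  # the sample had a single distinct value for this feature
        return l_lab
    return l_lab if row[feature] <= t else r_lab

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bagging: bootstrap sample
        feature = rng.randrange(len(rows[0]))       # random feature selection
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Classification output: the mode of the individual trees' votes."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Invented toy data: class 0 near the origin, class 1 near (4.5, 4.5).
rows = [(0.0, 0.2), (0.3, 0.9), (0.5, 0.1), (0.8, 0.6), (1.0, 0.4), (0.2, 1.0),
        (4.0, 4.8), (4.3, 4.1), (4.6, 5.0), (4.9, 4.4), (5.0, 4.0), (4.2, 4.7)]
labels = [0] * 6 + [1] * 6
forest = fit_forest(rows, labels)
```

Calling `predict_forest(forest, (0.5, 0.5))` classifies a new point by majority vote. A full implementation would grow deeper trees and redraw the feature subset at every split.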


Data Science Basics: 3 Insights for Beginners

#artificialintelligence

In supervised learning, the learning algorithm is provided outcome data in advance, in the form of a pre-labeled set of instances. It is from this set that the algorithm is expected to learn what to do when it encounters future, previously unseen instances. Classification is a form of supervised learning. As an example, take the biological taxonomic hierarchy. Organisms are grouped into successively more specific ranks of domain, kingdom, phylum, etc.


Decision Trees Tutorial

#artificialintelligence

Certain groups of people, such as women and children, might be entitled to receive help first, granting them a higher chance of survival. Knowing whether you belong to one of these privileged groups would help predict whether you would make it out alive. To identify which groups have higher survival rates, we can use decision trees. While we forecast the rate of survival here, decision trees are used in a wide range of applications. In a business setting, they can be used to define customer profiles or to predict who would resign.
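One way a decision tree can identify the most informative group is to compare how much each candidate split reduces Gini impurity. The sketch below assumes that criterion (the tutorial excerpt does not say which one it uses) and runs on eight made-up passenger rows, not real records. The helper names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(rows, attribute, target):
    """Impurity reduction obtained by splitting rows on one attribute."""
    parent = gini([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted

# Made-up toy rows for illustration only.
passengers = [
    {"sex": "female", "child": False, "survived": True},
    {"sex": "female", "child": False, "survived": True},
    {"sex": "female", "child": True,  "survived": True},
    {"sex": "female", "child": False, "survived": False},
    {"sex": "male",   "child": True,  "survived": True},
    {"sex": "male",   "child": False, "survived": False},
    {"sex": "male",   "child": False, "survived": False},
    {"sex": "male",   "child": False, "survived": False},
]

gains = {a: gini_gain(passengers, a, "survived") for a in ("sex", "child")}
best_split = max(gains, key=gains.get)  # attribute the tree would test first
```

A tree builder would apply the same comparison recursively inside each resulting group until the groups are pure or too small to split.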