AITopics | Accuracy

There are a number of machine learning models to choose from. We can use Linear Regression to predict a value, Logistic Regression to classify distinct outcomes, and Neural Networks to model non-linear behaviors. When we build these models, we always use a set of historical data to help our machine learning algorithms learn the relationship between a set of input features and a predicted output. But even if this model can accurately predict a value from historical data, how do we know it will work as well on new data? Or more plainly, how do we evaluate whether a machine learning model is actually "good"?
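
One common way to approach the question of performance on new data is a holdout split: fit on part of the historical data and measure error on the part the model never saw. The sketch below is only an illustration, not the article's own code; the synthetic data, the one-feature least-squares fit, and the helper names (`fit_line`, `mse`, `holdout_split`) are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) pair."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle row indices and set aside a fraction of rows for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


# Synthetic "historical" data (an assumption): y = 2 + 3x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

x_tr, y_tr, x_te, y_te = holdout_split(xs, ys)
model = fit_line(x_tr, y_tr)            # fit on the training rows only
train_mse = mse(model, x_tr, y_tr)      # error on data the model has seen
test_mse = mse(model, x_te, y_te)       # error on held-out data
print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

A test error much larger than the training error would suggest the model has memorized the historical data rather than learned a relationship that generalizes.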

artificial intelligence, machine learning, positive class, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)


Robust Contextual Outlier Detection: Where Context Meets Sparsity

Liang, Jiongqian, Parthasarathy, Srinivasan

arXiv.org Artificial Intelligence | Dec-22-2016

Outlier detection is a fundamental data science task with applications ranging from data cleaning to network security. Given the fundamental nature of the task, this has been the subject of much research. Recently, a new class of outlier detection algorithms has emerged, called contextual outlier detection, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e., lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, here we propose a novel and robust alternative to the state of the art, called RObust Contextual Outlier Detection (ROCOD). We utilize a local and global behavioral model based on the relevant contexts, which is then integrated in a natural and robust fashion. We also present several optimizations to improve the scalability of the approach. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency (40X speedup compared to modern contextual outlier detection methods). We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.
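
To make the idea of a contextual outlier concrete, the following sketch scores each record by how far its behavior deviates from the typical behavior of records with a similar context. It is not ROCOD and does not reproduce anything from the paper: the one-dimensional context, the fixed `radius`, the median-based expectation, the crude global fallback, and the function name `contextual_scores` are all assumptions chosen for illustration.

```python
import statistics


def contextual_scores(records, radius=1.0, min_neighbors=3):
    """Score (context, behavior) pairs by deviation from their contextual peers.

    Peers are the other records whose context lies within `radius`.
    With too few peers (a sparse context), this sketch simply falls back
    to the global median, which is a crude stand-in for a global model.
    """
    global_median = statistics.median(b for _, b in records)
    scores = []
    for i, (ctx, beh) in enumerate(records):
        peers = [b for j, (c, b) in enumerate(records)
                 if j != i and abs(c - ctx) <= radius]
        if len(peers) >= min_neighbors:
            expected = statistics.median(peers)   # local expectation
        else:
            expected = global_median              # sparse-context fallback
        scores.append(abs(beh - expected))
    return scores


# Hypothetical data: behavior is roughly 10 * context, except record 4.
records = [(1.0, 10), (1.2, 12), (1.4, 14), (1.6, 16), (1.5, 40),
           (5.0, 50), (5.2, 52), (5.4, 54), (5.6, 56)]
scores = contextual_scores(records)
most_anomalous = scores.index(max(scores))
print(most_anomalous, scores[most_anomalous])
```

The value 40 is unremarkable across the whole dataset, but it stands out among records whose context is near 1.5, which is the kind of case a purely global detector can miss.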

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1607.08329

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)


Beginners Tutorial on XGBoost and Parameter Tuning in R

#artificialintelligence | Dec-21-2016, 16:40:13 GMT

Last week, we learned about the Random Forest algorithm. We now know that it reduces a model's variance by building models on resampled data, and thereby improves the model's ability to generalize.
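
The variance-reduction effect of building models on resampled data can be sketched with plain bootstrap aggregation (bagging), the resampling step that Random Forest builds on. The example below is written in Python rather than R and is only an illustration under stated assumptions: the base learner is a 1-nearest-neighbor regressor (not a decision tree), the data are synthetic, and all function names are hypothetical.

```python
import math
import random
import statistics


def nearest_neighbor_predict(train, x0):
    """High-variance base learner: return the y of the closest training x."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]


def bootstrap_sample(train, rng):
    """Resample the training set with replacement, keeping its size."""
    return [rng.choice(train) for _ in train]


def bagged_predict(train, x0, n_models=25, seed=0):
    """Average the base learner's predictions over bootstrap resamples."""
    rng = random.Random(seed)
    preds = [nearest_neighbor_predict(bootstrap_sample(train, rng), x0)
             for _ in range(n_models)]
    return sum(preds) / n_models


def make_training_set(rng, n=30, noise=0.5):
    """Synthetic data (an assumption): y = sin(x) plus Gaussian noise."""
    return [(0.2 * i, math.sin(0.2 * i) + rng.gauss(0, noise)) for i in range(n)]


def prediction_variance(predict, n_trials=200, x0=3.0, seed=1):
    """Variance of the prediction at x0 across freshly drawn training sets."""
    rng = random.Random(seed)
    preds = [predict(make_training_set(rng), x0) for _ in range(n_trials)]
    return statistics.variance(preds)


var_single = prediction_variance(nearest_neighbor_predict)
var_bagged = prediction_variance(bagged_predict)
print(f"single model variance: {var_single:.3f}")
print(f"bagged model variance: {var_bagged:.3f}")
```

Because both estimates use the same sequence of training sets, the gap between the two variances reflects the averaging over resamples rather than a lucky draw.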

artificial intelligence, classification, machine learning, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)


50 Questions to Test True Data Science Knowledge

@machinelearnbot | Dec-21-2016, 00:35:04 GMT

Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and LASSO? Explain what a local optimum is and why it is important in a specific context, such as k-means clustering. What are specific ways for determining if you have a local optimum problem? What can be done to avoid local optima?
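
As a starting point for the ridge versus LASSO question, the sketch below shows how each penalty shrinks a set of least-squares coefficients in the special case of an orthonormal design (columns of X satisfy XᵀX = I), where both solutions have a closed form. The coefficient values and the function names are assumptions made for illustration; with correlated features the solutions no longer reduce to these formulas.

```python
import math


def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design.

    Minimizes ||y - Xb||^2 + lam * ||b||^2, which scales every
    least-squares coefficient by the same factor 1 / (1 + lam).
    """
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """LASSO under an orthonormal design.

    Minimizes 0.5 * ||y - Xb||^2 + lam * ||b||_1, which soft-thresholds
    each coefficient: small ones become exactly zero.
    """
    def soft(b):
        magnitude = max(abs(b) - lam, 0.0)
        return math.copysign(magnitude, b) if magnitude > 0 else 0.0
    return [soft(b) for b in beta_ols]


beta_ols = [3.0, -1.5, 0.4, -0.2]   # hypothetical least-squares estimates
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)
print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", [round(b, 3) for b in lasso])
```

Ridge keeps every coefficient but makes all of them smaller, while LASSO removes the two weakest coefficients entirely, which is why LASSO is often described as performing variable selection.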

artificial intelligence, false negative, machine learning, (15 more...)

@machinelearnbot

Genre: Research Report (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.83)


4 trends in security data science

#artificialintelligence | Dec-20-2016, 12:05:20 GMT

In 2015, we saw graphs dominate security data science. Graphs permeated all areas--everything from visualizations to graphical inference. It's quite easy to write about security trends for 2016--the hard part is trying to interpret what the trends could potentially mean to organizations on a day-to-day basis. This article is not the wishlist of a deluded security data scientist. Rather, these are strategic trends that you can expect to see in the field, mixed with tactical steps to capitalize on them.

artificial intelligence, detection, machine learning, (16 more...)

#artificialintelligence

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)


Student and Faculty Guide – 10 easy steps to get up and running with Azure Machine Learning

#artificialintelligence | Dec-18-2016, 18:00:37 GMT

My colleague Amy Nicholson is the UK expert on Azure Machine Learning. The following blog post came out of a quizzing session on how to get started with Azure Machine Learning. Each student receives $100 of Azure credit per month, for 6 months. The Faculty member receives $250 per month, for 12 months. The Azure machine learning team provided a very nice walkthrough tutorial which covers a lot of the basics. This tutorial is really useful as it takes you through the entire process of creating an AzureML workspace, uploading data, creating an experiment to predict someone's credit risk, building, training, and evaluating the models, publishing your best model as a web service, and calling that web service. Now you need to learn how to import a data set into Azure Machine Learning, and where to find interesting data to build something amazing.

artificial intelligence, azure machine learning, machine learning, (16 more...)

#artificialintelligence

Country: Europe > United Kingdom (0.25)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (1.00)
Banking & Finance > Credit (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)


109 Commonly Asked Data Science Interview Questions

#artificialintelligence | Dec-18-2016, 14:35:33 GMT

What is the Central Limit Theorem and why is it important? How many sampling methods do you know? What is the difference between Type I and Type II errors? What do the terms P-value, coefficient, and R-Squared value mean? What is the significance of each of these components? What are the assumptions required for linear regression? There are four major assumptions: 1. There is a linear relationship between the variables, meaning the model you are creating actually fits the data, 2. The errors or residuals of the data are normally distributed and independent of each other, 3. There is minimal multicollinearity between explanatory variables, and 4. Homoscedasticity. This means the variance around the regression line is the same for all values of the predictor variable. What is an example of a dataset with a non-Gaussian distribution?
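
The homoscedasticity assumption can be probed informally by comparing the spread of the residuals at low and high values of the predictor. The sketch below is a rough diagnostic under stated assumptions, not a formal statistical test: the synthetic data, the split at the median of x, and the function names are all hypothetical choices for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def variance_ratio(xs, ys):
    """Mean squared residual for the upper half of x over the lower half.

    A ratio near 1 is consistent with constant variance around the line;
    a ratio far from 1 hints at heteroscedasticity.
    """
    intercept, slope = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))                      # order rows by x
    res = [y - (intercept + slope * x) for x, y in pairs]
    half = len(res) // 2
    mean_sq = lambda r: sum(e * e for e in r) / len(r)
    return mean_sq(res[half:]) / mean_sq(res[:half])


rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(400)]

# Noise with the same spread everywhere (homoscedastic).
ys_constant = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
# Noise whose spread grows with x (heteroscedastic).
ys_growing = [1 + 2 * x + rng.gauss(0, 0.3 * x) for x in xs]

ratio_constant = variance_ratio(xs, ys_constant)
ratio_growing = variance_ratio(xs, ys_growing)
print(f"constant-noise ratio: {ratio_constant:.2f}")
print(f"growing-noise ratio:  {ratio_growing:.2f}")
```

In practice a plot of residuals against fitted values is the usual first check; the ratio here just condenses the same comparison into one number.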

artificial intelligence, data mining, machine learning, (17 more...)

#artificialintelligence

Country: North America > United States (0.04)

Genre: Personal > Interview (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)


Weekly Digest, December 19

@machinelearnbot | Dec-18-2016, 03:45:02 GMT

Data Science for IoT vs Classic Data Science: 10 Differences; Enterprise AI insights from the AI Europe event in London; Is it time to consider data in motion in your big data projects?

artificial intelligence, data mining, machine learning, (15 more...)

@machinelearnbot

Country: Europe (0.30)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Optimal tuning for divide-and-conquer kernel ridge regression with massive data

Xu, Ganggang, Shang, Zuofeng, Cheng, Guang

arXiv.org Machine Learning | Dec-18-2016

We propose a first data-driven tuning procedure for divide-and-conquer kernel ridge regression (Zhang et al., 2015). While the proposed criterion is computationally scalable for massive data sets, it is also shown to be asymptotically optimal under mild conditions. The effectiveness of our method is illustrated by extensive simulations and an application to the Million Song Dataset. Key words: distributed GCV, divide-and-conquer, kernel ridge regression, optimal tuning.
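
For readers unfamiliar with the estimator being tuned, the sketch below shows the basic divide-and-conquer idea: split the data into subsets, fit a kernel ridge regression on each subset, and average the resulting predictors. It does not implement the paper's tuning criterion; the regularization parameter `lam` is simply fixed by hand, and the Gaussian kernel, its width, the synthetic data, and all function names are assumptions made for illustration.

```python
import math
import random


def rbf(x, z, width=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - z) ** 2) / (2 * width ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_krr(xs, ys, lam):
    """Kernel ridge regression: alpha = (K + n * lam * I)^{-1} y."""
    n = len(xs)
    gram = [[rbf(xi, xj) + (n * lam if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    alpha = solve(gram, ys)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))


def fit_divide_and_conquer(xs, ys, n_parts, lam, seed=0):
    """Fit KRR on each random subset and average the predictors."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    parts = [idx[k::n_parts] for k in range(n_parts)]
    models = [fit_krr([xs[i] for i in p], [ys[i] for i in p], lam)
              for p in parts]
    return lambda x: sum(f(x) for f in models) / len(models)


# Synthetic data (an assumption): y = sin(x) plus small Gaussian noise.
rng = random.Random(3)
xs = [rng.uniform(0, 6) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

model = fit_divide_and_conquer(xs, ys, n_parts=4, lam=1e-3)
print(f"prediction at 1.5: {model(1.5):.3f}, sin(1.5) = {math.sin(1.5):.3f}")
```

Each subset requires solving a system of only 50 equations instead of 200, which is where the computational saving comes from; choosing `lam` well for the averaged estimator is the problem the paper addresses.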

artificial intelligence, dgcv, machine learning, (18 more...)

arXiv.org Machine Learning

1612.05907

Country: North America > United States > New York (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.82)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.82)
