AITopics | Accuracy

Sampling Method for Fast Training of Support Vector Data Description

Chaudhuri, Arin, Kakde, Deovrat, Jahja, Maria, Xiao, Wei, Jiang, Hansi, Kong, Seunghyun, Peredriy, Sergiy

arXiv.org Machine Learning | Sep-25-2016

Support Vector Data Description (SVDD) is a popular outlier detection technique that constructs a flexible description of the input data. SVDD computation time is high for large training datasets, which limits its use in big-data process-monitoring applications. We propose a new iterative sampling-based method for SVDD training. At each iteration, the method incrementally learns the training data description by computing SVDD on an independent random sample selected with replacement from the training data set. The experimental results indicate that the proposed method is extremely fast and provides a good data description.

artificial intelligence, machine learning, training data, (14 more...)

1606.05382

Country: North America > United States (0.29)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Google's Jigsaw subsidiary is building open-source AI tools to spot trolls

#artificialintelligence | Sep-23-2016, 01:12:07 GMT

Can Google bring peace to the web with machine learning? Jigsaw, a subsidiary of parent company Alphabet, is certainly trying, building open-source AI tools designed to filter out abusive language. A new feature from Wired describes how the software has been trained on some 17 million comments left underneath New York Times stories, along with 13,000 discussions on Wikipedia pages. This data is labeled and then fed into the software -- called Conversation AI -- which begins to learn what bad comments look like. According to the report, Google says Conversation AI can identify abuse with "more than 92 percent certainty and a 10 percent false-positive rate" when compared to the judgements of a human panel.

artificial intelligence, conversation ai, machine learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

How To Stop Online Harassment: Google Uses Machine Learning Tools To More Accurately Spot Abusive Content

International Business Times | Sep-21-2016, 21:00:44 GMT

A subsidiary of Google's parent company Alphabet, Jigsaw, is using machine learning to fend off online trolling, reports Wired. The New York–based think tank is building open-source AI tools, collectively called Conversation AI, to filter out harassment and abusive language. "Few things poison conversations online more than abusive language, threats, and harassment," reads the Conversation AI website. "We're studying how computers can learn to understand the nuances and context of abusive language at scale. If successful, machine learning could help publishers and moderators improve comments on their platforms and enhance the exchange of ideas on the internet."

abusive language, artificial intelligence, conversation ai, (6 more...)

Country: North America > United States > New York (0.26)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

MLDB Blog

#artificialintelligence | Sep-21-2016, 12:17:07 GMT

The business world is full of streams of items that need to be filtered or evaluated: parts on an assembly line, résumés in an application pile, emails in a delivery queue, transactions awaiting processing. Machine learning techniques are increasingly being used to make such processes more efficient: image processing to flag bad parts, text analysis to surface good candidates, spam filtering to sort email, fraud detection to lower transaction costs, and so on. In this article, I show how you can take business factors into account when using machine learning to solve these kinds of problems with binary classifiers. Specifically, I show how the concept of expected utility from the field of economics maps onto the Receiver Operating Characteristic (ROC) space often used by machine learning practitioners to compare and evaluate models for binary classification. I begin with a parable illustrating the dangers of not taking such factors into account. This concrete story is followed by a more formal mathematical look at the use of indifference curves in ROC space to avoid this kind of problem and guide model development. I wrap up with some recommendations for successfully using binary classifiers to solve business problems.
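The mapping from expected utility onto ROC space can be sketched in a few lines. This is a minimal illustration, not the article's own code: the function names, the four utility parameters (`u_tp`, `u_fn`, `u_fp`, `u_tn`), and the example operating points are all assumptions made here. It uses the standard decomposition of expected utility per item by class prevalence; the slope of an indifference line follows by holding that utility constant.

```python
def expected_utility(fpr, tpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected utility per item at the ROC operating point (fpr, tpr).

    prevalence is the fraction of positives; the u_* values are the
    (hypothetical) business utilities of each confusion-matrix cell.
    """
    from_positives = prevalence * (tpr * u_tp + (1.0 - tpr) * u_fn)
    from_negatives = (1.0 - prevalence) * (fpr * u_fp + (1.0 - fpr) * u_tn)
    return from_positives + from_negatives


def indifference_slope(prevalence, u_tp, u_fn, u_fp, u_tn):
    """Slope dTPR/dFPR of the lines of equal expected utility in ROC space."""
    return ((1.0 - prevalence) * (u_tn - u_fp)) / (prevalence * (u_tp - u_fn))


def best_operating_point(points, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Pick the (fpr, tpr) pair with the highest expected utility."""
    return max(
        points,
        key=lambda pt: expected_utility(
            pt[0], pt[1], prevalence, u_tp, u_fn, u_fp, u_tn
        ),
    )


# Illustrative operating points: reject everything, a trained model, accept everything.
candidate_points = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]
chosen = best_operating_point(candidate_points, 0.5, 1.0, 0.0, 0.0, 1.0)
```

With balanced classes and unit utility for every correct decision, the indifference lines have slope 1; a rare positive class or an expensive false positive makes them steeper, which favors operating points further to the left of the ROC curve.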

artificial intelligence, classifier, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

How to increase Naive Bayes accuracy? • /r/MachineLearning

@machinelearnbot | Sep-20-2016, 14:00:19 GMT

How to increase Naive Bayes accuracy? The total size of the dataset was 81. I ran the program against test data and it gave me an accuracy of only 21%. Can anyone tell me why that is? Where am I going wrong?

artificial intelligence, increase naive bayes accuracy, machine learning, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

How to make Training Data for Naive Bayes? • /r/MachineLearning

@machinelearnbot | Sep-20-2016, 07:05:22 GMT

I am learning the NB algorithm and implementing it on a real dataset that contains only 80 records. Now I want to prepare training data. I want to know whether training data is made from the actual data or from the actual pattern given in the real data. Also, does training data mean covering all cases given in the real data, or something else?

artificial intelligence, machine learning, make training data, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Predictive modelling of football injuries

Kampakis, Stylianos

arXiv.org Machine Learning | Sep-20-2016

The goal of this thesis is to investigate the potential of predictive modelling for football injuries. This work was conducted in close collaboration with Tottenham Hotspur FC (THFC) and the PGA European Tour, with the participation of Wolverhampton Wanderers (WW). Three investigations were conducted:

1. Predicting the recovery time of football injuries using the UEFA injury recordings: The UEFA recordings are a common standard for recording injuries in professional football. For this investigation, three datasets of UEFA injury recordings were available. Different machine learning algorithms were used to build a predictive model. The performance of the machine learning models was then improved by feature selection, conducted through correlation-based subset feature selection and random forests.
2. Predicting injuries in professional football using exposure records: The relationship between exposure (in training hours and match hours) in professional football athletes and injury incidence was studied. A common problem in football is understanding how an athlete's training schedule affects the chance of injury. The task was to predict the number of days a player can train before getting injured.
3. Predicting intrinsic injury incidence using in-training GPS measurements: A significant percentage of football injuries can be attributed to overtraining and fatigue. GPS data collected during training sessions might provide indicators of fatigue, or might be used to detect very intense training sessions that can lead to overtraining. This research used GPS data gathered during training sessions of the THFC first team to predict whether an injury would take place during a given week.

artificial intelligence, gaussian process polynomial kernel, machine learning, (17 more...)

1609.0748

Country:

Europe (1.00)
North America > United States (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
(2 more...)

Conformalized Kernel Ridge Regression

Burnaev, Evgeny, Nazarov, Ivan

arXiv.org Machine Learning | Sep-19-2016

General predictive models do not provide a measure of confidence in predictions without Bayesian assumptions. A way to circumvent potential restrictions is to use conformal methods for constructing non-parametric confidence regions that offer guarantees regarding validity. In this paper we provide a detailed description of a computationally efficient conformal procedure for Kernel Ridge Regression (KRR), and conduct a comparative numerical study to see how well conformal regions perform against the Bayesian confidence sets. The results suggest that conformalized KRR can yield predictive confidence regions with a specified coverage rate, which is essential in constructing anomaly detection systems based on predictive models.
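To make the idea of a conformal confidence region concrete, here is a minimal sketch of split (inductive) conformal prediction. It is a simpler, swapped-in variant and not the paper's procedure for KRR: the point predictor is passed in as an arbitrary function, and the names `split_conformal_interval`, `predict`, and the toy calibration data are assumptions made for illustration. Under exchangeability of the calibration and test points, the resulting interval covers the true response with probability at least 1 - alpha.

```python
import math


def split_conformal_interval(calib_x, calib_y, predict, x_new, alpha):
    """Prediction interval for x_new from held-out calibration residuals.

    predict is any fitted point predictor (hypothetical here); calib_x and
    calib_y must not have been used to fit it.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    residuals = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(residuals)
    # Rank of the conformal quantile, with the finite-sample (n + 1) correction.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return (-math.inf, math.inf)
    half_width = residuals[k - 1]
    center = predict(x_new)
    return (center - half_width, center + half_width)


# Toy example: a fixed linear predictor and seven calibration points.
toy_predict = lambda x: 2.0 * x
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_noise = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0, 3.5]
toy_y = [toy_predict(x) + e for x, e in zip(toy_x, toy_noise)]
interval = split_conformal_interval(toy_x, toy_y, toy_predict, 10.0, 0.25)
```

The interval width here is the same for every input; the appeal of a model-specific procedure such as the one the abstract describes is that it can avoid sacrificing a calibration split.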

artificial intelligence, data mining, machine learning, (17 more...)

1609.05959

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)

impact to AUC if swap positive and negative during model training

#artificialintelligence | Sep-18-2016, 08:40:16 GMT

If I swap the positive class and the negative class and then train a model again (I tried decision tree, AdaBoost, and SVM from the scikit-learn built-in package) for a two-class classification problem, I sometimes see the AUC change slightly (around 1-2%). Does anyone have any idea why there are such changes? For the ROC curve, the x-axis is the false positive rate and the y-axis is the true positive rate. When the prediction model gives prediction scores, we order the scores from higher to lower values, choose thresholds according to the sorted values, and calculate the FPR and TPR at each specific threshold point.
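The thresholding procedure described in the question is equivalent to a pairwise view of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The sketch below uses that definition directly; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration, not taken from the post. It shows that the metric itself is symmetric: swapping the labels while keeping the scores gives 1 - AUC, and swapping the labels while also flipping the scores gives back exactly the same AUC.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

swapped_labels = [1 - y for y in labels]
flipped_scores = [-s for s in scores]

auc_original = roc_auc(labels, scores)
auc_swapped_same_scores = roc_auc(swapped_labels, scores)
auc_swapped_flipped = roc_auc(swapped_labels, flipped_scores)
```

Since the computation is symmetric, a 1-2% difference after retraining would have to come from the retrained model producing scores that are not an exact mirror image of the original ones, for example through randomness in training or asymmetric handling of the two classes. That is a plausible reading rather than a confirmed diagnosis of the poster's setup.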

artificial intelligence, machine learning, model training, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Detecting weak changes in dynamic events over networks

Li, Shuang, Xie, Yao, Farajtabar, Mehrdad, Verma, Apurv, Song, Le

arXiv.org Machine Learning | Sep-16-2016

Large volumes of networked streaming event data are becoming increasingly available in a wide variety of applications, such as social network analysis, Internet traffic monitoring and healthcare analytics. Streaming event data are discrete observations occurring in continuous time, and the precise time interval between two events carries a great deal of information about the dynamics of the underlying systems. How can we promptly detect changes in these dynamic systems using such streaming event data? In this paper, we propose a novel change-point detection framework for multi-dimensional event data over networks. We cast the problem as a sequential hypothesis test, and derive the likelihood ratios for point processes, which are computed efficiently via an EM-like algorithm that is parameter-free and can be computed in a distributed fashion. We derive a highly accurate theoretical characterization of the false-alarm rate, and show that the method can achieve weak signal detection by aggregating local statistics over time and networks. Finally, we demonstrate the good performance of our algorithm on numerical examples and real-world datasets from Twitter and Memetracker.

data mining, hawkes process, machine learning, (19 more...)

1603.08981

Country:

North America > United States (1.00)
Asia (0.67)

Genre: Research Report (0.63)

Industry:

Media (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.67)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
(2 more...)
