AITopics | Accuracy

Accuracy

A Popular Crime-Predicting Algorithm Performed Worse Than Mechanical Turks in One Study

WIRED Jan-17-2018, 19:43:15 GMT

The American criminal justice system couldn't get much less fair. Across the country, some 1.5 million people are locked up in state and federal prisons. More than 600,000 people, the vast majority of whom have yet to be convicted of a crime, sit behind bars in local jails. Black people make up 40 percent of those incarcerated, despite accounting for just 13 percent of the US population. With the size and cost of jails and prisons rising--not to mention the inherent injustice of the system--cities and states across the country have been lured by tech tools that promise to predict whether someone might commit a crime.

artificial intelligence, machine learning, social media

WIRED

Country: North America > United States (1.00)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.91)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)

McDiarmid Drift Detection Methods for Evolving Data Streams

Pesaranghader, Ali, Viktor, Herna, Paquet, Eric

arXiv.org Machine Learning Jan-17-2018

Increasingly, Internet of Things (IoT) domains, such as sensor networks, smart cities, and social networks, generate vast amounts of data. Such data are not only unbounded; their content also evolves dynamically over time, often in unforeseen ways. These variations are due to so-called concept drifts, caused by changes in the underlying data generation mechanisms. In a classification setting, concept drift causes previously learned models to become inaccurate, unsafe, and even unusable. Accordingly, concept drifts need to be detected and handled as soon as possible. In medical applications and emergency response settings, for example, changes in behaviour should be detected in near real time to avoid potential loss of life. To this end, we introduce the McDiarmid Drift Detection Method (MDDM), which uses McDiarmid's inequality to detect concept drift. MDDM slides a window over prediction results and associates each window entry with a weight, assigning higher weights to the most recent entries to emphasize their importance. As instances are processed, the detection algorithm compares the weighted mean of the elements inside the sliding window with the maximum weighted mean observed so far. A difference between the two weighted means that exceeds the bound given by McDiarmid's inequality signals a concept drift. Our extensive experiments on synthetic and real-world data streams show that our method outperforms state-of-the-art drift detectors. Specifically, MDDM yields shorter detection delays and lower false negative rates while maintaining high classification accuracy.
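A minimal sketch of the detection rule described above, assuming arithmetic weights (the oldest entry gets weight 1 and each newer entry gets a fixed `difference` more), a window of 100 prediction results, and a confidence parameter `delta`. These parameter names and defaults are illustrative assumptions, not values taken from the paper. Each window entry is 1.0 for a correct prediction and 0.0 for an error, so entry i can move the weighted mean by at most w_i / Σw, which is the bounded-difference condition McDiarmid's inequality needs.

```python
import math
from collections import deque


class MDDMSketch:
    """Illustrative McDiarmid-style drift detector with arithmetic weights."""

    def __init__(self, window_size=100, difference=0.01, delta=1e-6):
        self.window = deque(maxlen=window_size)
        # Arithmetic weights: oldest entry gets 1.0, each newer entry gets
        # `difference` more, so the most recent entry weighs the most.
        self.weights = [1.0 + i * difference for i in range(window_size)]
        self.total = sum(self.weights)
        # McDiarmid bound: entry i can move the weighted mean by at most
        # v_i = w_i / sum(w), so eps = sqrt(sum(v_i^2) / 2 * ln(1 / delta)).
        squared = sum((w / self.total) ** 2 for w in self.weights)
        self.epsilon = math.sqrt(squared / 2.0 * math.log(1.0 / delta))
        self.max_mean = 0.0

    def update(self, correct):
        """Record one prediction result (True if correct); return True on drift."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        # The deque iterates oldest -> newest, matching the increasing weights.
        mean = sum(w * x for w, x in zip(self.weights, self.window)) / self.total
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean > self.epsilon:
            # Drift detected: reset so monitoring restarts on the new concept.
            self.window.clear()
            self.max_mean = 0.0
            return True
        return False


detector = MDDMSketch()
stream = [True] * 200 + [False] * 100  # accuracy collapses after step 200
drift_at = [t for t, ok in enumerate(stream) if detector.update(ok)]
```

Resetting the window after a detection is only one simple policy; in practice the classifier would also be retrained or replaced at that point.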

artificial intelligence, concept drift, machine learning

arXiv.org Machine Learning

1710.0203

Country:

North America > Canada (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Predicting Movie Genres Based on Plot Summaries

Hoang, Quan

arXiv.org Machine Learning Jan-15-2018

This project explores several machine learning methods for predicting movie genres from plot summaries. Naive Bayes, Word2Vec+XGBoost, and recurrent neural networks are used for text classification, while K-binary transformation, a rank method, and probabilistic classification with a learned probability threshold are employed for the multi-label problem involved in the genre tagging task. Experiments with more than 250,000 movies show that Gated Recurrent Unit (GRU) neural networks combined with probabilistic classification and a learned probability threshold achieve the best result on the test set. The model attains a Jaccard index of 50.0%, an F-score of 0.56, and a hit rate of 80.5%.
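The "learned probability threshold" approach can be sketched as follows: a model outputs one probability per genre, every genre at or above a threshold is predicted, and the threshold is chosen to maximize the mean Jaccard index on validation data. The single global threshold, the candidate grid, the fallback to the most likely genre when nothing passes the threshold, and all function names are assumptions for illustration; the paper's exact procedure may differ.

```python
def jaccard(true_set, pred_set):
    """Jaccard index |A & B| / |A | B| between two label sets."""
    if not true_set and not pred_set:
        return 1.0
    return len(true_set & pred_set) / len(true_set | pred_set)


def predict_labels(probs, threshold, genres):
    """Predict every genre whose probability reaches the threshold."""
    chosen = {g for g, p in zip(genres, probs) if p >= threshold}
    if not chosen:
        # Assumed fallback: always emit at least the most probable genre.
        chosen = {genres[max(range(len(probs)), key=probs.__getitem__)]}
    return chosen


def learn_threshold(prob_rows, true_sets, genres, candidates=None):
    """Pick the candidate threshold with the best mean Jaccard index."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

    def score(t):
        total = sum(jaccard(ts, predict_labels(p, t, genres))
                    for p, ts in zip(prob_rows, true_sets))
        return total / len(true_sets)

    return max(candidates, key=score)


genres = ["action", "comedy", "drama"]
val_probs = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]  # hypothetical model outputs
val_truth = [{"action", "drama"}, {"comedy"}]
threshold = learn_threshold(val_probs, val_truth, genres)
```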

artificial intelligence, deep learning, machine learning

arXiv.org Machine Learning

1801.04813

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

tau-FPL: Tolerance-Constrained Learning in Linear Time

Zhang, Ao, Li, Nan, Pu, Jian, Wang, Jun, Yan, Junchi, Zha, Hongyuan

arXiv.org Machine Learning Jan-15-2018

Learning a classifier with control over the false-positive rate plays a critical role in many machine learning applications. Existing approaches either introduce label costs that depend on prior knowledge or tune the parameters of traditional classifiers; both lack methodological consistency because they do not strictly adhere to the false-positive rate constraint. In this paper, we propose a novel scoring-thresholding approach, tau-False Positive Learning (tau-FPL), to address this problem. We show that the scoring problem, which takes the false-positive rate tolerance into account, can be solved efficiently in linear time, and that an out-of-bootstrap thresholding method can transform the learned ranking function into a classifier with a low false-positive rate. Both theoretical analysis and experimental results show that tau-FPL outperforms existing approaches.
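The thresholding half of a scoring-thresholding approach can be sketched as follows: given the scores a learned ranking function assigns to held-out negative examples, choose the threshold so that at most a fraction `tau` of those negatives score above it. This is a plain empirical-quantile rule on a single held-out set; it does not reproduce the paper's linear-time scoring solver or its out-of-bootstrap procedure, and the function names are hypothetical.

```python
import math


def fpr_threshold(neg_scores, tau):
    """Threshold such that at most a tau fraction of negatives score above it."""
    if not neg_scores:
        raise ValueError("need at least one negative score")
    ranked = sorted(neg_scores, reverse=True)
    k = math.floor(tau * len(ranked))  # negatives allowed above the threshold
    # Predicting positive only when score > ranked[k] admits at most k negatives.
    return ranked[k] if k < len(ranked) else float("-inf")


def classify(score, threshold):
    """Positive prediction iff the score strictly exceeds the threshold."""
    return score > threshold


negatives = [i / 10 for i in range(10)]  # hypothetical held-out negative scores
t = fpr_threshold(negatives, tau=0.2)
empirical_fpr = sum(classify(s, t) for s in negatives) / len(negatives)
```

Using a strict `>` comparison keeps the empirical false-positive rate at or below `tau` even when several negatives share the threshold score.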

artificial intelligence, machine learning, tolerance-constrained learning

arXiv.org Machine Learning

1801.04701

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Optimal Generalized Decision Trees via Integer Programming

Gunluk, Oktay, Kalagnanam, Jayant, Menickelly, Matt, Scheinberg, Katya

arXiv.org Machine Learning Jan-14-2018

Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees of a specified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of a feature's values) at each node. We show that very good accuracy can be achieved with small trees using moderately sized training sets. The optimization problems we solve are easily tractable with modern solvers.
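The paper solves a mixed integer program; as a much smaller stand-in, the sketch below uses exhaustive search to find the best single split of the kind described above, where a node sends a row left if a categorical feature's value lies in a chosen subset. The function name, the dict-based row format, and majority-label leaves are assumptions for illustration. The search enumerates all 2^k - 2 proper subsets of the k observed values, so it is only practical for features with few values.

```python
from collections import Counter
from itertools import combinations


def best_subset_split(rows, labels, feature):
    """Exhaustively find the value subset for `feature` that best separates labels.

    A row goes left if row[feature] is in the subset, right otherwise; each
    side predicts its majority label. Returns (number_correct, subset), or
    None if the feature has fewer than two distinct values.
    """
    values = sorted({row[feature] for row in rows})
    best = None
    # Try every proper, non-empty subset of the observed values.
    for size in range(1, len(values)):
        for subset in combinations(values, size):
            chosen = set(subset)
            left = [y for row, y in zip(rows, labels) if row[feature] in chosen]
            right = [y for row, y in zip(rows, labels) if row[feature] not in chosen]
            # Majority-label leaves: count how many rows each side gets right.
            correct = max(Counter(left).values()) + max(Counter(right).values())
            if best is None or correct > best[0]:
                best = (correct, chosen)
    return best


rows = [{"color": c} for c in ["red", "blue", "green", "yellow", "red", "green"]]
labels = [1, 1, 0, 0, 1, 0]
split = best_subset_split(rows, labels, "color")
```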

artificial intelligence, decision tree learning, machine learning

arXiv.org Machine Learning

1612.03225

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

How machine learning engineers can detect and debug algorithmic bias

#artificialintelligence Jan-13-2018, 13:45:54 GMT

Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers." Lorica describes a new role emerging in IT departments: the "machine learning engineer," whose job is to adapt machine learning models for production environments. These engineers run the risk of embedding algorithmic bias into their systems, which can discriminate unfairly, create liability, and reduce the quality of the recommendations the systems produce. He presents a set of technical and procedural steps to minimize these risks, with links to the relevant papers and code. It's required reading for anyone implementing a machine learning system in a production environment.
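The talk's specific recommendations are not reproduced here, but one common first check for bias in a deployed model is to compare positive-prediction (selection) rates across groups. The sketch below computes the disparate impact ratio between a protected group and a reference group; the group labels and predictions are hypothetical, and the 0.8 cutoff is the widely cited "four-fifths" rule of thumb rather than a universal standard.

```python
def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions within each group."""
    counts = {}
    for pred, group in zip(predictions, groups):
        positives, total = counts.get(group, (0, 0))
        counts[group] = (positives + pred, total + 1)
    return {g: pos / total for g, (pos, total) in counts.items()}


def disparate_impact(predictions, groups, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    rates = selection_rates(predictions, groups)
    return rates[protected] / rates[reference]


preds = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical model decisions
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
ratio = disparate_impact(preds, groups, protected="B", reference="A")
flagged = ratio < 0.8  # four-fifths rule of thumb
```

A low ratio does not by itself prove the model is unfair, but it marks where to start debugging, for example by checking how the training labels were produced for each group.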

artificial intelligence, engineer, machine learning

#artificialintelligence

Country: Asia > Singapore (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Introduction to Python Ensembles

#artificialintelligence Jan-13-2018, 13:37:26 GMT

Ensembles have rapidly become one of the hottest and most popular methods in applied machine learning. Virtually every winning Kaggle solution features them, and many data science pipelines include them. Put simply, ensembles combine predictions from different models to generate a final prediction, and adding more diverse models tends to improve performance. Better still, because ensembles combine baseline predictions, they often perform at least as well as the best baseline model, although this is not guaranteed. Ensembles can give us a performance boost almost for free! An input array $X$ is fed through two preprocessing pipelines and then to a set of base learners $f^{(i)}$. The ensemble combines all base learner predictions into a final prediction array $P$. In this post, we'll take you through the basics of ensembles (what they are and why they work so well) and provide a hands-on tutorial for building basic ensembles. To illustrate how ensembles work, we'll use a data set on U.S. political contributions.
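A minimal illustration of combining base learner predictions by majority vote, assuming three hypothetical binary classifiers whose predictions have already been computed. Each base learner is wrong on a different example, so the vote corrects every individual mistake.

```python
def majority_vote(predictions):
    """Combine 0/1 predictions from several base learners by majority vote."""
    n_models = len(predictions)
    # zip(*predictions) yields one column per example: all models' votes on it.
    return [1 if sum(votes) * 2 > n_models else 0 for votes in zip(*predictions)]


def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


truth = [1, 0, 1, 1, 0, 1]
base_predictions = [
    [1, 0, 1, 0, 0, 1],  # hypothetical learner 1: wrong on example 3
    [1, 1, 1, 1, 0, 1],  # hypothetical learner 2: wrong on example 1
    [0, 0, 1, 1, 0, 1],  # hypothetical learner 3: wrong on example 0
]
ensemble = majority_vote(base_predictions)
base_scores = [accuracy(p, truth) for p in base_predictions]
ensemble_score = accuracy(ensemble, truth)
```

If all three learners made the same mistakes, the vote would simply repeat them, which is why ensembles benefit from base learners whose errors are not strongly correlated.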

artificial intelligence, ensemble, machine learning

#artificialintelligence

Genre: Instructional Material (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

MEBoost: Mixing Estimators with Boosting for Imbalanced Data Classification

Rayhan, Farshid, Ahmed, Sajid, Mahbub, Asif, Jani, Md. Rafsan, Shatabda, Swakkhar, Farid, Dewan Md., Rahman, Chowdhury Mofizur

arXiv.org Machine Learning Jan-13-2018

The class imbalance problem has been a challenging research problem in machine learning and data mining, as most real-life datasets are imbalanced. Several existing machine learning algorithms try to maximize classification accuracy by correctly identifying majority class samples while ignoring the minority class. However, minority class instances are usually of greater interest than majority class instances. Recently, several cost-sensitive methods, ensemble models, and sampling techniques have been used in the literature to classify imbalanced datasets. In this paper, we propose MEBoost, a new boosting algorithm for imbalanced datasets. MEBoost mixes two different weak learners with boosting to improve performance on imbalanced datasets. MEBoost is an alternative to existing techniques such as SMOTEBoost, RUSBoost, and AdaBoost. The performance of MEBoost has been evaluated on 12 benchmark imbalanced datasets against state-of-the-art ensemble methods such as SMOTEBoost, RUSBoost, Easy Ensemble, EUSBoost, and DataBoost. Experimental results show significant improvement over the other methods, indicating that MEBoost is an effective and promising algorithm for imbalanced datasets. The Python version of the code is available here: https://github.com/farshidrayhanuiu/
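The motivation above, that accuracy rewards a classifier for ignoring the minority class, is easy to see with a small calculation. The sketch assumes binary labels with the minority class encoded as 1 and reports accuracy alongside minority-class recall and the geometric mean (G-mean) of recall and specificity, a metric commonly used for imbalanced data. It illustrates the problem MEBoost targets, not MEBoost itself, and `imbalance_report` is a hypothetical helper.

```python
import math


def imbalance_report(y_true, y_pred, minority=1):
    """Return (accuracy, minority recall, G-mean) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    return accuracy, recall, g_mean


y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a "classifier" that ignores the minority class
acc, rec, gm = imbalance_report(y_true, always_majority)
```

On this 95:5 split, always predicting the majority class scores 0.95 accuracy but 0.0 minority recall and 0.0 G-mean, which is why imbalanced-data methods are usually compared on metrics other than plain accuracy.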

artificial intelligence, dataset, machine learning

arXiv.org Machine Learning

1712.06658

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

We need to build machine learning tools to augment machine learning engineers

#artificialintelligence Jan-12-2018, 02:28:03 GMT

In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years, the data community has focused on infrastructure and platforms for data collection, including robust pipelines and highly scalable storage systems for analytics. According to a recent LinkedIn report, the top two emerging jobs are "machine learning engineer" and "data scientist."

artificial intelligence, engineer, machine learning

#artificialintelligence

Country: Asia > Singapore (0.25)

Industry: Information Technology > Security & Privacy (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
