Collaborating Authors

Machine Learning Datasets in R (10 datasets you can use right now) - Machine Learning Mastery


You need standard datasets to practice machine learning. In this short post you will discover how you can load standard classification and regression datasets in R. This post will show you 3 R libraries that you can use to load standard datasets and 10 specific datasets that you can use for machine learning in R. It is invaluable to load standard datasets in R so that you can test, practice and experiment with machine learning techniques and improve your skill with the platform. There are hundreds of standard test datasets that you can use to practice and get better at machine learning. Most of them are hosted for free on the UCI Machine Learning Repository.

Train a model on fashion dataset


Fashion MNIST is a direct drop-in replacement for the original MNIST dataset. The dataset is made up of 60,000 training examples and 10,000 testing examples, where each example is a 28 28 grayscaled picture of various articles of clothing. The Fashion MNIST dataset is more difficult than the original MNIST, and thus serves as a more complete benchmarking tool. The model being trained is a CNN with three convolutional layers followed by two dense layers. The job will run for 30 epochs, with a batch size of 128.

Machine Learning Datasets: 250+ ML Repository Of Speech Datasets


While open data or public data sets are convenient, we offer an extensive catalog of'off-the-shelf', 250 licensable datasets across 80 languages across multiple dialects for a variety of common AI use cases. We are excited to announce 30 new datasets for 2020 that deliver immediate value to our customers. Among our offerings, you will find data sets for speech recognition, learning datasets for machine learning algorithms, all created with the most advanced available data science. Whether you are working on a text-to-speech system, a voice recognition system or another solution that relies on natural language, high-quality licensed speech and language datasets allow you to go to market faster and reach more potential customers. Should You Build or Buy a Data Annotation Tool?

Working with NLP datasets in Python


In the field of Deep Learning, datasets are an essential part of every project. To train a neural network that can handle new situations, one has to use a dataset that represents the upcoming scenarios of the world. An image classification model trained on animal images will not perform well on a car classification task. Alongside training the best models, researchers use public datasets as a benchmark of their model performance. I personally think that easy-to-use public benchmarks are one of the most useful tools to help facilitate the research process.

Machine Learning -- 2018 World Cup Simulation – Rodrigo Nader – Medium


The World Cup is reaching a new stage and few were those who could anticipate the group stage outcomes. Now it's time for a much more thrilling phase, where the greatest of the world will face each other. The goal of this article is, using the power of data science with Python, try to uncover some of the statistics those games will present. Some libraries we're going to use: The idea here is to make a machine learning algorithm to predict the winner of a single match, and from there, build a monte carlo simulation that could infer the odds of each knockout game winner, and subsequently, the probability for the world's champion. This article will present some graphs and codes, but feel free to skip it if you'd like, I'll try to make it as intuitive as possible.