kaggle


How to Compete for Zillow Prize at Kaggle

@machinelearnbot

It's a crowd-sourced platform that attracts, nurtures, trains, and challenges data scientists from all around the world to solve data science and predictive analytics problems through machine learning. With 73 million unique visitors per month, 20 TB of data, and 1.2 million statistical and machine learning models that run every night to predict the next Zestimates, it is undoubtedly the best machine learning case study for real estate under the sun. While a million dollars seems like a big prize, it is roughly the cost of ten data science engineers in Silicon Valley for eight months at $100,000 apiece. To date, 2,900 teams from all around the world are competing for this prize; with a typical size of three members per team, that is about 8,700 individuals, or just $114 per engineer, equivalent to $14 per month or $1.70 per hour per data scientist. To submit your first kernel, you can fork my public kernel – how to compete for Zillow prize – first kernel – and run it.
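As a back-of-the-envelope check, the per-engineer arithmetic above works out in a few lines of Python (the prize amount, team counts, and eight-month window are the figures quoted in the excerpt):

```python
# Back-of-the-envelope check of the cost comparison quoted above.
prize = 1_000_000            # Zillow Prize purse in dollars
teams, team_size = 2900, 3   # participating teams and typical team size
months = 8                   # duration used in the engineer-cost comparison

engineers = teams * team_size            # ~8,700 individuals
per_engineer = prize / engineers         # ~$114 per engineer
per_month = per_engineer / months        # ~$14 per month

print(f"{engineers} engineers, ${per_engineer:.0f} each, ${per_month:.0f}/month")
```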


Python Training | Python For Data Science | Learn Python

@machinelearnbot

This path provides a comprehensive overview of the steps you need to take to learn to use Python for data analysis. The free interactive Python tutorial by DataCamp is one of the best places to start your journey. Once you have learnt most machine learning techniques, it is time to give Deep Learning a shot. In case you need to use Big Data libraries, give Pydoop and PyMongo a try.
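If you do reach the Big Data step, a minimal PyMongo sketch looks like the following; it assumes a MongoDB server running on localhost, and the database and collection names are made up for illustration:

```python
# Minimal PyMongo sketch: store and query documents in a local MongoDB.
# Assumes MongoDB is running on localhost:27017; names below are arbitrary.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["demo"]                # hypothetical database name
listings = db["listings"]          # hypothetical collection name

listings.insert_one({"city": "Seattle", "price": 550_000})
for doc in listings.find({"price": {"$lt": 600_000}}):
    print(doc["city"], doc["price"])
```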


Using Machine Learning to Predict Epileptic Seizures from EEG Data - MATLAB & Simulink

#artificialintelligence

The algorithms I developed in MATLAB scored highest among individual participants and third highest in the competition overall. In this study, intracranial EEG recordings were collected from 15 epileptic patients via 16 surgically implanted electrodes sampled at 400 Hz for several months. Kaggle competition participants received almost 100 gigabytes of EEG data from three of the test subjects. Each ten-minute-long segment contained either preictal data, recorded before a seizure, or interictal data, recorded during a long period in which no seizures occurred.
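The article's pipeline was built in MATLAB; as a rough illustration of the kind of preprocessing such recordings invite, here is a Python/NumPy sketch that slices a hypothetical 16-channel, 400 Hz, ten-minute segment into windows and computes a simple band-power feature. The windowing scheme and frequency band are assumptions for illustration, not the author's method:

```python
# Sketch: slice a 10-minute EEG segment into windows and compute band power.
# Random stand-in data; the competition recordings were 16 channels at 400 Hz.
import numpy as np

fs = 400                                   # sampling rate (Hz)
segment = np.random.randn(16, fs * 600)    # stand-in for one 10-minute clip

def band_power(window, fs, lo, hi):
    """Average spectral power of each channel in the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(window.shape[-1], d=1 / fs)
    psd = np.abs(np.fft.rfft(window, axis=-1)) ** 2
    mask = (freqs >= lo) & (freqs < hi)
    return psd[..., mask].mean(axis=-1)

# One feature vector per 30-second window: alpha-band (8-12 Hz) power.
windows = segment.reshape(16, 20, fs * 30).swapaxes(0, 1)  # 20 windows
features = np.array([band_power(w, fs, 8, 12) for w in windows])
print(features.shape)   # (20, 16): 20 windows x 16 channels
```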


Kaggle Instacart (top 2%) feature engineering and solution overview - Jacques Peeters's blog

@machinelearnbot

This blog post aims to show what kind of feature engineering can be done to improve machine learning models. In it I'll detail my general approach (in a machine learning way) and the feature engineering work that was done. Feature engineering is the oil that allows machine learning models to shine. In my opinion, feature engineering and data wrangling are more important than the models themselves!
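As a hedged illustration of what such feature engineering can look like (not the author's actual features), here is a small pandas sketch over hypothetical columns modeled on the public Instacart dataset (user_id, product_id, reordered):

```python
# Sketch of Instacart-style feature engineering with pandas.
# Toy rows below are hypothetical, not the competition data.
import pandas as pd

orders = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "product_id": [10, 10, 20, 10, 30],
    "reordered":  [0, 1, 0, 0, 0],
})

# User-product features: how often has this user bought / reordered this item?
up = (orders.groupby(["user_id", "product_id"])["reordered"]
            .agg(times_bought="count", times_reordered="sum")
            .reset_index())

# User features: overall reorder rate, a typical "habit" signal.
user = (orders.groupby("user_id")["reordered"]
              .mean().rename("user_reorder_rate").reset_index())

features = up.merge(user, on="user_id")
print(features)
```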


Stacking Models for Improved Predictions

@machinelearnbot

I will use three different regression methods to create predictions (XGBoost, Neural Networks, and Support Vector Regression) and stack them up to produce a final prediction. I trained three level-1 models: XGBoost, a neural network, and support vector regression. Graphically, one can see that the circled data point is a prediction which is worse in XGBoost (the best model when trained on all the training data), while the neural network and support vector regression do better for that specific point. For example, below are the RMSE values on the holdout data (rmse1: XGBoost, rmse2: Neural Network, rmse3: Support Vector Regression) for 20 different random 10-fold splits.
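A minimal stacking sketch with scikit-learn follows. GradientBoostingRegressor stands in for XGBoost to keep the example dependency-free, and the synthetic data and hyperparameters are illustrative rather than the author's setup:

```python
# Stacking sketch: three level-1 regressors combined by a level-2 linear
# model fitted on out-of-fold predictions.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("gbm", GradientBoostingRegressor(random_state=0)),
        ("nn",  MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                             random_state=0)),
        ("svr", SVR(C=10.0)),
    ],
    final_estimator=RidgeCV(),   # level-2 model blends the three predictions
    cv=10,                       # out-of-fold predictions via 10-fold CV
)
stack.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, stack.predict(X_te)) ** 0.5
print(f"holdout RMSE: {rmse:.2f}")
```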


The machine learning problem of the next decade

@machinelearnbot

A few months ago, my company, CrowdFlower, ran a machine learning competition on Kaggle. Until we're replaced by robots, this is going to be the machine learning challenge of the next decade. But that's still on the order of 30,000 miles between potential crashes, while human drivers go on the order of 1 million miles between potential crashes and 100 million miles between fatal crashes. Companies no longer need a Google-size R&D budget to make machine learning applicable to their business.


How to win Kaggle competition based on NLP task not being NLP expert

@machinelearnbot

Right after the start of the Kaggle competition, participants started sharing interesting findings about the data set. It is very important to know in advance whether the duplicates' distribution differs between the test and training data sets, since the quality metric used in this solution is very sensitive to such distribution changes. Let's imagine the data set contains only seven records; we can then calculate the number of "common neighbours" for every question pair in the data set. Modern deep learning models are represented by deep neural networks that take raw data as input (the questions' texts) and produce the necessary features themselves.
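As a hedged sketch of the "common neighbours" idea (toy pairs, not the competition data): treat questions as graph nodes, each listed pair as an edge, and count the neighbours two questions share:

```python
# "Common neighbours" feature for question pairs: build an undirected graph
# from the listed pairs, then count shared neighbours per pair.
from collections import defaultdict

pairs = [("q1", "q2"), ("q1", "q3"), ("q2", "q3"), ("q3", "q4")]  # toy data

neighbours = defaultdict(set)
for a, b in pairs:
    neighbours[a].add(b)
    neighbours[b].add(a)

def common_neighbours(a, b):
    """Number of questions linked to both a and b."""
    return len(neighbours[a] & neighbours[b])

for a, b in pairs:
    print(a, b, common_neighbours(a, b))
```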


Pseudo-labeling: a simple semi-supervised learning method - Data, what now?

@machinelearnbot

In this post, I will show how a simple semi-supervised learning method called pseudo-labeling can increase the performance of your favorite machine learning models by utilizing unlabeled data. First, train the model on labeled data, then use the trained model to predict labels for the unlabeled data, thus creating pseudo-labels. In competitions such as those found on Kaggle, each competitor receives a training set (labeled data) and a test set (unlabeled data). Pseudo-labeling allows us to utilize that unlabeled data while training machine learning models.
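A minimal sketch of the loop just described, on synthetic data with an illustrative model choice (the original post's model, and any confidence filtering of the pseudo-labels, may differ):

```python
# Pseudo-labeling sketch: train on labeled data, predict labels for the
# unlabeled data, then retrain on the union. Data and model are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, test_size=0.7,
                                            random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_lab, y_lab)                 # 1) train on the labeled data

pseudo = model.predict(X_unlab)         # 2) pseudo-label the unlabeled data

X_all = np.vstack([X_lab, X_unlab])     # 3) retrain on labeled + pseudo
y_all = np.concatenate([y_lab, pseudo])
model.fit(X_all, y_all)
```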


Elon Musk warns battle for AI supremacy will spark Third World War

The Independent

The first loan the BRICS Development Bank -- a financial institution set up jointly by Brazil, Russia, India, China and South Africa -- has approved for Russia is meant to fund a project that includes the use of AI in Russian courts to automate trial records using speech recognition.


Take Elon Musk Seriously on the Russian AI Threat

#artificialintelligence

It's just that its progress in the field has been somewhat below the radar: We are used to discussing AI in the context of major Silicon Valley companies' or top U.S. universities' advances, and while Russians work there, the top names are not Russian. Other Russian AI startups are only known to experts, and while large Russian information technology companies such as Yandex and Mail.ru Group have invested a lot of resources in AI research and built products using neural networks (Yandex search, more popular than Google in Russia, is powered by proprietary neural tech), these achievements are overshadowed by those of bigger Western rivals. It's likely that, as in Soviet times, the military applications of AI in Russia are outpacing consumer ones. Last month's call by Musk and a group of AI researchers for a global ban on robotic weapons is timely but probably unworkable: The use of ostensibly conventional but actually autonomous weaponry is far more difficult to detect, making a prohibition on them harder to enforce than existing bans on chemical and biological weapons, or even than restrictions on various forms of cyberwarfare would have been.