kaggle


Dogs vs. Cats Redux Playground Competition, Winner's Interview: Bojan Tunguz

#artificialintelligence

The Dogs vs. Cats Redux: Kernels Edition playground competition revived one of our favorite "for fun" image classification challenges from 2013, Dogs vs. Cats. This time Kaggle brought Kernels, the best way to share and learn from code, to the table, while competitors tackled the problem with a refreshed arsenal including TensorFlow and a few years of deep learning advancements. In this winner's interview, Kaggler Bojan Tunguz shares his 4th-place approach based on deep convolutional neural networks and model blending. "I am a theoretical physicist by training, and have worked in academia for many years. A few years ago I came across some really cool online machine learning courses, and fell in love with that field."
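As a rough, hypothetical illustration of the model-blending idea mentioned above (not Tunguz's actual pipeline), per-image dog probabilities from several convolutional networks can be averaged and then clipped before submission, since the competition was scored on log loss; the file and column names below are assumptions.

    import numpy as np
    import pandas as pd

    # Hypothetical prediction files, one per trained CNN, each with an "id" and a
    # "dog_prob" column giving the predicted probability that the image is a dog.
    prediction_files = ["resnet_preds.csv", "vgg_preds.csv", "inception_preds.csv"]
    frames = [pd.read_csv(f) for f in prediction_files]

    # Simple blend: average the probabilities produced by each network.
    blend = np.mean([df["dog_prob"].values for df in frames], axis=0)

    # Clipping extreme probabilities caps the log-loss penalty for confident mistakes.
    blend = np.clip(blend, 0.02, 0.98)

    pd.DataFrame({"id": frames[0]["id"], "label": blend}).to_csv("blend.csv", index=False)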



Best Data Sources For Building Data Science Models

#artificialintelligence

Every second, a huge amount of data is generated on the Internet. According to an IBM report, about 2.5 quintillion bytes of data are generated every day. Technology has made this data visible to everyone and helps us learn from it. Data has now become a subject of academic study, where deep research and analysis reveal facts and patterns. In this digital world, nearly every innovation is built on data.


Motivating the Greatest Geniuses in AI to Change the World Instead of Destroy It

#artificialintelligence

"The best minds of my generation are thinking about how to make people click ads. That sucks," said data scientist Jeffrey Hammerbacher, founder of Cloudera. What else are many of the top AI folks working on? Instead of solving world hunger or cleaning up the ocean or curing cancer, they're working on killing people and getting people to buy crap they don't really want or need. Sure, the absolute best of the best in the field have the creative freedom to tackle whatever they want but those folks are few and far between. There are only so many pure research positions. A company or college has to achieve incredible success before they have enough money to bet on long term projects that may never work out. Google is one of those companies. The University of Toronto kept the tiny field of neural networks alive for decades when it looked like it might never solve a real world problem. There are others but not many. The fact is to fund real, civilization changing research you need surplus money. And surplus money doesn't come easy.


Machines are coming for your March Madness office pool

#artificialintelligence

March Madness--the NCAA college basketball championship playoffs--is among the most popular sporting events in the US, thanks in part to the wide-ranging contest that has evolved around predicting which teams will progress through the tournament. This year, almost $10.4 million is on the line in office pools or more organized competitions, and more than 40 million Americans will fill out their own versions of the playoff brackets to take part, according to the American Gaming Association. The chances of predicting a perfect bracket, which no one has ever done, are at least 1 in 128 billion and could be as remote as 1 in 9.2 quintillion. Now machine learning is taking a shot. Kaggle, the online platform for predictive modeling and analytics competitions that was acquired by Google parent company Alphabet last year, is hosting a competition for both the NCAA men's and women's tournaments.
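The 9.2-quintillion figure is simply the coin-flip baseline: a full bracket requires picking the winners of 63 games, and 2 to the 63rd power is about 9.2 quintillion. The rough check below also shows how odds in the neighborhood of the 128-billion estimate fall out if a knowledgeable fan calls each game correctly about two thirds of the time (an illustrative assumption, not the published model).

    # Naive odds of a perfect bracket: 63 games, each treated as a 50/50 coin flip.
    games = 63
    print(2 ** games)        # 9223372036854775808, roughly 9.2 quintillion

    # If each game is instead called correctly with probability 2/3 (illustrative
    # assumption only), the odds shrink to roughly 1 in 124 billion.
    print((3 / 2) ** games)  # about 1.2e11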


Machine Learning Madness: Predicting Every NCAA Tournament Matchup

@machinelearnbot

March Madness is here, which means it's time to fill out your bracket and promptly be disappointed when it's inevitably busted for all your friends and family to see. The NCAA Division I Men's Basketball Tournament is one of the most exciting sporting events of the year thanks to the upsets, Cinderellas, and unpredictability that come with it. Millions of basketball fans and non-basketball fans alike grab their virtual Sharpies every March to participate in this phenomenon with hopes of achieving an improbable perfect bracket. Assuming you know a little about basketball, professors put your odds at a conservative 1 in 128 billion. Thanks to my upbringing and an excess of SportsCenter growing up, I've been an avid college basketball fan for as long as I can remember.


Kaggle Tensorflow Speech Recognition Challenge – Towards Data Science

#artificialintelligence

The training data supplied by Google Brain consists of roughly 65,000 one-second audio clips, each labeled with one of about 30 short words. Only 10 of these are classes you need to identify; the others should go in the 'unknown' or 'silence' classes. There are a couple of things you can do to get a grip on the data you're working with. This data set is not completely cleaned up for you. For example, some files are not exactly one second long.
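A minimal sketch of that kind of clean-up, assuming the 16 kHz WAV files used in the competition (the path below is hypothetical): clips shorter than one second can be zero-padded so every example ends up the same length.

    import numpy as np
    from scipy.io import wavfile

    TARGET_LEN = 16000  # one second of audio at 16 kHz

    def load_fixed_length(path):
        """Read a WAV file and zero-pad or trim it to exactly one second."""
        sr, samples = wavfile.read(path)
        samples = samples.astype(np.float32)
        if len(samples) < TARGET_LEN:
            samples = np.pad(samples, (0, TARGET_LEN - len(samples)))
        return samples[:TARGET_LEN]

    clip = load_fixed_length("train/audio/yes/example.wav")  # hypothetical file
    print(clip.shape)  # (16000,)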


We just released 3 years of freeCodeCamp chat history as Open Data -- all 5 million messages of it

@machinelearnbot

Two years ago, our nonprofit started a tradition of releasing large public datasets for researchers and data scientists. And today I'm thrilled to announce the release of our biggest open dataset yet. Gitter.im is an open source chat platform designed specifically with open source in mind. Unlike Slack or Discord, Gitter is truly public. Anyone can join a chatroom, and anyone can observe a chatroom without even needing to create a Gitter account.


Kaggle Tensorflow Speech Recognition Challenge

#artificialintelligence

In November of 2017 the Google Brain team hosted a speech recognition challenge on Kaggle. The goal of this challenge was to write a program that can correctly identify one of 10 words being spoken in a one-second-long audio file. Having just made up my mind to start seriously studying data science with the goal of turning a new corner in my career, I decided to tackle this as my first serious Kaggle challenge. In this post I will talk about ResNets, RNNs, 1D and 2D convolution, Connectionist Temporal Classification, and more. Let's go! Exploratory Data Analysis: the training data supplied by Google Brain consists of roughly 65,000 one-second audio clips, each labeled with one of about 30 short words. Only 10 of these are classes you need to identify; the others should go in the 'unknown' or 'silence' classes. There are a couple of things you can do to get a grip on the data you're working with. This data set is not completely cleaned up for you. For example, some files are not exactly one second long. And there are no 'silence' files as such.
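As a sketch of how such one-second clips are often turned into 2D inputs for convolutional models like the ResNets mentioned above, and of how 'silence' examples are typically cut from the provided background-noise recordings, the snippet below uses librosa with illustrative parameter choices rather than the author's exact settings.

    import numpy as np
    import librosa

    def log_mel_spectrogram(samples, sr=16000, n_mels=40):
        """Convert a one-second waveform into a 2D log-mel feature map for a CNN."""
        mel = librosa.feature.melspectrogram(y=samples, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel)

    def random_silence_clip(noise, length=16000, seed=None):
        """Cut a random one-second 'silence' example from a long background-noise array."""
        rng = np.random.default_rng(seed)
        start = rng.integers(0, len(noise) - length + 1)
        return noise[start:start + length]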


Perfect way to build a Predictive Model in less than 10 minutes

@machinelearnbot

In the last few months, we have started conducting data science hackathons. These hackathons are contests with a well-defined data problem that has to be solved in a short time frame. They typically last anywhere between 2 and 7 days.
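For context, a fast baseline of the kind such hackathons reward can be put together in a few lines; this is a generic sketch with hypothetical file and column names, not the article's exact recipe: encode categorical features, impute missing values, and cross-validate a gradient-boosted model with default settings.

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("train.csv")   # hypothetical training file
    y = df.pop("target")            # hypothetical label column

    # Quick-and-dirty preprocessing: one-hot encode categoricals, impute with medians.
    X = pd.get_dummies(df)
    X = X.fillna(X.median())

    model = GradientBoostingClassifier()
    print(cross_val_score(model, X, y, cv=5).mean())  # baseline score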