Deep learning algorithms demand nearly limitless supplies of data


In any deep learning project, it's almost impossible to imagine an upper limit on the amount of data needed for training models and conducting analyses. "We need to get more data," said Patrick Lucey, director of data science at sports consulting company STATS LLC in Chicago. "We want to reconstruct that story, [and] tell better stories, and we're limited because we can't get all the data we want."

Deep learning, built on multiple machine learning algorithms, such as layered neural networks strung together, isn't necessarily a new concept. However, it started to gain more widespread traction last year, as researchers and enterprises realized that analytical models could be turned loose on the massive troves of data businesses had accumulated since the dawn of the big data era. Deep learning algorithms require experience to sharpen their recommendations, and big data provides exactly the fuel they need. But this raises the question: when is enough data enough? Some of the most prominent deep learning examples have used hundreds of thousands, even millions, of records during model training.

At STATS, Lucey has access to ample data, but said he still feels his models could perform better with more. The company maintains databases of game data going back to its beginnings in 1981. Its deepest data sets, which cover the NBA and go back to 2010, come from its SportVU system, a network of cameras installed at sports arenas that captures player movement data.

This wealth of data has enabled Lucey and his team to do some interesting things with deep learning. For example, they developed a model that looks at video data from NBA games and analyzes players' body positions to better define what an open shot looks like. Another STATS project applied deep learning algorithms to English Premier League soccer.
STATS analyzed data beyond traditional statistics, like shots and goals, to understand the factors that led to longshot Leicester City Football Club taking home the title in the league's 2015-2016 season, which ended last May.

The data science team at STATS primarily builds its models in open source tools, such as the Google-created TensorFlow and scikit-learn, a library of machine learning models built in Python.

These projects have been successful, according to Lucey. However, he added that he's already looking to sharpen his analyses, and he thinks more data will help. In addition to larger data volumes, more detailed information will be necessary, he noted. Deep learning algorithms thrive on detailed data as much as on large amounts of it, and that detail will play an important role as these models continue to improve and describe the world more accurately.

"That's the key -- finding that context," Lucey said. "You can get a good prediction, but if it's washed over by context, it's not as valuable."
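To give a flavor of the kind of modeling workflow the article describes, here is a minimal sketch in scikit-learn, one of the open source libraries mentioned above. The features, labels, and synthetic data are entirely hypothetical, invented only to illustrate training a classifier on positional features; they are not STATS's actual pipeline or data.

```python
# Hypothetical sketch: classifying whether a shot is "open" from two
# invented positional features. Data is synthetic, not real SportVU data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
defender_dist = rng.uniform(0, 10, n)   # feet to nearest defender (assumed feature)
shot_dist = rng.uniform(0, 25, n)       # feet from the basket (assumed feature)

# Toy labeling rule: call a shot "open" when the nearest defender is far away
open_shot = (defender_dist > 4).astype(int)

X = np.column_stack([defender_dist, shot_dist])
X_train, X_test, y_train, y_test = train_test_split(
    X, open_shot, random_state=0
)

# Fit a simple linear classifier and check held-out accuracy
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

A real system would replace the toy rule with human-labeled examples and far richer tracking features, and a deep learning model (e.g., in TensorFlow) would take the place of the linear classifier, but the train/evaluate loop looks the same.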
