Big Data: Overviews


Google's Vision for Mainstreaming Machine Learning

#artificialintelligence

Here at The Next Platform, we've touched on the convergence of machine learning, HPC, and enterprise requirements, looking at ways that vendors are trying to lower the barriers so that enterprises can use AI and machine learning to better address the rapid changes brought about by such emerging trends as the cloud, edge computing and mobility. At the SC17 show in November 2017, Dell EMC unveiled efforts underway to bring AI, machine learning and deep learning into the mainstream, similar to how the company and other vendors in recent years have been working to make it easier for enterprises to adopt HPC techniques for their environments. For Dell EMC, that means in part doing so through bundled, engineered systems. IBM also has strategies underway, including the integration of its PowerAI deep learning enterprise software with its Data Science Experience. Both offerings are aimed at making it easier for enterprises to embrace advanced AI technologies and for developers and data scientists to develop and train machine learning models.


A Framework for Approaching Textual Data Science Tasks

@machinelearnbot

There's an awful lot of text data available today, and enormous amounts of it are being created on a daily basis, ranging from structured to semi-structured to fully unstructured. What can we do with it? Well, quite a bit, actually; it depends on what your objectives are, but there are two intricately related yet distinct umbrellas of tasks that can be exploited to take advantage of all this data: natural language processing (NLP) and text mining. NLP is a major aspect of computational linguistics, and also falls within the realms of computer science and artificial intelligence. Text mining exists in a similar realm to NLP, in that it is concerned with identifying interesting, non-trivial patterns in textual data.
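As a rough illustration of what "identifying patterns in textual data" can mean at its simplest, the sketch below counts the most frequent terms across a handful of documents. It is not the framework the article goes on to describe; the function names (`tokenize`, `top_terms`), the tiny stopword list, and the sample sentences are all assumptions made up for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical, deliberately tiny stopword list; real projects use larger ones.
STOPWORDS = frozenset({"a", "an", "the", "in", "of", "and", "to"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(docs: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword terms across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tok for tok in tokenize(doc) if tok not in STOPWORDS)
    return counts.most_common(n)


# Made-up sample documents, for illustration only.
sample_docs = [
    "Text mining finds patterns in text.",
    "NLP helps machines process text.",
]
most_frequent = top_terms(sample_docs, n=1)
```

Term frequency is only a starting point; the tokenization step is closer to the NLP side, while the counting and ranking step is closer to the text mining side.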


Top Stories, Jan 1-7: Docker for Data Science; Quantum Machine Learning: An Overview

@machinelearnbot

Docker for Data Science, by Sachin Abeywardana. Top 10 Machine Learning Algorithms for Beginners, by Reena Shaw. How Much Mathematics Does an IT Engineer Need to Learn to Get Into Data Science? Top Stories, Dec 18-31: How Much Mathematics Does an IT Engineer Need to Learn to Get Into Data Science?; Computer Vision by Andrew Ng – 11 Lessons Learned - Jan 03, 2018. How to build a Successful Advanced Analytics Department - Jan 04, 2018.


Data Can Lie–Here's A Guide To Calling Out B.S.

@machinelearnbot

According to University of Washington professors Carl T. Bergstrom and Jevin West, it's time someone did something about it. Their answer is a free, structured course of readings and case studies aimed at giving students (and anyone who might be interested) the tools to look critically at scientific claims driven by data and machine learning. Over the past six months, the two scientists created the syllabus and published it online in the hope that the UW administration would take notice and turn it into a real class (it's currently winding its way through the approval process, and might be offered as soon as the spring). The two have been frustrated for years with the way statistical findings are treated in the media and in the classroom. West, a professor in the Information School and the director of UW's Data Lab, believes that thanks to the emergence of big data and the increasing availability of tools that help more people work with it, the amount of bullshit appears to have increased; with so much data out there, there is simply more potential for data scientists and designers to shape it to fit their own conclusions, or even intentionally mislead their audience.


Monetizing the Internet of Things (IoT) @ThingsExpo #AI #IoT #M2M #BigData

@machinelearnbot

"Why incur the expense of generating and collecting all of this IoT data if you're not going to monetize it?" Organizations are racing to embrace the Internet of Things (IoT) as the pundits create "visions of sugar-plums dancing in their heads." In June 2015, McKinsey Global Institute released its study "The Internet of Things: Mapping the Value beyond the Hype," which highlighted the staggering financial value that IoT could create! The folks at Wikibon provided a perspective on the sources of "IoT monetization" in their recent research titled "Harvesting Value at the Edge," written by the always delightful and provocative Neil Raden. Though IoT is a useful application of available technology and is well defined at the hardware and network levels, the heart of IoT, the part that yields the real value, is edge analytics.


An Overview of ResNet and its Variants – Towards Data Science

#artificialintelligence

After the celebrated victory of AlexNet [1] at the LSVRC2012 classification contest, the deep Residual Network [2] was arguably the most groundbreaking work in the computer vision/deep learning community in the last few years. ResNet makes it possible to train networks with hundreds or even thousands of layers and still achieve compelling performance. Thanks to its powerful representational ability, the performance of many computer vision applications other than image classification, such as object detection and face recognition, has been boosted. Since ResNet blew people's minds in 2015, many in the research community have dived into the secrets of its success, and many refinements have been made to the architecture. This article is divided into two parts: in the first, I am going to give a little bit of background knowledge for those who are unfamiliar with ResNet; in the second, I will review some of the papers I read recently regarding different variants and interpretations of the ResNet architecture.
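The excerpt does not spell out the mechanism, but the core idea in the ResNet paper [2] is the skip connection: each block outputs F(x) + x rather than F(x) alone, so a block can fall back to the identity mapping. The following is a minimal pure-Python sketch of that idea under simplifying assumptions (small dense layers instead of convolutions, no batch normalization, no training); the helper names `linear`, `relu`, `residual_block`, and `zero_weights` are hypothetical and not taken from the article.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product: one dense layer without bias."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(F(v) + v), where F(v) = w2 . relu(w1 . v)."""
    fx = linear(w2, relu(linear(w1, v)))
    # Skip connection: add the block's input back onto its output.
    return relu([f + x for f, x in zip(fx, v)])


def zero_weights(n: int) -> Matrix:
    """An n-by-n all-zero weight matrix, so that F(v) is the zero vector."""
    return [[0.0] * n for _ in range(n)]


# With zero weights every block reduces to the identity on non-negative
# inputs, so the signal survives a stack of 100 blocks unchanged.
x = [1.0, 2.0, 3.0]
out = x
for _ in range(100):
    out = residual_block(out, zero_weights(3), zero_weights(3))
```

In this toy setup the identity behaviour is what lets the input pass through the whole stack; without the `+ x` term, the same zero-weight stack would output all zeros after the first block.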


Crowdwork for Machine Learning: An Autoethnography

#artificialintelligence

Amazon's Mechanical Turk is a platform for soliciting work on online tasks that has been used by market researchers, translators, and data scientists to complete surveys, perform work that cannot be easily automated, and create human-labeled data for supervised learning systems. Because crowdwork is a source of the human knowledge that machine intelligence relies on to train algorithms, a better understanding of how crowdworking platforms like mTurk function as a conduit for human intelligence can improve their usefulness for the data scientists who rely on them. Rather than exploring that side of the crowdworking experience, I tried to focus my attention on tasks that looked like they were intended to support machine learning (rather than the various other services that mTurk supports, like psychological profiles, market research surveys, or translation tasks), and found that the design of mTurk HITs has important consequences for data scientists concerned with producing useful labeled data. No matter how extensible the task-building platform is, there are only a few ways for task designers to elicit information from task workers: writing text in fields, selecting radio buttons or checkboxes, and using dropdown menus are the most database-friendly methods, but recording audio, capturing video or still photos from a webcam, or asking for drawn annotations may also be used.