The MIT Statistics and Data Science Center (SDSC), a part of the Institute for Data, Systems, and Society (IDSS), announced two new academic programs today: the MicroMasters program in Statistics and Data Science, and the Interdisciplinary Doctoral Program in Statistics, both beginning in the fall. The MicroMasters program, currently under development by MIT faculty, will be offered online through edX. "Digital technologies are enabling us to bring MIT's data science curriculum to learners around the world regardless of their location or socioeconomic status," says Vice President for Open Learning Sanjay Sarma. The curriculum includes foundational knowledge of data science methods and tools, a deep dive into probability and statistics, and opportunities to learn, implement, and experiment with data analysis techniques and machine learning algorithms. "The demand for data scientists is growing rapidly," says Dean for Digital Learning Krishna Rajagopal.
Two hundred students, industry professionals, and academic leaders convened at the Microsoft NERD Center in Cambridge, Massachusetts for the second annual Women in Data Science (WiDS) conference on March 5. The conference grew from 150 participants last year, and highlighted local strength in academics and health care. "The WiDS conference highlighted female leadership in data science in the Boston area," said Caroline Uhler, a member of the WiDS steering committee who is an IDSS core faculty member and assistant professor of electrical engineering and computer science (EECS) at MIT. "This event is particularly important to encourage more female scientists in related areas to join this emerging area that has such broad societal impact." Regina Barzilay, Delta Electronics Professor of EECS, gave the first presentation on how data science and machine learning approaches are improving cancer research. Barzilay said her experiences as a breast cancer survivor motivate her work.
Knowing how to write high-quality software matters: the days of one team writing throwaway models and another team implementing them in production are slowly coming to an end. With programming languages like Python and R and their packages making it easy to work with data and models, it is reasonable to expect a data scientist or machine learning engineer to attain a high level of programming proficiency and understand the basics of system design. While "big data" is a term used far too often, it is true that the cost of data storage is on a dramatic downward trend, which means there are more and more data sets from different domains to work with and apply models to. And yes, knowing something about at least one of the popular areas of the field that have gained traction lately -- deep learning for computer vision and perception, recommendation engines, NLP -- would be a great thing once you have the fundamental understanding and technical proficiency.
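To make the point concrete, here is a minimal sketch of what "packages making it easy to work with data and models" looks like in practice: a few lines of Python take a dataset from raw arrays to a trained, evaluated model. The synthetic dataset and the choice of logistic regression are illustrative assumptions, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate a small synthetic classification dataset (stand-in for real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a baseline model and evaluate it on held-out data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

The ease of this workflow is exactly why the bar has risen: when the modeling itself is a few library calls, proficiency in software engineering and system design becomes the differentiator.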
There has never been a better time to be a politician. But it's an even better time to be a machine learning engineer working for a politician. Throughout modern history, political candidates have had only a limited number of tools to take the temperature of the electorate. More often than not, they've had to rely on instinct rather than insight when running for office. Now big data can be used to maximise the effectiveness of a campaign.
Numerai is a hedge fund that's using technology to create an unprecedented network effect, and transform the way money is managed. Crowdsourced investment strategies are many and varied, but Numerai crowdsources machine intelligence in a unique way: it supplies its network of data scientists with encrypted data on which to test their machine learning models, removing any bias attached to the application of the algorithms. These models are entered into a monthly tournament and the best ones receive a pay-out. Payouts were previously made in Bitcoin (because it was efficient and more anonymous than PayPal), but more recently Numerai launched its own token, Numeraire (NMR), on Ethereum, the public blockchain which has spawned a multitude of trustless, decentralized applications. The aim of the token was to create more value for Numerai's growing network of scientists, and further align them with the collaborative goals of the project.
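The core idea — that a model can learn from data whose meaning is hidden from the modeler — can be sketched in a few lines. This is not Numerai's actual data format or API; the synthetic features, the gradient-boosting model, and the correlation-style score are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Stand-in for an anonymized dataset: the columns carry no domain
# meaning to the modeler, mimicking the obfuscation described above.
X = rng.normal(size=(1000, 8))
y = 0.5 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=0.1, size=1000)

# Train on the first 800 rows; the modeler never needs to know what
# the features represent to find predictive structure in them.
model = GradientBoostingRegressor(random_state=0).fit(X[:800], y[:800])
preds = model.predict(X[800:])

# Correlation between predictions and targets on held-out rows -- a
# common scoring style for ranked prediction tournaments.
score = np.corrcoef(preds, y[800:])[0, 1]
```

Because entrants only ever see obfuscated numbers, their models encode statistical patterns rather than human judgments about specific assets — which is the bias-removal property the paragraph describes.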
Machine Learning is the buzzword of the moment. In recent years, news stories raving about its possibilities have soared, Google searches for the term have quadrupled, and companies across the globe have been scrambling to figure out how to capitalize on the excitement by bringing it into their product mix. While that can be a great thing, claims made by some businesses about what Machine Learning can do are wildly exaggerated. That makes it crucial to cut through the noise and get to grips with its potential, limitations, and what you can realistically achieve with your resources so that any investment makes solid business sense -- so say Philip Lima, CEO of Mashey, and Boaz Farkash, Head of Product Management at Sisense. The pair joined forces to deliver an in-depth webinar on Machine Learning and business intelligence, which you can view in full here.
Historically, most of the data businesses have analyzed for decision-making has been of the structured variety--easily entered, stored, and queried. In the digital age, that universe of potentially valuable data keeps expanding exponentially. Most of it is unstructured data, coming from a wide variety of sources, from websites to wearable devices. As a recent McKinsey Global Institute report noted: "Much of this newly available data is in the form of clicks, images, text, or signals of various sorts, which is very different than the structured data that can be cleanly placed in rows and columns." At the same time, we have entered an era when machine learning can theoretically find patterns in vast amounts of data to enable enterprises to uncover insights that may not have been visible before.
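The structured/unstructured distinction can be shown concretely. Below is a hedged sketch: the tiny tables and review strings are invented examples, and the word-count vectorizer is just one common way to give text a numeric, row-and-column form before any pattern-finding can happen.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Structured data: fits cleanly into rows and columns and can be
# entered, stored, and queried directly.
orders = pd.DataFrame(
    {"customer_id": [1, 2, 1], "amount": [25.0, 40.0, 15.0]}
)
total_by_customer = orders.groupby("customer_id")["amount"].sum()

# Unstructured data: free text has no inherent rows and columns, so it
# must first be transformed into a numeric representation (here, word
# counts) before machine learning can look for patterns in it.
reviews = ["great product fast shipping", "slow shipping poor quality"]
matrix = CountVectorizer().fit_transform(reviews)
```

The transformation step is the crux: most of the newly available clicks, images, text, and signals only become analyzable once they are mapped into some structured numeric form like `matrix` above.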
When I tell people that I work at an AI company, they often follow up with, "So, what kind of machine learning/deep learning do you do?" This isn't surprising, as most of the market attention (and hype) around AI has centered on machine learning and its high-profile subset, deep learning, and on natural language processing with the rise of chatbots and virtual assistants. But while machine learning is a core component of artificial intelligence, AI is, in fact, more than just ML. So, what does it really mean for an application to be "intelligent"? What does it take to create a system that is artificially intelligent?
I am spending some cycles on my algorithmic rotoscope work -- which is basically a stationary exercise bicycle for my learning about what is and what is not Machine Learning. I am using it to help me understand and tell stories about Machine Learning by creating images using Machine Learning that I can use in my Machine Learning storytelling. Picture a bunch of Machine Learning gears all working together to help make sense of what I'm doing and WTF I am talking about. As I'm writing a story on how image style transfer Machine Learning could be put to use by libraries, museums, and collection curators, I'm reminded of what a con machine learning will be in the future, and how it will be a vehicle for the extraction of value and outright theft. My image style transfer work is just one tiny slice of this pie.