One of the formidable challenges healthcare providers face is putting medical data to maximum use. Somewhere between the quest to unlock the mysteries of medicine and the effort to design better treatments, therapies, and procedures lies the real world of applying data and protecting patient privacy. "Today, there are many barriers to putting data to work in the most effective way possible," observes Drew Harris, director of health policy and population health at Thomas Jefferson University's College of Population Health in Philadelphia, PA. "The goals of protecting patients and finding answers are frequently at odds." It is a critical issue and one that will define the future of medicine. Medical advances are increasingly dependent on the analysis of enormous datasets--as well as data that extends beyond any one agency or enterprise.
The MIT Statistics and Data Science Center (SDSC), a part of the Institute for Data, Systems, and Society (IDSS), announced two new academic programs today: the MicroMasters program in Statistics and Data Science, and the Interdisciplinary Doctoral Program in Statistics, both beginning in the fall. The MicroMasters program, currently under development by MIT faculty, will be offered online through edX. "Digital technologies are enabling us to bring MIT's data science curriculum to learners around the world regardless of their location or socioeconomic status," says Vice President for Open Learning Sanjay Sarma. The curriculum includes foundational knowledge of data science methods and tools, a deep dive into probability and statistics, and opportunities to learn, implement, and experiment with data analysis techniques and machine learning algorithms. "The demand for data scientists is growing rapidly," says Dean for Digital Learning Krishna Rajagopal.
When businesses identify a problem that can be solved through machine learning, they brief the data scientists and analysts to create a predictive analytics solution. In many cases, the turnaround time for delivering a solution is pretty long. Even for experienced data scientists, developing machine learning models that can accurately predict the results is always challenging and time-consuming. The complex workflow involved in building machine learning models has multiple stages. Some of the significant steps include data acquisition, data exploration, feature engineering, model selection, experimentation and prediction.
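The stages named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's workflow: the data is synthetic, the two candidate models (a constant mean predictor and a one-variable least-squares line) are stand-ins chosen for brevity, and every function name is hypothetical.

```python
import random
import statistics


def acquire_data(n=200, seed=0):
    """Data acquisition: synthetic (x, y) pairs standing in for a real source."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def explore(xs, ys):
    """Data exploration: basic summary statistics."""
    return {"n": len(xs), "x_mean": statistics.mean(xs), "y_mean": statistics.mean(ys)}


def make_scaler(xs):
    """Feature engineering: standardize x using training statistics only."""
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    return lambda x: (x - mu) / sigma


def fit_mean(xs, ys):
    """Candidate model 1: always predict the training mean of y."""
    m = statistics.mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate model 2: one-variable ordinary least squares."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on held-out data."""
    return statistics.mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def run_pipeline():
    xs, ys = acquire_data()
    summary = explore(xs, ys)

    # Hold out the last 20% of the rows for evaluation.
    split = int(0.8 * len(xs))
    scale = make_scaler(xs[:split])
    train_x = [scale(x) for x in xs[:split]]
    test_x = [scale(x) for x in xs[split:]]
    train_y, test_y = ys[:split], ys[split:]

    # Model selection and experimentation: fit each candidate, compare held-out error.
    candidates = {"mean": fit_mean, "linear": fit_linear}
    models = {name: fit(train_x, train_y) for name, fit in candidates.items()}
    scores = {name: mse(model, test_x, test_y) for name, model in models.items()}
    best = min(scores, key=scores.get)

    # Prediction: apply the selected model to a new input.
    prediction = models[best](scale(5.0))
    return summary, scores, best, prediction


if __name__ == "__main__":
    summary, scores, best, prediction = run_pipeline()
    print(summary)
    print(scores, "selected:", best)
    print("prediction for x=5.0:", prediction)
```

Even in this toy form, each stage is a separate piece of code with its own decisions, which hints at why turnaround on real problems is long.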
Two hundred students, industry professionals, and academic leaders convened at the Microsoft NERD Center in Cambridge, Massachusetts for the second annual Women in Data Science (WiDS) conference on March 5. The conference grew from 150 participants last year, and highlighted local strength in academics and health care. "The WiDS conference highlighted female leadership in data science in the Boston area," said Caroline Uhler, a member of the WiDS steering committee who is an IDSS core faculty member and assistant professor of electrical engineering and computer science (EECS) at MIT. "This event is particularly important to encourage more female scientists in related areas to join this emerging area that has such broad societal impact." Regina Barzilay, Delta Electronics Professor of EECS, gave the first presentation on how data science and machine learning approaches are improving cancer research. Barzilay said her experiences as a breast cancer survivor motivate her work.
Knowing how to write high-quality software -- the days of one team writing throwaway models and another team implementing them in production are slowly coming to an end. With programming languages like Python and R and their packages making it easy to work with data and models, it is reasonable to expect a data scientist or machine learning engineer to attain a high level of programming proficiency and understand the basics of system design. While "big data" is a term used way too often, it is true that the cost of data storage is on a dramatic downward trend. This means that there are more and more data sets from different domains to work with and apply models to. And yes, knowing something about at least one of the popular areas of the field that have gained traction lately -- deep learning for computer vision and perception, recommendation engines, NLP -- would be a great thing once you have the fundamental understanding and technical proficiency.
There has never been a better time to be a politician. But it's an even better time to be a machine learning engineer working for a politician. Throughout modern history, political candidates have had only a limited number of tools to take the temperature of the electorate. More often than not, they've had to rely on instinct rather than insight when running for office. Now big data can be used to maximise the effectiveness of a campaign.
Numerai is a hedge fund that's using technology to create an unprecedented network effect, and transform the way money is managed. Crowdsourced investment strategies are many and varied, but Numerai crowdsources machine intelligence in a totally unique way by supplying its network of data scientists with encrypted data on which to test their machine learning models, thus removing any bias attached to the application of the algorithms. These models are entered into a monthly tournament and the best ones receive a pay-out. This was previously done using Bitcoin (because it was efficient and more anonymous than PayPal), but more recently Numerai launched its own token, Numeraire (NMR), on Ethereum, the public blockchain which has spawned a multitude of trustless, decentralized applications. The aim of the token was to create more value for Numerai's growing network of scientists, and further align them with the collaborative goals of the project.
Machine Learning is the buzzword of the moment. In recent years, news stories raving about its possibilities have soared, Google searches for the term have quadrupled, and companies across the globe have been scrambling to figure out how to capitalize on the excitement by bringing it into their product mix. While that can be a great thing, claims made by some businesses about what Machine Learning can do are wildly exaggerated. That makes it crucial to cut through the noise and get to grips with its potential, limitations, and what you can realistically achieve with your resources so that any investment makes solid business sense -- so say Philip Lima, CEO of Mashey, and Boaz Farkash, Head of Product Management at Sisense. The pair joined forces to deliver an in-depth webinar on Machine Learning and business intelligence.
Historically, most of the data businesses have analyzed for decision-making has been of the structured variety--easily entered, stored, and queried. In the digital age, that universe of potentially valuable data keeps expanding exponentially. Most of it is unstructured data, coming from a wide variety of sources, from websites to wearable devices. As a recent McKinsey Global Institute report noted: "Much of this newly available data is in the form of clicks, images, text, or signals of various sorts, which is very different than the structured data that can be cleanly placed in rows and columns." At the same time, we have entered an era when machine learning can theoretically find patterns in vast amounts of data to enable enterprises to uncover insights that may not have been visible before.