MIT researchers are hoping to advance the democratization of data science with a new tool for nonstatisticians that automatically generates models for analyzing raw data. Democratizing data science is the notion that anyone, with little to no expertise, can do data science if provided ample data and user-friendly analytics tools. Supporting that idea, the new tool ingests datasets and generates sophisticated statistical models typically used by experts to analyze, interpret, and predict underlying patterns in data. The tool currently runs in Jupyter Notebook, an open-source web application that lets users run programs interactively in their browsers. Users need only write a few lines of code to uncover insights into, for instance, financial trends, air travel, voting patterns, the spread of disease, and more.
A great way to understand a company's future priorities is to see where it invests resources. When you look at where Toyota, the Japanese industrial giant, has recently invested, it's clear the company is preparing to remain relevant and competitive in the Fourth Industrial Revolution through its investments and innovation in artificial intelligence, big data, and robotics. With initial funding of $100 million, Toyota AI Ventures invests in tech start-ups and entrepreneurs around the world that are committed to autonomous mobility, data, and robotics. These investments help bring critical new technologies to market faster. One of them is May Mobility, a company developing self-driving shuttles for college campuses and other areas, such as central business districts, where low-speed applications are warranted.
Using a highly sophisticated form of pattern matching, researchers from Florida Atlantic University's College of Engineering and Computer Science are teaching "machines" to detect Medicare fraud. An estimated $19 billion to $65 billion is lost every year to Medicare fraud, waste, or abuse. In a search akin to finding the proverbial "needle in a haystack," human auditors and investigators face the painstaking task of manually checking thousands of Medicare claims for specific patterns that could indicate foul play or fraudulent behavior. Furthermore, according to the U.S. Department of Justice, fraud enforcement efforts currently rely heavily on health care professionals coming forward with information about Medicare fraud. The researchers' study, "The Effects of Varying Class Distribution on Learner Behavior for Medicare Fraud Detection With Imbalanced Big Data," published in the journal Health Information Science and Systems, uses big data from Medicare Part B and employs advanced data analytics and machine learning to automate the fraud detection process.
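Fraud cases are a tiny minority of claims, so a common way to study how "varying class distribution" affects a learner is to resample the training set to different fraud-to-non-fraud ratios before training. The summary above does not say exactly how the study did this; the sketch below assumes plain random undersampling of the majority class, with label `1` meaning fraud and `0` meaning legitimate. The function name `random_undersample` and its parameters are hypothetical, chosen for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T],
    labels: Sequence[int],
    minority_fraction: float,
    seed: int = 0,
) -> Tuple[List[T], List[int]]:
    """Hypothetical helper: keep every minority (label 1, e.g. fraud) sample and
    randomly drop majority (label 0) samples so the minority class makes up
    roughly `minority_fraction` of the returned dataset."""
    if not 0.0 < minority_fraction < 1.0:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_min / (n_min + n_maj) = f for n_maj:  n_maj = n_min * (1 - f) / f
    n_major = round(len(minority) * (1 - minority_fraction) / minority_fraction)
    n_major = min(n_major, len(majority))  # cannot keep more than we have
    rng = random.Random(seed)  # seeded so each experiment is reproducible
    kept = minority + rng.sample(majority, n_major)
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy data (made up): 10 "fraud" claims and 990 legitimate ones.
    claims = list(range(1000))
    labels = [1] * 10 + [0] * 990
    # Build training sets with progressively more balanced class distributions.
    for f in (0.1, 0.25, 0.5):
        X, y = random_undersample(claims, labels, minority_fraction=f, seed=42)
        print(f"target={f}: size={len(y)}, fraud share={sum(y) / len(y):.2f}")
```

Each resampled set would then be used to train and evaluate the same learner, so differences in performance can be attributed to the class distribution alone. Undersampling throws data away; oversampling the minority class is the usual alternative when the dataset is small.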
What will be the next thing to revolutionize data science in 2019? Reinforcement learning (RL) will be the next big thing in data science in 2019. While RL has long been studied in academia, it has seen very little industry adoption. Why? Partly because there has been plenty of low-hanging fruit to pick in predictive analytics, but mostly because of barriers in implementation, knowledge, and available tools. The potential value of using RL in proactive analytics and AI is enormous, but mastering it also demands a broader skill set.
Companies in all industries must stay up to date with the latest technology to survive in this digital world. This is especially true of machine learning (ML), which has the potential to transform the way businesses process and use their data. While ML has many useful business applications, applying it to business intelligence (BI) insights can help you optimize your processes and make even better decisions. Thirteen members of the Forbes Technology Council shared some creative ways to combine BI with ML to produce the best results for your company. One of the more distinctive ways to combine the two is identifying fraud indicators.
One of the formidable challenges healthcare providers face is putting medical data to maximum use. Somewhere between the quest to unlock the mysteries of medicine and design better treatments, therapies, and procedures lies the real world of applying data and protecting patient privacy. "Today, there are many barriers to putting data to work in the most effective way possible," observes Drew Harris, director of health policy and population health at Thomas Jefferson University's College of Population Health in Philadelphia, PA. "The goals of protecting patients and finding answers are frequently at odds." It is a critical issue, and one that will define the future of medicine. Medical advances increasingly depend on the analysis of enormous datasets, as well as on data that extends beyond any one agency or enterprise.
The MIT Statistics and Data Science Center (SDSC), a part of the Institute for Data, Systems, and Society (IDSS), announced two new academic programs today: the MicroMasters program in Statistics and Data Science, and the Interdisciplinary Doctoral Program in Statistics, both beginning in the fall. The MicroMasters program, currently under development by MIT faculty, will be offered online through edX. "Digital technologies are enabling us to bring MIT's data science curriculum to learners around the world regardless of their location or socioeconomic status," says Vice President for Open Learning Sanjay Sarma. The curriculum includes foundational knowledge of data science methods and tools, a deep dive into probability and statistics, and opportunities to learn, implement, and experiment with data analysis techniques and machine learning algorithms. "The demand for data scientists is growing rapidly," says Dean for Digital Learning Krishna Rajagopal.
When businesses identify a problem that can be solved through machine learning, they brief data scientists and analysts to create a predictive analytics solution. In many cases, the turnaround time for delivering a solution is long. Even for experienced data scientists, developing machine learning models that make accurate predictions is challenging and time-consuming. The machine learning workflow is complex and has multiple stages, including data acquisition, data exploration, feature engineering, model selection, experimentation, and prediction.
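To make those stages concrete, here is a toy, standard-library-only sketch that maps each one to a function. Everything in it is an illustrative assumption: the data is synthetic (`y = 3x + 2` plus noise) and stands in for real data acquisition, the "feature engineering" step simply offers two candidate transforms of one input, and "model selection and experimentation" means fitting a few candidates and keeping the one with the lowest validation error. All function names are hypothetical; a real project would typically rely on a library such as scikit-learn.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]         # (raw feature x, target y)
Model = Callable[[float], float]  # maps a raw x to a predicted y


def acquire_data(n: int = 200, seed: int = 0) -> List[Row]:
    """1. Data acquisition (hypothetical stand-in): synthesize y = 3x + 2 + noise."""
    rng = random.Random(seed)
    return [(x, 3 * x + 2 + rng.gauss(0, 1))
            for x in (rng.uniform(0, 10) for _ in range(n))]


def explore(rows: List[Row]) -> Dict[str, float]:
    """2. Data exploration: a few summary statistics of the target."""
    ys = [y for _, y in rows]
    return {"n": len(rows), "mean_y": statistics.mean(ys), "stdev_y": statistics.stdev(ys)}


# 3. Feature engineering: candidate transforms of the raw input.
FEATURES: Dict[str, Callable[[float], float]] = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
}


def fit_linear(rows: List[Row], feature: Callable[[float], float]) -> Model:
    """Closed-form least squares for y = a * feature(x) + b."""
    fs = [feature(x) for x, _ in rows]
    ys = [y for _, y in rows]
    mf, my = statistics.mean(fs), statistics.mean(ys)
    a = sum((f - mf) * (y - my) for f, y in zip(fs, ys)) / sum((f - mf) ** 2 for f in fs)
    b = my - a * mf
    return lambda x: a * feature(x) + b


def fit_baseline(rows: List[Row]) -> Model:
    """Baseline that ignores x and always predicts the training mean."""
    my = statistics.mean([y for _, y in rows])
    return lambda x: my


def mse(model: Model, rows: List[Row]) -> float:
    """Mean squared error of a model on a set of rows."""
    return statistics.mean([(model(x) - y) ** 2 for x, y in rows])


def select_model(train: List[Row], valid: List[Row]) -> Tuple[str, Model, Dict[str, float]]:
    """4-5. Model selection and experimentation: fit each candidate on the
    training split, score it on the validation split, keep the lowest MSE."""
    candidates: Dict[str, Model] = {"baseline": fit_baseline(train)}
    for name, feat in FEATURES.items():
        candidates[f"linear({name})"] = fit_linear(train, feat)
    scores = {name: mse(m, valid) for name, m in candidates.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, candidates[best], scores


def run_pipeline(seed: int = 0):
    rows = acquire_data(seed=seed)
    summary = explore(rows)
    split = int(0.8 * len(rows))  # simple 80/20 train/validation split
    train, valid = rows[:split], rows[split:]
    best, model, scores = select_model(train, valid)
    preds = [model(x) for x in (2.5, 7.0)]  # 6. Prediction on new inputs
    return summary, best, scores, preds


if __name__ == "__main__":
    summary, best, scores, preds = run_pipeline()
    print(summary, best, scores, preds, sep="\n")
```

Even in this toy form, most of the code sits outside the model itself: acquiring, inspecting, transforming, and comparing. That is one reason the turnaround described above is long, and why teams try to automate the experimentation loop.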
Two hundred students, industry professionals, and academic leaders convened at the Microsoft NERD Center in Cambridge, Massachusetts, for the second annual Women in Data Science (WiDS) conference on March 5. The conference, which grew from 150 participants last year, highlighted local strength in academics and health care. "The WiDS conference highlighted female leadership in data science in the Boston area," said Caroline Uhler, a member of the WiDS steering committee who is an IDSS core faculty member and assistant professor of electrical engineering and computer science (EECS) at MIT. "This event is particularly important to encourage more female scientists in related areas to join this emerging area that has such broad societal impact." Regina Barzilay, Delta Electronics Professor of EECS, gave the first presentation on how data science and machine learning approaches are improving cancer research. Barzilay said her experiences as a breast cancer survivor motivate her work.
Knowing how to write high-quality software: the days of one team writing throwaway models and another team implementing them in production are slowly coming to an end. With programming languages like Python and R, and their packages, making it easy to work with data and models, it is reasonable to expect a data scientist or machine learning engineer to attain a high level of programming proficiency and understand the basics of system design. While "big data" is an overused term, it is true that the cost of data storage is falling dramatically. This means there are more and more datasets from different domains to work with and apply models to. And yes, once you have the fundamental understanding and technical proficiency, it would be a great thing to know something about at least one of the popular areas of the field that have gained traction lately, such as deep learning for computer vision and perception, recommendation engines, or NLP.