Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades.
To be a real "full-stack" data scientist, or what many bloggers and employers call a "unicorn," you have to master every step of the data science process -- all the way from storing your data, to putting your finished product (typically a predictive model) in production. But the bulk of data science training focuses on machine/deep learning techniques; data management knowledge is often treated as an afterthought. Data science students usually learn modeling skills with processed and cleaned data in text files stored on their laptop, ignoring how the data sausage is made. Students often don't realize that in industry settings, getting the raw data from various sources to be ready for modeling is usually 80% of the work. And because enterprise projects usually involve a massive amount of data that their local machine is not equipped to handle, the entire modeling process often takes place in the cloud, with most of the applications and databases hosted on servers in data centers elsewhere. Even after the student landed a job as a data scientist, data management often becomes something that a separate data engineering team takes care of. As a result, too many data scientists know too little about data storage and infrastructure, often to the detriment of their ability to make the right decisions at their jobs. The goal of this article is to provide a roadmap of what a data scientist in 2019 should know about data management -- from types of databases, where and how data is stored and processed, to the current commercial options -- so the aspiring "unicorns" could dive deeper on their own, or at least learn enough to sound like one at interviews and cocktail parties.
The authors of the Harrisburg University study make explicit their desire to provide "a significant advantage for law enforcement agencies and other intelligence agencies to prevent crime" as a co-author and former NYPD police officer outlined in the original press release. At a time when the legitimacy of the carceral state, and policing in particular, is being challenged on fundamental grounds in the United States, there is high demand in law enforcement for research of this nature, research which erases historical violence and manufactures fear through the so-called prediction of criminality. Publishers and funding agencies serve a crucial role in feeding this ravenous maw by providing platforms and incentives for such research. The circulation of this work by a major publisher like Springer would represent a significant step towards the legitimation and application of repeatedly debunked, socially harmful research in the real world. To reiterate our demands, the review committee must publicly rescind the offer for publication of this specific study, along with an explanation of the criteria used to evaluate it. Springer must issue a statement condemning the use of criminal justice statistics to predict criminality and acknowledging their role in incentivizing such harmful scholarship in the past. Finally, all publishers must refrain from publishing similar studies in the future.
By now, it's almost old news: big data will transform medicine. It's essential to remember, however, that data by themselves are useless. To be useful, data must be analyzed, interpreted, and acted on. Thus, it is algorithms -- not data sets -- that will prove transformative. We believe, therefore, that attention has to shift to new statistical tools from the field of machine learning that will be critical for anyone practicing medicine in the 21st century.