Collaborating Authors

The Data Science Life Cycle

Communications of the ACM

Victoria Stodden is a statistician and associate professor at the University of Illinois at Urbana-Champaign, IL, USA. This material is based upon work supported by National Science Foundation Award #1941443.

Wanted: Toolsmiths

Communications of the ACM

"As we honor the more mathematical, abstract, and 'scientific' parts of our subject more, and the practical parts less, we misdirect the young and brilliant minds away from a body of challenging and important problems that are our peculiar domain, depriving these problems of the powerful attacks they deserve." I have the privilege of working at the Defense Advanced Research Projects Agency (DARPA) and currently serve as the Acting Deputy Director of the Defense Sciences Office (DSO). Our goal at DARPA is to create and prevent technological surprise through investments in science and engineering, and our history and contributions are well documented. The DSO is sometimes called "DARPA's DARPA," because we strive to be at the forefront of all of science--on the constant lookout for opportunities to enhance our national security and collective well-being, and our projects are very diverse. One project uses cold atoms to measure time to a precision of one part in 10^18; another is creating amazing composite materials that can change the way in which we manufacture.

Making Friends: Building Social Robots Through Interdisciplinary Collaboration

AAAI Conferences

This paper discusses social robotics as a hybrid knowledge space that encourages interaction and collaboration among many different disciplines: engineering, computer science, the social sciences and humanities, design, the arts, etc. Such collaboration in the design of socio-culturally situated artifacts poses many challenges, occasioned by differences in conceptual frameworks, methods for conducting research, and even daily work practices. By approaching these challenges in a spirit of friendship across the sciences, it is possible to achieve transdisciplinary understanding and reap the benefits of applying different, yet complementary, forms of expertise to social robot design. In this paper, we use insights and lessons learned from our own collaborative experiences to discuss how social as well as technical and design issues are addressed in the construction and evaluation of social robots and how the boundaries between the social, natural, and applied sciences are challenged, redefined, and traversed.

Realizing the Potential of Data Science

Communications of the ACM

The ability to manipulate and understand data is increasingly critical to discovery and innovation. As a result, we see the emergence of a new field--data science--that focuses on the processes and systems that enable us to extract knowledge or insight from data in various forms and translate it into action. In practice, data science has evolved as an interdisciplinary field that integrates approaches from such data-analysis fields as statistics, data mining, and predictive analytics and incorporates advances in scalable computing and data management. But as a discipline, data science is only in its infancy. The challenge of developing data science in a way that achieves its full potential raises important questions for the research and education community: How can we evolve the field of data science so it supports the increasing role of data in all spheres? How do we train a workforce of professionals who can use data to its best advantage? What should we teach them? What can government agencies do to help maximize the potential of data science to drive discovery and address current and future needs for a workforce with data science expertise?

Data Science

Communications of the ACM

While data science has emerged as an ambitious new scientific field, related debates and discussions have sought to address why science in general needs data science and what even makes data science a science. Following a comprehensive literature review [5, 6, 10, 11, 12, 15, 18], I offer a number of observations concerning big data and the data science debate. For example, discussion has covered not only data-related disciplines and domains like statistics, computing, and informatics but traditionally less data-related fields and areas like social science and business management as well. Data science has thus emerged as a new inter- and cross-disciplinary field. Although many publications are available, most (likely over 95%) concern existing concepts and topics in statistics, data mining, machine learning, and broad data analytics. This limited view demonstrates how data science has emerged from existing core disciplines, particularly statistics, computing, and informatics. The abuse, misuse, and overuse of the term "data science" are ubiquitous, contributing to the hype, and myths and pitfalls are common [4]. While specific challenges have been covered [13, 16], few scholars have addressed the low-level complexities and problematic nature of data science or contributed deep insight about the intrinsic challenges, directions, and opportunities of data science as an emerging field. Data science promises new opportunities for scientific research, addressing, say, "What can I do now but could not do before, as when processing large-scale data?"; "What did I do before that does not work now, as in methods that view data objects as independent and identically distributed (IID) variables?"; "What problems not solved well previously are becoming even more complex, as when quantifying complex behavioral data?"; and "What could I not do better before, as in deep analytics and learning?"