Stop aggregating away the signal in your data


For five years as a data analyst, I forecasted and analyzed Google's revenue. For six years as a data visualization specialist, I've helped clients and colleagues discover new features of the data they know best. Time and time again, I've found that by being more specific about what's important to us and embracing the complexity in our data, we can discover new features in that data. These features can lead us to ask better data-driven questions that change how we analyze our data, the parameters we choose for our models, our scientific processes, or our business strategies. My colleagues Ian Johnson, Mike Freeman, and I recently collaborated on a series of data-driven stories about electricity usage in Texas and California to illustrate best practices for analyzing time series data.


Communications of the ACM

We present SoundWatch, a smartwatch-based deep learning application to sense, classify, and provide feedback about sounds occurring in the environment.

Responsible Data Management

Communications of the ACM

Incorporating ethics and legal compliance into data-driven algorithmic systems has been attracting significant attention from the computing research community, most notably under the umbrella of fair [8] and interpretable [16] machine learning. While important, much of this work has been limited in scope to the "last mile" of data analysis and has disregarded both the system's design, development, and use life cycle (What are we automating and why? Is the system working as intended? Are there any unforeseen consequences post-deployment?) and the data life cycle (Where did the data come from? How long is it valid and appropriate?). In this article, we argue two points. First, the decisions we make during data collection and preparation profoundly impact the robustness, fairness, and interpretability of the systems we build. Second, our responsibility for the operation of these systems does not stop when they are deployed. To make our discussion concrete, consider the use of predictive analytics in hiring. Automated hiring systems are seeing ever broader use and are as varied as the hiring practices themselves, ranging from resume screeners that claim to identify promising applicants [a] to video and voice analysis tools that facilitate the interview process [b] and game-based assessments that promise to surface personality traits indicative of future success [c]. Bogen and Rieke [5] describe the hiring process from the employer's point of view as a series of decisions that forms a funnel, with stages corresponding to sourcing, screening, interviewing, and selection. The hiring funnel is an example of an automated decision system--a data-driven, algorithm-assisted process that culminates in job offers to some candidates and rejections to others. The popularity of automated hiring systems is due in no small part to our collective quest for efficiency.

Data Engineering Lead (Remote)


Solar power is the largest source of new energy in the world. Raptor Maps is a fast-growing, venture-backed, MIT-born climate tech startup that is building software to enable the solar energy industry to scale. Parties across the entire solar lifecycle use Raptor Maps' data model to manage ever-growing utility-scale solar portfolios. We are an industry leader with hundreds of customers, including owners, builders, operators, and aerial service providers, in more than 40 countries, with 200 million solar panels under management. Our software platform is essential in the fight against climate change.

US Companies Must Deal with EU AI law, Like It or Not


Don't look now, but using Google Analytics to track your website's audience might be illegal. That's the view of a court in Austria, which in January found that Google's data product was in breach of the European Union's General Data Protection Regulation (GDPR) because it was not doing enough to make sure data transferred from the EU to the company's servers in the US was protected (from, say, US intelligence agencies). Well, for those working in AI and biotech, it matters, especially for those working outside of Europe with a view to expansion there. For a start, this is a major precedent that threatens to upend the way many tech companies work, since the tech sector relies heavily on the safe use and transfer of large quantities of data. Whether you use Google Analytics is neither here nor there; the case has shown that Privacy Shield -- the EU-US framework that governs the transfer of personal information in compliance with GDPR -- may not be compliant with European law after all.

Market Segmentation in the Emoji Era

Communications of the ACM

Ishaan and Elizabeth, both graduate students in business, are attending a marketing strategy lecture at a business school in the Northeast. While learning about the principles of market segmentation, Ishaan texts "outdated" followed by three thinking-face emojis to Elizabeth. He wonders how demographic-, geographic-, or psychographic-based segmentation--the topic of the lecture--can help his family's franchise restaurant deal with the hundreds of sometimes-not-so-positive online reviews and social media posts. Meanwhile, Elizabeth hopes that the fast-food restaurant where she ordered her lunch understands that she now belongs to the segment of 'extremely displeased' customers. Earlier, she used the restaurant's new app to order a burrito without cheese and sour cream, only to discover that the meal included both offending ingredients. Her lunch went straight into the trash can and she angrily tweeted her disappointment to the restaurant. This simple vignette illustrates an important point. Organizations of every size are challenged with capitalizing on enormous amounts of unstructured organizational data--for instance, from social media posts--particularly for applications such as market segmentation. The purpose of this article is to give the reader an idea of the challenges and opportunities faced by businesses using market segmentation, including the impacts of big data. Our research will demonstrate what market segmentation might look like in the near future, as we also offer a promising approach to implementing market segmentation using unstructured data.

65 Competencies

Communications of the ACM

Analyzing data is now essential to success in education, employment, and other areas of activity in the knowledge society. Even though several frameworks describe the competencies and skills needed to meet current and future challenges, no data analytics competency framework exists to describe the importance of specific skills to succeed in data analytics assignments.

GraphDCA -- a Framework for Node Distribution Comparison in Real and Synthetic Graphs

Artificial Intelligence

We argue that when comparing two graphs, the distribution of node structural features is more informative than global graph statistics which are often used in practice, especially to evaluate graph generative models. Thus, we present GraphDCA - a framework for evaluating similarity between graphs based on the alignment of their respective node representation sets. The sets are compared using a recently proposed method for comparing representation spaces, called Delaunay Component Analysis (DCA), which we extend to graph data. To evaluate our framework, we generate a benchmark dataset of graphs exhibiting different structural patterns and show, using three node structure feature extractors, that GraphDCA recognizes graphs with both similar and dissimilar local structure. We then apply our framework to evaluate three publicly available real-world graph datasets and demonstrate, using gradual edge perturbations, that GraphDCA satisfyingly captures gradually decreasing similarity, unlike global statistics. Finally, we use GraphDCA to evaluate two state-of-the-art graph generative models, NetGAN and CELL, and conclude that further improvements are needed for these models to adequately reproduce local structural features.
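
The abstract's central claim, that node-level feature distributions can separate graphs that global statistics cannot, can be illustrated with a small sketch. This is not GraphDCA or Delaunay Component Analysis: node degree stands in for the node structural features, total variation distance stands in for the comparison method, and the two toy graphs and all function names are assumptions made only for illustration.

```python
from collections import Counter


def degrees(edges, num_nodes):
    """Return the degree of each node in an undirected graph."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def average_degree(edges, num_nodes):
    """Global statistic: mean node degree."""
    return 2 * len(edges) / num_nodes


def degree_distribution(edges, num_nodes):
    """Node-level view: fraction of nodes having each degree."""
    counts = Counter(degrees(edges, num_nodes))
    return {d: c / num_nodes for d, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Two hypothetical 6-node, 6-edge graphs (illustrative only).
cycle = [(i, (i + 1) % 6) for i in range(6)]          # ring: every node has degree 2
star_plus = [(0, i) for i in range(1, 6)] + [(1, 2)]  # hub node 0 plus one extra edge

# The global statistic is identical for both graphs.
same_global = average_degree(cycle, 6) == average_degree(star_plus, 6)

# The node-level degree distributions differ clearly.
distance = total_variation(
    degree_distribution(cycle, 6),
    degree_distribution(star_plus, 6),
)
print(same_global, round(distance, 3))
```

Both graphs have the same node count, edge count, and average degree, yet the distance between their degree distributions is large. That gap is the kind of local structural difference the abstract says global statistics miss.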

Latent gaze information in highly dynamic decision-tasks

Artificial Intelligence

Digitization is penetrating more and more areas of life. Tasks are increasingly being completed digitally, and are therefore fulfilled not only faster and more efficiently but also more purposefully and successfully. The rapid developments in the field of artificial intelligence in recent years have played a major role in this, as they have produced many helpful approaches to build on. At the same time, the eyes, their movements, and the meaning of these movements are being progressively researched. The combination of these developments has led to exciting approaches. In this dissertation, I present some of these approaches, which I worked on during my Ph.D. First, I provide insight into the development of models that use artificial intelligence to connect eye movements with visual expertise. This is demonstrated for two domains, or rather groups of people: athletes in decision-making actions and surgeons in arthroscopic procedures. The resulting models can be considered digital diagnostic models for automatic expertise recognition. Furthermore, I show approaches that investigate the transferability of eye movement patterns to different expertise domains and, subsequently, important aspects of techniques for generalization. Finally, I address the temporal detection of confusion based on eye movement data. The results suggest the use of the resulting model as a clock signal for possible digital assistance options in the training of young professionals. An interesting aspect of my research is that I was able to draw on very valuable data from DFB youth elite athletes as well as on long-standing experts in arthroscopy. In particular, the work with the DFB data attracted the interest of radio and print media, namely DeutschlandFunk Nova and SWR DasDing. All resulting articles presented here have been published in internationally renowned journals or at conferences.