Goto

Collaborating Authors

Communications of the ACM


Sampling Near Neighbors in Search for Fairness

Communications of the ACM

Locality Sensitive Hashing (LSH) is a common tool for solving the ANN problem and was introduced in Har-Peled et al.16 DEFINITION 4. A distribution H over maps h: X U, for a suitable set U, is (r, c·r, p1, p2)-sensitive if the following holds for any x, y X: The distribution H is an LSH family, and has quality . For the sake of simplicity, we assume that p2 1/n: if p2 1/n, then it suffices to create a new LSH family HK obtained by concatenating K Θ (logp2 (1/n)) independent and identically distributed hashing functions from H. The new family -sensitive and ρ does not change. The standard approach to (c, r)-ANN using LSH functions is the following. Let D denote the data structure constructed by LSH, and let c denote the approximation parameter of LSH. Each D consists of L nρ hash functions ℓ1, …, ℓL randomly and uniformly selected from H. D contains L hash tables H1, … HL: each hash table Hi contains the input set S and uses the hash function ℓi to split the point set into buckets.


Interpretable Machine Learning

Communications of the ACM

Later in this article we include an extensive discussion about best practices for this IML workflow to flesh out the taxonomy and deliver rigorously tested diagnostics to consumers. Ultimately, there could be an increasingly complete taxonomy that allows consumers (C) to find suitable IML methods for their use cases and helps researchers (R) to ground their technical work in real applications (as seen on the right side of Figure 2). For instance, the accompanying table highlights concrete examples of how three different potential diagnostics, each corresponding to different types of IML methods (local feature attribution, local counterfactual, and global counterfactual, respectively), may provide useful insights for three use cases. In particular, the computer vision use case from the table is expanded upon as a running example. An increasingly diverse set of methods has been recently proposed and broadly classified as part of IML. Multiple concerns have been expressed, however, in light of this rapid development, focused on IML's underlying foundations and the gap between research and practice.


Transforming Science through Cyberinfrastructure

Communications of the ACM

Advanced cyberinfrastructure (CI) is critical to science and engineering (S&E) research. For example, over the past two years, CI resources (including those provided by the COVID-19 HPC Consortiuma) enabled research that dramatically accelerated efforts to understand, respond to, and mitigate near- and longer-term impacts of the novel coronavirus disease 2019 (COVID-19) pandemic.b Computer-based epidemiology models informed public policy in the U.S., and in countries throughout the world, and newly studied transmission models for the virus have been used to forecast resource availability and mortality stratified by age group at the county level.c Artificial intelligence and machine learning approaches accelerated drug screening to find candidate medicines from trillions of possible chemical compounds,d and differential gene expressions among COVID-19 patient populations have been analyzed with important implications for treatment planning.e Structural modeling of the virus has led to new insights, speeding the development of vaccines and antigens.


Computational Thinking in the Era of Data Science

Communications of the ACM

Recent years have seen the integration of computer science, mathematicsa and statistics, together with real-world domain knowledge, into a new research and applications field: data science.4 Just as data science integrates knowledge and skills from computer science, statistics, and a real-world application domain, data thinking, we propose, integrates computational thinking, statistical thinking, and domain thinking. Computational thinking was first introduced by Papert13 and, a quarter of a century later, was illuminated and elaborated on by Wing.15 As it turns out, exploring the novelty of data thinking uncovers new facets of computational thinking. In this Viewpoint, we first present our interpretation of the concept of data thinking and then, based on insights gained from the discussion about data thinking, we propose a timely need has emerged to introduce data thinking into computer science education along with computational thinking, in the context of various real-world domains using real-life data.


An Interview with Dana Scott

Communications of the ACM

ACM fellow Dana Stewart Scott, the recipient jointly with Michael Rabin of the 1976 A.M. Turing Award for the concept of nondeterministic finite automata, has made seminal contributions spanning computing science, mathematics, philosophy, automata theory, modal logic, model theory, set theory, and the theory of programming languages. After receiving a B.A. in mathematics from the University of California, Berkeley, in 1950, and a Ph.D. from Princeton University in 1958, he held faculty positions at the University of Chicago, UC Berkeley, and at Stanford, Princeton, Oxford, and Carnegie Mellon Universities. He retired as University Professor from CMU in 2003. The distinguished theoretical computer scientist Gordon Plotkin conducted a series of four oral history interviews of Scott between November 2020 and February 2021. The interviews, the transcripts and videos of which are online,a cover primarily the period leading up to the 1976 ACM A.M. Turing Award. Presented here is a condensed and highly edited version, which includes some additional post-interview material provided by Scott. I was born in 1932 in Berkeley, CA, where I am now in retirement. We lived on a farm near Susanville when I started first grade in a one-room school-house.



Formalizing Fairness

Communications of the ACM

As machine learning has made its way into more and more areas of our lives, concerns about algorithmic bias have escalated. Machine learning models, which today facilitate decisions about everything from hiring and lending to medical diagnosis and criminal sentencing, may appear to be data-driven and impartial, at least to naïve users--but the typically opaque models are only as good the data they are trained on, and only as ethical as the value judgments embedded in the algorithms. The burgeoning field of algorithmic fairness, part of the much broader field of responsible computing, is aiming to remedy the situation. For several years now, along with philosophers, legal scholars, and experts in other fields, computer scientists have been tackling the issue. As Stanford University computer science professor Omer Reingold likes to put it, "We are part of the problem, and we should be part of the solution."


Crossing the Uncanny Valley

Communications of the ACM

In 1970, robotics expert Masahiro Mori first described the effect of the "uncanny valley," a concept that has had a massive impact on the field of robotics. The uncanny valley, or UV, effect, describes the positive and negative responses that human beings exhibit when they see human-like objects, specifically robots. The UV effect theorizes that our empathy towards a robot increases the more it looks and moves like a human. However, at some point, the robot or avatar becomes too lifelike, while still being unfamiliar. This confuses the brain's visual processing systems.


The Seattle Report on Database Research

Communications of the ACM

From the inception of the field, academic database research has strongly influenced the database industry and vice versa. The database community, both research and industry, has grown substantially over the years. The relational database market alone has revenue upwards of $50B. On the academic front, database researchers continue to be recognized with significant awards. Over the last decade, our research community pioneered the use of columnar storage, which is used in all commercial data analytic platforms. Database systems offered as cloud services have witnessed explosive growth. Hybrid transactional/analytical processing (HTAP) systems are now an important segment of the industry. Furthermore, memory-optimized data structures, modern compilation, and code-generation have significantly enhanced performance of traditional database engines. All data platforms have embraced SQL-style APIs as the predominant way to query and retrieve data. Database researchers have played an important part in influencing the evolution of streaming data platforms as well as distributed key-value stores. A new generation of data cleaning and data wrangling technology is being actively explored.


Technical Perspective: Can Data Structures Treat Us Fairly?

Communications of the ACM

When asked to pick a random buddy from our Facebook friends list, we may struggle since our nomination is likely to be biased toward individuals with whom we interact more frequently. Computer assistance in such a case would make things easier: we can just store our friends' names in an array. Whenever a query comes, the computer generates a random array index and returns the name stored in the corresponding location. The question now is whether, for any given data structure problem, we can build a data structure that generates "fair" output while maintaining query efficiency. In the context of data structure query-answering, fairness can be defined as follows: we either return all valid answers or just return a uniformly random one.