Collaborating Authors

North America

Researchers built an AI that plays chess like a person, not a super computer


We mere mortals haven't truly been competitive against artificial intelligence in chess in a long time. It's been 15 years since a human has conquered a computer in a chess tournament. However, a team of researchers have developed an AI chess engine that doesn't set out to crush us puny humans -- it tries to play like us. The Maia engine doesn't necessarily play the best available move. Instead, it tries to replicate what a human would do.

Bringing Stability to Wireless Connections

Communications of the ACM

Communication is more important than ever, with everything from college to CrossFit going virtual during the COVID-19 pandemic. Nobody understands this better than 2020 Marconi Prize recipient Andrea Goldsmith, who has spent her career making the wireless connections on which we rely more capable and stable. A pioneer of both theoretical and practical advances in adaptive wireless communications, Goldsmith spoke about her work on multiple-input and multiple-output (MIMO) channel performance limits, her new role as the incoming dean at Princeton University's School of Engineering and Applied Science, and what's next for networking. As an undergrad, you studied engineering at the University of California, Berkeley. What drew you to wireless communications?

Scalable Signal Reconstruction for a Broad Range of Applications

Communications of the ACM

Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas, such as network traffic engineering, medical image reconstruction, acoustics, astronomy, and many more. Unfortunately, most of the common approaches for solving SRP do not scale to large problem sizes. We propose a novel and scalable algorithm for solving this critical problem. Specifically, we make four major contributions. First, we propose a dual formulation of the problem and develop the DIRECT algorithm that is significantly more efficient than the state of the art. Second, we show how adapting database techniques developed for scalable similarity joins provides a substantial speedup over DIRECT. Third, we describe several practical techniques that allow our algorithm to scale--on a single machine--to settings that are orders of magnitude larger than previously studied. Finally, we use the database techniques of materialization and reuse to extend our result to dynamic settings where the input to the SRP changes. Extensive experiments on real-world and synthetic data confirm the efficiency, effectiveness, and scalability of our proposal. The database community has been at the forefront of grappling with challenges of big data and has developed numerous techniques for the scalable processing and analysis of massive datasets. These techniques often originate from solving core data management challenges but then find their way into effectively addressing the needs of big data analytics. We study how database techniques can benefit large-scale signal reconstruction,13 which is of interest to research communities as diverse as computer networks,15 medical imaging,7 etc. We demonstrate that the scalability of existing solutions can be significantly improved using ideas originally developed for similarity joins5 and selectivity estimation for set similarity queries.3 Signal reconstruction problem (SRP): The essence of SRP is to solve a linear system of the form AX b, where X is a high-dimensional unknown signal (represented by an m-d vector in Rm), b is a low-dimensional projection of X that can be observed in practice (represented by an n-d vector in Rn with n m), and A is an n m matrix that captures the linear relationship between X and b.


Communications of the ACM

This paper introduces BioScript, a domain-specific language (DSL) for programmable biochemistry that executes on emerging microfluidic platforms. The goal of this research is to provide a simple, intuitive, and type-safe DSL that is accessible to life science practitioners. The novel feature of the language is its syntax, which aims to optimize human readability; the technical contribution of the paper is the BioScript type system. The type system ensures that certain types of errors, specific to biochemistry, do not occur, such as the interaction of chemicals that may be unsafe. Results are obtained using a custom-built compiler that implements the BioScript language and type system. The last two decades have witnessed the emergence of software-programmable laboratory-on-a-chip (pLoC) technology, enabled by technological advances in microfabrication and coupled with scientific understanding of microfluidics, the fundamental science of fluid behavior at the micro- to nanoliter scale. The net result of these collective advancements is that many experimental laboratory procedures have been miniaturized, accelerated, and automated, similar in principle to how the world's earliest computers automated tedious mathematical calculations that were previously performed by hand. Although the vast majority of microfluidic devices are effectively application-specific integrated circuits (ASICs), a variety of programmable LoCs have been demonstrated.16, With a handful of exceptions, research on programming languages and compiler design for programmable LoCs has lagged behind their silicon counterparts. To address this need, this paper presents a domain-specific programming language (DSL) and type system for a specific class of pLoC that manipulate discrete droplets of liquid on a two-dimensional grid. The basic principles of the language and type system readily generalize to programmable LoCs, realized across a wide variety of microfluidic technologies.


Communications of the ACM

On Feb 15, 2019, John Abowd, chief scientist at the U.S. Census Bureau, announced the results of a reconstruction attack that they proactively launched using data released under the 2010 Decennial Census.19 The decennial census released billions of statistics about individuals like "how many people of the age 10-20 live in New York City" or "how many people live in four-person households." Using only the data publicly released in 2010, an internal team was able to correctly reconstruct records of address (by census block), age, gender, race, and ethnicity for 142 million people (about 46% of the U.S. population), and correctly match these data to commercial datasets circa 2010 to associate personal-identifying information such as names for 52 million people (17% of the population). This is not specific to the U.S. Census Bureau--such attacks can occur in any setting where statistical information in the form of deidentified data, statistics, or even machine learning models are released. That such attacks are possible was predicted over 15 years ago by a seminal paper by Irit Dinur and Kobbi Nissim12--releasing a sufficiently large number of aggregate statistics with sufficiently high accuracy provides sufficient information to reconstruct the underlying database with high accuracy. The practicality of such a large-scale reconstruction by the U.S. Census Bureau underscores the grand challenge that public organizations, industry, and scientific research faces: How can we safely disseminate results of data analysis on sensitive databases? An emerging answer is differential privacy. An algorithm satisfies differential privacy (DP) if its output is insensitive to adding, removing or changing one record in its input database. DP is considered the "gold standard" for privacy for a number of reasons. It provides a persuasive mathematical proof of privacy to individuals with several rigorous interpretations.25,26 The DP guarantee is composable and repeating invocations of differentially private algorithms lead to a graceful degradation of privacy.

Keeping Science on Keel When Software Moves

Communications of the ACM

High performance computing (HPC) is central to solving large problems in science and engineering through the deployment of massive amounts of computational power. During this period, the core functionality of the software is made more efficient, new features are added, and the software is ported across multiple platforms. Porting of software in general involves the change of compilers, optimization levels, arithmetic libraries, and many other aspects that determine the machine instructions that actually get executed. Unfortunately, such changes do affect the computed results to a significant (and often worrisome) extent. In a majority of cases, there are not easily definable a priori answers one can check against. A programmer ends up comparing the new answer against a trusted baseline previously established or checks for indirect confirmations such as whether physical properties such as energy are conserved. However, such non-systematic efforts might miss underlying issues, and the code may keep misbehaving until these are fixed. In this article, we present real-world evidence to show that ignoring numerical result changes can lead to misleading scientific conclusions. We present techniques and tools that can help computational scientists understand and analyze compiler effects on their scientific code. These techniques are applicable across a wide range of examples to narrow down the root-causes to single files, functions within files, and even computational expressions that affect specific variables. The developer may then rewrite the code selectively and/or suppress the application of certain optimizations to regain more predictable behavior.

Differential Privacy

Communications of the ACM

Over the past decade, calls for better measures to protect sensitive, personally identifiable information have blossomed into what politicians like to call a "hot-button issue." Certainly, privacy violations have become rampant and people have grown keenly aware of just how vulnerable they are. When it comes to potential remedies, however, proposals have varied widely, leading to bitter, politically charged arguments. To date, what has chiefly come of that have been bureaucratic policies that satisfy almost no one--and infuriate many. Now, into this muddled picture comes differential privacy. First formalized in 2006, it's an approach based on a mathematically rigorous definition of privacy that allows formalization and proof of the guarantees against re-identification offered by a system. While differential privacy has been accepted by theorists for some time, its implementation has turned out to be subtle and tricky, with practical applications only now starting to become available. To date, differential privacy has been adopted by the U.S. Census Bureau, along with a number of technology companies, but what this means and how these organizations have implemented their systems remains a mystery to many. It's also unlikely that the emergence of differential privacy signals an end to all the difficult decisions and trade-offs, but it does signify that there now are measures of privacy that can be quantified and reasoned about--and then used to apply suitable privacy protections. A milestone in the effort to make this capability generally available came in September 2019 when Google released an open source version of the differential privacy library that the company has used with many of its core products. In the exchange that follows, two of the people at Google who were central to the effort to release the library as open source--Damien Desfontaines, privacy software engineer; and Miguel Guevara, who leads Google's differential privacy product development effort--reflect on the engineering challenges that lie ahead, as well as what remains to be done to achieve their ultimate goal of providing privacy protection by default.

The Time I Stole $10,000 from Bell Labs

Communications of the ACM

If IT workers fear they will be punished for outages, they will adopt behavior that leads to even larger outages. Instead, we should celebrate our outages: Document them blamelessly, discuss what we've learned from them openly, and spread that knowledge generously. An outage is not an expense. It is an investment in the people who have learned from it. We can maximize that investment through management practices that maximize learning for those involved and by spreading that knowledge across the organization.

Technological Responses to COVID-19

Communications of the ACM

Pratt Miller demonstrated its LAAD disinfecting robot at Gerald R Ford International Airport in Grand Rapids, MI, in July 2020. The impacts of the COVID-19 pandemic are likely to be felt for years to come, regardless of the presence and availability of a vaccine. Physical measures adopted by humans, such as social distancing or wearing masks, are likely to be utilized for years to come, along with technological developments deployed in both public and private spaces that are focused on enforcing social distancing, enabling more efficient cleaning and disinfecting of spaces, and driving more automation and intelligence to reduce humans' direct physical interaction with each other. Some companies and individuals feel the best way to avoid COVID-19 or other viruses is to simply avoid all unnecessary human contact. As such, many companies have introduced or fast-tracked the use of automation to lessen their reliance on human workers, as well as to enhance their responsiveness to customer queries.

Technical Perspective: Solving the Signal Reconstruction Problem at Scale

Communications of the ACM

When problems are scaled to "big data," researchers must often come up with new solutions, leveraging ideas from multiple research areas--as we frequently witness in today's big data techniques and tools for machine learning, bioinformatics, and data visualization. Beyond these heavily studied topics, there exist other classes of general problems that must be rethought at scale. One such problem is that of large-scale signal reconstruction:4 taking a set of observations of relatively low dimensionality, and using them to reconstruct a high-dimensional, unknown signal. This class of problems arises when we can only observe a subset of a complex environment that we are seeking to model--for instance, placing a few sensors and using their readings to reconstruct an environment's temperature, or monitoring multiple points in a network and using the readings to estimate end-to-end network traffic, or using 2D slices to reconstruct a 3D image. The following paper is notable because it scalably addresses an underserved problem with practical impact, and does so in a clean, insightful, and systematic way. This signal reconstruction problem (SRP) is typically approached as an optimization task, in which we search for the high-dimensional signal that minimizes a loss function comparing it to the known properties of the signal.