Guiding Computers, Robots to See and Think

Communications of the ACM

Though Stanford University professor Fei-Fei Li began her career during the most recent artificial intelligence (AI) winter, she is responsible for one of the insights that helped precipitate its thaw. By creating ImageNet, a hierarchically organized image database with more than 15 million images, she demonstrated the importance of rich datasets in developing algorithms--and launched the competition that eventually brought widespread attention to Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky's work on deep convolutional neural networks. Today Li, who was recently named an ACM Fellow, directs the Stanford Artificial Intelligence Lab and the Stanford Vision and Learning Lab, where she works to build smart algorithms that enable computers and robots to see and think. Here, she talks about computer vision, neuroscience, and bringing more diversity to the field.

Your bachelor's degree is in physics and your Ph.D. is in electrical engineering.

Predicting Program Properties from 'Big Code'

Communications of the ACM

We present a new approach for predicting program properties from large codebases (aka "Big Code"). Our approach learns a probabilistic model from "Big Code" and uses this model to predict properties of new, unseen programs. The key idea of our work is to transform the program into a representation that allows us to formulate the problem of inferring program properties as structured prediction in machine learning. This enables us to leverage powerful probabilistic models such as Conditional Random Fields (CRFs) and perform joint prediction of program properties. As an example of our approach, we built a scalable prediction engine called JSNice for solving two kinds of tasks in the context of JavaScript: predicting (syntactic) names of identifiers and predicting (semantic) type annotations of variables. Experimentally, JSNice predicts correct names for 63% of identifiers, and its type annotation predictions are correct in 81% of cases. Since its public release, JSNice has become a popular system with hundreds of thousands of uses. By formulating the problem of inferring program properties as structured prediction, our work opens up the possibility of a range of new "Big Code" applications such as de-obfuscators, decompilers, invariant generators, and others.

Recent years have seen significant progress in the area of programming languages, driven by advances in type systems, constraint solving, program analysis, and synthesis techniques. Fundamentally, these methods reason about each program in isolation, and while powerful, the effectiveness of programming tools based on these techniques is approaching its inherent limits. Thus, a more disruptive change is needed if a significant improvement is to take place. At the same time, creating probabilistic models from large datasets (also called "Big Data") has transformed a number of areas such as natural language processing, computer vision, recommendation systems, and many others. However, despite the overwhelming success of "Big Data" in a variety of application domains, learning from large datasets of programs has previously not had tangible impact on programming tools.
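The structured-prediction formulation can be pictured with a deliberately tiny sketch. Everything below is invented for illustration--the identifier names, candidate names, and scores are made up, and JSNice learns feature weights from a real corpus and uses far more efficient inference than brute-force enumeration--but the shape of the problem is the same: jointly choose names for unknown identifiers so that a corpus-derived score is maximized.

```python
from itertools import product

# Unknown (minified) identifiers to rename, and candidate names
# that a corpus might suggest (all hypothetical):
unknowns = ["t0", "t1"]
candidates = ["i", "len", "count", "step"]

# Toy scores standing in for learned feature weights: how strongly a
# pair of names co-occurs in well-named code for the relations observed
# in this program (e.g., "t0 < t1"), plus per-name priors.
pair_score = {("i", "len"): 2.0, ("i", "count"): 1.6, ("step", "len"): 0.4}
unary_score = {"i": 0.5, "len": 0.3, "count": 0.2, "step": 0.1}

def map_assignment():
    """Brute-force MAP inference: jointly pick distinct names that
    maximize the total score. A CRF does this with learned weights
    and smarter (approximate) inference over much larger graphs."""
    best, best_score = None, float("-inf")
    for names in product(candidates, repeat=len(unknowns)):
        if len(set(names)) < len(names):   # renamed identifiers must stay distinct
            continue
        s = sum(unary_score[n] for n in names)
        s += pair_score.get(tuple(names), 0.0)
        if s > best_score:
            best, best_score = names, s
    return dict(zip(unknowns, best))

print(map_assignment())  # {'t0': 'i', 't1': 'len'}
```

The point of the joint (rather than per-identifier) prediction is visible even here: "len" wins for t1 only because "i" was chosen for t0.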

Beyond Worst-Case Analysis

Communications of the ACM

Comparing different algorithms is hard. For almost any pair of algorithms and measure of algorithm performance, like running time or solution quality, each algorithm will perform better than the other on some inputs.a For example, the insertion sort algorithm is faster than merge sort on already-sorted arrays but slower on many other inputs. When two algorithms have incomparable performance, how can we deem one of them "better than" the other? Worst-case analysis is a specific modeling choice in the analysis of algorithms, where the overall performance of an algorithm is summarized by its worst performance on any input of a given size. The "better" algorithm is then the one with superior worst-case performance. Merge sort, with its worst-case asymptotic running time of Θ(n log n) for arrays of length n, is better in this sense than insertion sort, which has a worst-case running time of Θ(n²). While crude, worst-case analysis can be tremendously useful, and it is the dominant paradigm for algorithm analysis in theoretical computer science. A good worst-case guarantee is the best-case scenario for an algorithm, certifying its general-purpose utility and absolving its users from understanding which inputs are relevant to their applications. Remarkably, for many fundamental computational problems, there are algorithms with excellent worst-case performance guarantees. The lion's share of an undergraduate algorithms course comprises algorithms that run in linear or near-linear time in the worst case. Here, I review three classical examples where worst-case analysis gives misleading or useless advice about how to solve a problem; further examples in modern machine learning are described later.
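The insertion-sort/merge-sort contrast is easy to check empirically. The sketch below (mine, not the article's) counts comparisons on an already-sorted array: insertion sort does n - 1 comparisons, while a standard merge sort still pays roughly (n/2)·log₂ n.

```python
def insertion_sort(a):
    """In-place insertion sort; returns the number of comparisons.
    On already-sorted input, the inner loop breaks immediately."""
    comparisons = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

def merge_sort_comparisons(a):
    """Top-down merge sort; returns (sorted list, comparison count)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = merge_sort_comparisons(a[:mid])
    right, cr = merge_sort_comparisons(a[mid:])
    merged, c = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):
        c += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, c

sorted_input = list(range(1024))
print(insertion_sort(sorted_input[:]))          # 1023 comparisons: linear
print(merge_sort_comparisons(sorted_input)[1])  # 5120 comparisons: (n/2)*log2(n)
```

Worst-case analysis hides exactly this: it summarizes insertion sort by its Θ(n²) behavior on adversarial inputs, not its linear behavior on the nearly-sorted inputs common in practice.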

Metamorphic Testing of Driverless Cars

Communications of the ACM

On March 18, 2018, Elaine Herzberg became the first pedestrian in the world to be killed by an autonomous vehicle after being hit by a self-driving Uber SUV in Tempe, AZ, at about 10 p.m. Video released by the local police department showed the self-driving Volvo XC90 did not appear to see Herzberg, as it did not slow down or alter course, even though she was visible in front of the vehicle prior to impact. Subsequently, automotive engineering experts raised questions about Uber's LiDAR technology.12 LiDAR, or "light detection and ranging," uses pulsed laser light to enable a self-driving car to see its surroundings hundreds of feet away. Velodyne, the supplier of the Uber vehicle's LiDAR technology, said, "Our LiDAR is capable of clearly imaging Elaine and her bicycle in this situation. However, our LiDAR does not make the decision to put on the brakes or get out of her way" ... "We know absolutely nothing about the engineering of their [Uber's] part ... It is a proprietary secret, and all of our customers keep this part to themselves"15 ... and "Our LiDAR can see perfectly well in the dark, as well as it sees in daylight, producing millions of points of information. However, it is up to the rest of the system to interpret and use the data to make decisions. We do not know how the Uber system of decision making works."11
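The excerpt above is background; the article's subject, metamorphic testing, addresses the "oracle problem" this incident exposes: when there is no ground truth for what a perception system should output, one can still check relations between outputs on systematically transformed inputs. The sketch below is a toy illustration of that structure only--the "detector" and the brightness relation are invented stand-ins, not Uber's or anyone's real system.

```python
import random

def detect_obstacle(frame):
    """Toy stand-in for a perception model: flags an obstacle when any
    pixel intensity exceeds a threshold. (Hypothetical; real systems
    use learned models over LiDAR/camera data.)"""
    return any(px > 50 for px in frame)

def metamorphic_brightness_test(frame, trials=10):
    """Metamorphic relation: uniformly brightening a frame that already
    contains an obstacle must not make the obstacle disappear.
    No oracle for the 'correct' detection is needed--only the relation
    between the source output and each follow-up output is checked."""
    assert detect_obstacle(frame), "source test case must detect an obstacle"
    for _ in range(trials):
        delta = random.randint(1, 20)
        follow_up = [min(255, px + delta) for px in frame]
        if not detect_obstacle(follow_up):
            return False   # relation violated: a bug worth investigating
    return True

frame = [10, 10, 80, 10]   # one bright "obstacle" pixel
print(metamorphic_brightness_test(frame))
```

A real application would use relations meaningful for driving scenes (e.g., mirroring an image, adding mild noise, or changing weather rendering should not erase a detected pedestrian).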

The Seven Tools of Causal Inference, with Reflections on Machine Learning

Communications of the ACM

The dramatic success in machine learning has led to an explosion of artificial intelligence (AI) applications and increasing expectations for autonomous systems that exhibit human-level intelligence. These expectations have, however, met with fundamental obstacles that cut across many application areas. One such obstacle is adaptability, or robustness. Machine learning researchers have noted that current systems lack the ability to recognize or react to new circumstances they have not been specifically programmed or trained for. Intensive theoretical and experimental efforts toward "transfer learning," "domain adaptation," and "lifelong learning"4 are reflective of this obstacle. Another obstacle is "explainability": "machine learning models remain mostly black boxes"26 unable to explain the reasons behind their predictions or recommendations, thus eroding users' trust and impeding diagnosis and repair; see Hutson8 and Marcus.11 A third obstacle concerns the lack of understanding of cause-effect connections.

Blogging Birds

Communications of the ACM

Blogging Birds is a novel artificial intelligence program that generates creative texts to communicate telemetric data derived from satellite tags fitted to red kites -- a medium-size bird of prey -- as part of a species reintroduction program in the U.K. We address the challenge of communicating telemetric sensor data in real time by enriching it with meteorological and cartographic data, codifying ecological knowledge to allow creative interpretation of the behavior of individual birds with respect to such enriched data, and dynamically generating informative and engaging data-driven blogs aimed at the general public.

Geospatial data is ubiquitous in today's world, with vast quantities of telemetric data collected by GPS receivers on, for example, smartphones and automotive black boxes. Adoption of telemetry has been particularly striking in the ecological realm, where the widespread use of satellite tags has greatly advanced our understanding of the natural world.14,23 Despite its increasing popularity, GPS telemetry involves the important shortcoming that both the handling and the interpretation of often large amounts of location data are time-consuming and thus done mostly long after the data has been gathered.10,24 This hampers fruitful use of the data in nature conservation, where immediate data analysis and interpretation are needed to take action or communicate to a wider audience.25,26 The widespread availability of GPS data, along with associated difficulties interpreting and communicating it in real time, mirrors the scenario seen with other forms of numeric or structured data. It should be noted that the use of computational methods for data analysis per se is hardly new; much of science depends on statistical analysis and associated visualization tools. However, it is generally understood that such tools are mediated by human operators who take responsibility for identifying patterns in data, as well as communicating them accurately.
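The enrich-then-interpret pipeline can be sketched in miniature. Everything below is a hypothetical simplification--the thresholds, place name, and rule are invented, and the real system codifies far richer ecological knowledge--but it shows the pattern: combine raw GPS fixes with weather and map context, then apply a codified rule to produce reader-facing text.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def interpret(prev_fix, fix, wind_kmh, place):
    """Codified ecological rule (invented thresholds): a kite that barely
    moves on a windy day is likely sheltering; one that covers ground is
    likely foraging. wind_kmh comes from meteorological enrichment and
    place from cartographic enrichment."""
    km = haversine_km(*prev_fix, *fix)
    if km < 1 and wind_kmh > 30:
        return f"She stayed put near {place}, sheltering from {wind_kmh:.0f} km/h winds."
    return f"She ranged {km:.1f} km around {place}, probably foraging."

print(interpret((57.30, -3.50), (57.30, -3.49), 35, "Strathspey"))
# She stayed put near Strathspey, sheltering from 35 km/h winds.
```

The interesting design question, which the article takes up, is how to make such rule-driven text vary enough from day to day to stay engaging for readers of an ongoing blog.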

Understanding Database Reconstruction Attacks on Public Data

Communications of the ACM

There exists a solution universe of all the possible solutions to this set of constraints. If the solution universe contains a single possible solution, then the published statistics completely reveal the underlying confidential data--provided that noise was not added to either the microdata or the tabulations as a disclosure-avoidance mechanism. If there are multiple satisfying solutions, then any element (person) in common among all of the solutions is revealed. If the equations have no solution, either the set of published statistics is inconsistent with the fictional statistical agency's claim that it is tabulated from a real confidential database or an error was made in that tabulation. This doesn't mean that a high-quality reconstruction is not possible.
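The solution-universe idea can be made concrete with a toy reconstruction (the published statistics below are invented for illustration; real attacks use constraint solvers over many more tabulations). Given only a block's count, mean age, and median age, we enumerate every candidate database and inspect what all candidates have in common:

```python
from itertools import combinations_with_replacement
from statistics import median

# Hypothetical published tabulations for a 3-person block:
COUNT, MEAN, MEDIAN = 3, 30, 30

# The solution universe: every multiset of ages consistent with the
# published statistics (combinations_with_replacement yields each
# candidate as a sorted tuple).
solutions = [
    ages
    for ages in combinations_with_replacement(range(100), COUNT)
    if sum(ages) == MEAN * COUNT and median(ages) == MEDIAN
]
print(len(solutions))   # size of the solution universe: 31 candidates

# Any value common to all candidates is revealed with certainty,
# no matter which candidate is the real database.
common = set(solutions[0]).intersection(*map(set, solutions))
print(common)           # {30}: someone aged 30 is definitely in the block
```

Here the universe is not a singleton, so the full database is not revealed, yet the intersection still leaks a fact; adding more published statistics shrinks the universe further, which is exactly why noise injection is used as a disclosure-avoidance mechanism.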

Technical Perspective: Borrowing Big Code to Automate Programming Activities

Communications of the ACM

Big data combined with machine learning has revolutionized fields such as computer vision, robotics, and natural language processing. In these fields, automated techniques that detect and exploit complex patterns hidden within large datasets have repeatedly outperformed techniques based on human insight and intuition. But despite the availability of enormous amounts of code (big code) that could, in theory, be leveraged to deliver similar advances for software, programming has proved to be remarkably resistant to this kind of automation. Much programming today consists of developers deploying keyword searches against online information aggregators such as Stack Overflow to find, then manually adapt, code sequences that implement desired behaviors. The following paper presents new techniques for leveraging big code to automate two programming activities: selecting understandable names for JavaScript identifiers and generating type annotations for JavaScript variables.

Exoskeletons Today

Communications of the ACM

The EksoVest supports the wearer's arms during lifting. Millions of people suffer from the effects of spinal cord injuries and strokes that have left them paralyzed. Millions more suffer from back pain, which makes movement painful. Exoskeletons are helping the paralyzed to walk again, enabling soldiers to carry heavy loads, and allowing workers to lift heavy objects with greater ease. An exoskeleton is a mechanical device or soft material worn by a patient or operator whose structure mirrors the skeletal structure of the operator's limbs (joints, muscles, etc.).

Keyhole threshold and morphology in laser melting revealed by ultrahigh-speed x-ray imaging


We used ultrahigh-speed synchrotron x-ray imaging to quantify the phenomenon of vapor depressions (also known as keyholes) during laser melting of metals as practiced in additive manufacturing. Although expected from welding and inferred from postmortem cross sections of fusion zones, the direct visualization of the keyhole morphology and dynamics with high-energy x-rays shows that (i) keyholes are present across the range of power and scanning velocity used in laser powder bed fusion; (ii) there is a well-defined threshold from conduction mode to keyhole based on laser power density; and (iii) the transition follows the sequence of vaporization, depression of the liquid surface, instability, and then deep keyhole formation. These and other aspects provide a physical basis for three-dimensional printing in laser powder bed machines.
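Since the reported conduction-to-keyhole threshold is framed in terms of laser power density, the quantity itself is simple to compute: power spread over the focused spot area. The numbers below are illustrative only (not taken from the paper), and any real threshold value depends on the alloy and process conditions.

```python
from math import pi

def power_density(power_w, spot_radius_m):
    """Laser power density (W/m^2): beam power divided by spot area."""
    return power_w / (pi * spot_radius_m ** 2)

# Illustrative inputs: a 200 W beam focused to a 50-micrometer spot radius.
pd = power_density(200, 50e-6)
print(f"{pd:.2e} W/m^2")   # ~2.55e+10 W/m^2
```

Holding power fixed while tightening the focus raises the power density quadratically, which is one reason spot size matters as much as nominal laser power in powder bed fusion.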