Goto

Collaborating Authors

Learning New Things and Avoiding Obstacles

Communications of the ACM

ACM A.M. Turing Award recipient Jack Dongarra never intended to work with computers. Initially, the Distinguished Professor at the University of Tennessee and founder of the Innovative Computing Laboratory (ICL) thought he would be a high school science teacher. A chance internship at the Argonne National Laboratory kindled a lifelong interest in numerical methods and software--and, in particular, in linear algebra, which powered the development of Dongarra's groundbreaking techniques for optimizing operations on increasingly complex computer architectures. Your career in computing began serendipitously, with a semester-long internship at Argonne National Laboratory. As an undergraduate, I worked on EISPACK, a software package designed to solve eigenvalue problems.


Resolution of the Burrows-Wheeler Transform Conjecture

Communications of the ACM

The Burrows-Wheeler Transform (BWT) is an invertible text transformation that permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the main component of popular lossless compression programs (such as bzip2) as well as recent powerful compressed indexes (such as the r-index7), central in modern bioinformatics. The compressibility of BWT is quantified by the number r of equal-letter runs in the output. Despite the practical significance of BWT, no nontrivial upper bound on r is known. By contrast, the sizes of nearly all other known compression methods have been shown to be either always within a poly-log n factor (where n is the length of the text) from z, the size of Lempel–Ziv (LZ77) parsing of the text, or much larger in the worst case (by an nε factor for ε 0). In this paper, we show that r (z log2 n) holds for every text. This result has numerous implications for text indexing and data compression; in particular: (1) it proves that many results related to BWT automatically apply to methods based on LZ77, for example, it is possible to obtain functionality of the suffix tree in (z polylog n) space; (2) it shows that many text processing tasks can be solved in the optimal time assuming the text is compressible using LZ77 by a sufficiently large polylog n factor; and (3) it implies the first nontrivial relation between the number of runs in the BWT of the text and of its reverse. In addition, we provide an (z polylog n)-time algorithm converting the LZ77 parsing into the run-length compressed BWT. To achieve this, we develop several new data structures and techniques of independent interest. In particular, we define compressed string synchronizing sets (generalizing the recently introduced powerful technique of string synchronizing sets11) and show how to efficiently construct them. Next, we propose a new variant of wavelet trees for sequences of long strings, establish a nontrivial bound on their size, and describe efficient construction algorithms. Finally, we develop new indexes that can be constructed directly from the LZ77 parsing and efficiently support pattern matching queries on text substrings. Lossless data compression aims to exploit redundancy in the input data to represent it in a small space.


SoundWatch

Communications of the ACM

We present SoundWatch, a smartwatch-based deep learning application to sense, classify, and provide feedback about sounds occurring in the environment.


Challenges, Experiments, and Computational Solutions in Peer Review

Communications of the ACM

While researchers are trained to do research, there is little training for peer review. Several initiatives and experiments have looked to address this challenge. Recently, the ICML 2020 conference adopted a method to select and then mentor junior reviewers, who would not have been asked to review otherwise, with a motivation of expanding the reviewer pool to address the large volume of submissions.43 An analysis of their reviews revealed that the junior reviewers were more engaged through various stages of the process as compared to conventional reviewers. Moreover, the conference asked meta reviewers to rate all reviews, and 30% of reviews written by junior reviewers received the highest rating by meta reviewers, in contrast to 14% for the main pool. Training reviewers at the beginning of their careers is a good start but may not be enough. There is some evidence8 that quality of an individual's review falls over time, at a slow but steady rate, possibly because of increasing time constraints or in reaction to poor-quality reviews they themselves receive. While researchers are trained to do research, there is little training for peer review … Training reviewers at the beginning of their careers is a good start but may not be enough.


Responsible Data Management

Communications of the ACM

Incorporating ethics and legal compliance into data-driven algorithmic systems has been attracting significant attention from the computing research community, most notably under the umbrella of fair8 and interpretable16 machine learning. While important, much of this work has been limited in scope to the "last mile" of data analysis and has disregarded both the system's design, development, and use life cycle (What are we automating and why? Is the system working as intended? Are there any unforeseen consequences post-deployment?) and the data life cycle (Where did the data come from? How long is it valid and appropriate?). In this article, we argue two points. First, the decisions we make during data collection and preparation profoundly impact the robustness, fairness, and interpretability of the systems we build. Second, our responsibility for the operation of these systems does not stop when they are deployed. To make our discussion concrete, consider the use of predictive analytics in hiring. Automated hiring systems are seeing ever broader use and are as varied as the hiring practices themselves, ranging from resume screeners that claim to identify promising applicantsa to video and voice analysis tools that facilitate the interview processb and game-based assessments that promise to surface personality traits indicative of future success.c Bogen and Rieke5 describe the hiring process from the employer's point of view as a series of decisions that forms a funnel, with stages corresponding to sourcing, screening, interviewing, and selection. The hiring funnel is an example of an automated decision system--a data-driven, algorithm-assisted process that culminates in job offers to some candidates and rejections to others. The popularity of automated hiring systems is due in no small part to our collective quest for efficiency.


Methods Included

Communications of the ACM

Although workflows are very popular, prior to the CWL standards, all workflow systems were incompatible with each other. This means that users who do not use the CWL standards are required to express their computational workflows in a different way each time they use another workflow system, leading to local success but global unportability. The success of workflows is now their biggest drawback. Users are locked into a particular vendor, project, and often a specific hardware setup, hampering sharing and reuse. Even non-academics suffer from this situation, as the lack of standards, or their adoption, hinders effective collaboration on computational methods within and between companies.


The Planning and Care of Data

Communications of the ACM

After several years as a startup, we seem finally to have gotten large and popular enough that management and legal are looking at how we store and segregate our user data. To say this feels like a fire drill would be understating it. Everyone now seems to have an opinion on how we should handle user data. Meetings on this topic would be funny if they were not so tragic. It is not as if we are a huge company that claims billions of users, and, of course, it is important to protect our customers' data.


Always Improving Performance

Communications of the ACM

As a young man, Jack Dongarra thought he would probably teach science to high school students. That was his plan when he enrolled at Chicago State College, which had become Chicago State University by the time he graduated in 1972. Over the course of his studies, he began to be fascinated by computers. In his senior year, physics professor Harvey Leff suggested he apply for an internship at nearby Argonne National Laboratory, where he could gain some computing experience. There, Dongarra joined a group developing EISPACK, a software library for calculating eigenvalues, components of linear algebra that are important to performing simulations of chemistry and physics.


A Deeper Understanding of Deep Learning

Communications of the ACM

Deep learning should not work as well as it seems to: according to traditional statistics and machine learning, any analysis that has too many adjustable parameters will overfit noisy training data, and then fail when faced with novel test data. In clear violation of this principle, modern neural networks often use vastly more parameters than data points, but they nonetheless generalize to new data quite well. The shaky theoretical basis for generalization has been noted for many years. One proposal was that neural networks implicitly perform some sort of regularization--a statistical tool that penalizes the use of extra parameters. Yet efforts to formally characterize such an "implicit bias" toward smoother solutions have failed, said Roi Livni, an advanced lecturer in the department of electrical engineering of Israel's Tel Aviv University.


Addressing Labor Shortages with Automation

Communications of the ACM

U..S. employment statistics hit a new milestone last year, but not a positive one. In August 2021, almost 4.3 million workers quit their jobs, according to the U.S. Department of Labor. That's the highest number since the department began tracking voluntary resignations. Their reasons for leaving their jobs vary--the numbers track people who quit for a different position, as well as those who quit without having another job lined up. While the reasons for quitting vary, one thing is clear: Businesses are having a tough time getting employees to come back.