Communications of the ACM


Technical Perspective: Shedding New Light on an Old Language Debate

Communications of the ACM

The following paper aims to bring empiricism to this debate by studying whether programming language choice and code quality are related. To do so, the authors perform an observational study on a corpus of 728 popular GitHub projects, totaling 63 million lines of code. Second, they report that languages that are functional, disallow implicit type conversion, have static typing, and/or use managed memory have slightly fewer defects than languages without these characteristics. Like any empirical study, the results here have threats to validity: noise in the data, such as the classification of a commit as defect-fixing, is difficult to account for; defects may have been made and fixed without an intervening commit, for example, defects prevented by a static type checker are likely not included; projects vary significantly in software engineering practices, for example, Linux is an outlier, with an extremely large user base with many developers and testers; tool support for different languages varies significantly; there may be a strong relationship between programmer skill and language choice; language design can obviate classes of errors, for example, buffer overflows can occur in C and C++ but not Java; and in practice the choice of programming language is often constrained both by external factors (for example, the language of existing codebases) and the problem domain (for example, device drivers are likely to be written in C or C++).


A Large-Scale Study of Programming Languages and Code Quality in GitHub

Communications of the ACM

In this study, we gather a very large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static versus dynamic typing and allowing versus disallowing type confusion on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. We also calculate other project-related statistics, including maximum commit age of a project and the total number of developers, used as control variables in our regression model, and discussed in Section 3.


Multi-Objective Parametric Query Optimization

Communications of the ACM

We propose a generalization of the classical database query optimization problem: multi-objective parametric query (MPQ) optimization. MPQ generalizes previously proposed query optimization variants, such as multi-objective query optimization, parametric query optimization, and traditional query optimization. The goal of database query optimization is to map a query (describing the data to generate) to the optimal query plan (describing how to generate the data). In query optimization, alternative query plans are compared according to their execution cost (e.g., execution time).
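As a rough illustration of what comparing plans under several cost metrics and an unknown parameter can look like, the following minimal Python sketch keeps the Pareto-optimal plans for a given parameter value. It is not the algorithm from the paper: the plan names, the cost functions, and the choice of parameter (a predicate selectivity) are all hypothetical and chosen only to show that the set of non-dominated plans can change as the parameter changes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plans (assumption): each maps a parameter value, here a
# predicate selectivity in [0, 1], to a cost vector (execution time, fee).
# All numbers are invented for illustration.
CostFn = Callable[[float], Tuple[float, float]]

PLANS: Dict[str, CostFn] = {
    "index_scan": lambda s: (1.0 + 20.0 * s, 2.0),
    "full_scan": lambda s: (8.0, 1.0),
    "parallel_scan": lambda s: (3.0, 5.0),
    "bad_plan": lambda s: (9.0, 6.0),
}


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_plans(selectivity: float) -> List[str]:
    """Return the names of plans not dominated by any other plan."""
    costs = {name: fn(selectivity) for name, fn in PLANS.items()}
    return sorted(
        name
        for name, cost in costs.items()
        if not any(
            dominates(other_cost, cost)
            for other, other_cost in costs.items()
            if other != name
        )
    )


if __name__ == "__main__":
    for s in (0.05, 0.5):
        print(s, pareto_plans(s))
```

With these made-up costs, the index scan is on the Pareto frontier at low selectivity but is dominated by the full scan at high selectivity, which is why a parametric optimizer would need to keep track of plans per parameter region rather than return a single plan.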


Barriers to Refactoring

Communications of the ACM

We asked them to indicate what they thought was a good design, focusing on a reasonable level of class depth and size. We previously reported results for part of our survey indicating that 12.0% (452) of participants had a preference for a limit on the number of methods, whereas 25.2% (952) indicated a preference for a limit on the depth of a class; for details see Gorschek et al.7 These results suggest a significant proportion of developers believe the theory. We first asked them to indicate what they believed was a good threshold for size--WMC--and depth--DIT. The responses of greatest interest to us were from participants who indicated strong agreement with design principles that limit the number of methods or depth of a class.


The Real Risks of Artificial Intelligence

Communications of the ACM

He described a game (the imitation game) in which a human and a machine would answer questions and observers would attempt to use those answers to identify the machine. Anyone interested in the Turing Test should study the work of the late MIT professor Joseph Weizenbaum.3 In the mid-1960s, he created Eliza, a program that imitated a practitioner of Rogerian psychotherapy. Some believed Weizenbaum was seriously attempting to create intelligence by creating a program that could pass Turing's test. Early AI experts taught us to design character recognition programs by interviewing human readers.


Digital Hearing

Communications of the ACM

The Earlens Light-Driven Hearing Aid converts sounds into pulses of light, which activate a lens on the eardrum. Researchers and companies continue to advance the technology by understanding how listeners process a complex "auditory scene," which requires more than just amplifying sounds. Although increased processing power has clearly benefited hearing aid technology, designs must extend beyond electrical engineering to encompass the complex and idiosyncratic ways that people process and interpret sounds.


Manipulating Word Representations, and Preparing Students for Coding Jobs?

Communications of the ACM

The network, through iterated adjustment of the elements of the vector based on errors detected on comparison with the text corpora, produces the values in continuous space that best reflect the contextual data given. Most dictionaries will offer a direct or indirect connection through "king" to "ruler" or "sovereign" and "male" and through "queen" to "ruler" or "sovereign" and "female," as: These definitions2 show gender can be "factored out," and in common usage the gender aspect of sovereigns is notable. As we understand the high degree of contextual dependency of word meanings in a language, any representation of word meaning to a significant degree will reflect context, where context is its interassociation with other words. The word vectors produced by the method of training on a huge natural text dataset, in which words are given distributed vector representations refined through associations present in the input context, reflect the cross-referential semantic compositionality of a dictionary.
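The idea that gender can be "factored out" of word vectors can be sketched with a toy example in Python. The three-dimensional vectors below are hand-made assumptions (axes loosely meaning ruler, male, female), not the output of any trained network, and the helper names are hypothetical. The sketch only shows the vector arithmetic and the nearest-neighbor lookup by cosine similarity.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, float, float]

# Hand-made toy vectors (assumption, not learned from a corpus).
# Axes, loosely: (ruler, male, female).
VECTORS: Dict[str, Vector] = {
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "ruler": (1.0, 0.0, 0.0),
    "male": (0.0, 1.0, 0.0),
    "female": (0.0, 0.0, 1.0),
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(base: str, remove: str, add: str) -> str:
    """Return the word closest to base - remove + add, excluding the inputs."""
    target = tuple(
        b - r + a
        for b, r, a in zip(VECTORS[base], VECTORS[remove], VECTORS[add])
    )
    candidates = {w: v for w, v in VECTORS.items() if w not in (base, remove, add)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))


if __name__ == "__main__":
    print(analogy("king", "male", "female"))
```

In this toy setup, subtracting the "male" vector from "king" and adding "female" lands exactly on "queen"; with vectors learned from real text the match would only be approximate.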


Beyond Brute Force

Communications of the ACM

While applying simplistic approaches to complex domains (such as image and speech processing) is inefficient, certain specific computational problems do indeed benefit from brute force. In this regard, the mathematician who uses brute force is simply functioning as a good engineer intent on solving problems efficiently. My primary focus during my Massachusetts General Hospital fellowship (2013–2016) was analyzing the electronic health records for 314,292 patients.2 To identify biomarkers associated with outcomes, my colleagues and I were initially interested in knowing the smoking status of all of them--current, past, or never--for our prediction models. Smoking status is typically documented in clinical narrative notes as free text, and, as reported throughout the literature, classification accuracy of current methods is poor. I hypothesized that following a simple human-in-the-loop brute-force approach designed to semi-manually extract non-negated expressions could achieve better ...


Computing Is a Profession

Communications of the ACM

The compounding of this continued and accelerating advance gives rise to a deep technical expertise. While deep technical challenges abound, the ethical challenges, principles, and standards are even more daunting. Second, societies develop and advocate principles for ethical technical conduct that frame the role of computing professionals, and buttress them with the stature and role of the profession in society. Necessarily so, as technical knowledge and professional ethics must inform professional conduct, and inevitably come into conflict with personal interest, corporate interest, government or national interest, or even overt coercion.


It's All About Image

Communications of the ACM

Enter computer image recognition, artificial neural networks, and data science; together, they are changing the equation. In recent years, scientists have begun to train neural nets to analyze data from images captured by cameras in telescopes located on Earth and in space. Rapid advancements in neural nets and deep learning are a result of several factors, including faster and better GPUs, larger nets with deeper layers, huge labeled datasets to train on, new and different types of neural nets, and improved algorithms. Researchers are turning to convolutional systems modeled from human visual processing, and generative systems that rely on a statistical approach.