It's All About Image

Communications of the ACM

Enter computer image recognition, artificial neural networks, and data science; together, they are changing the equation. In recent years, scientists have begun to train neural nets to analyze data from images captured by cameras in telescopes located on Earth and in space. Rapid advancements in neural nets and deep learning are a result of several factors, including faster and better GPUs, larger nets with deeper layers, huge labeled datasets to train on, new and different types of neural nets, and improved algorithms. Researchers are turning to convolutional systems modeled on human visual processing, and to generative systems that rely on a statistical approach.
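
Purely as illustration (not from the article), here is a minimal NumPy sketch of the convolution operation at the heart of such convolutional systems; the 3x3 edge-detection kernel and the random "image" are stand-ins for the filters a trained network would learn from labeled telescope data:

import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2-D image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A classic edge-detection kernel; a trained CNN learns such filters from data.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)

image = np.random.rand(8, 8)                     # stand-in for an image patch
features = np.maximum(conv2d(image, kernel), 0)  # convolution + ReLU
print(features.shape)                            # (6, 6) feature map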



Communications of the ACM

In 2014, the Neural Information Processing Systems (NIPS) Conference split the program committee into two independent committees, and then subjected 10% of the submissions--166 papers--to decision making by both committees. The two committees disagreed on roughly a quarter of these papers; given the NIPS paper acceptance rate of 25%, this means that close to 60% of the papers accepted by the first committee were rejected by the second one, and vice versa. Beyond the increased fairness of "innocent until proven guilty," this approach would also increase the efficiency of the conference-publication system: a high rejection rate means that papers are submitted, resubmitted, and re-resubmitted, resulting in a very high reviewing burden on the community.
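
To make the arithmetic concrete, a back-of-the-envelope sketch in Python; the 166-paper and 25% figures come from the excerpt, and treating committee decisions as independent coin flips is purely an illustrative baseline:

n_papers = 166
accept_rate = 0.25

accepts_per_committee = n_papers * accept_rate   # ~41.5 papers per committee
# If decisions were independent coin flips with p = 0.25, a paper accepted
# by committee 1 would be rejected by committee 2 with probability 0.75.
chance_cross_rejection = 1 - accept_rate         # 75%
observed_cross_rejection = 0.60                  # ~60%, as reported above

print(f"accepts per committee: ~{accepts_per_committee:.0f}")
print(f"cross-rejection if purely random: {chance_cross_rejection:.0%}")
print(f"observed cross-rejection: {observed_cross_rejection:.0%}")
# The observed rate sits far closer to random (75%) than to consistent (0%).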


Data Sketching

Communications of the ACM

Yet, the scale of events occurring is huge: many millions of network events per hour, per network element. With standard statistical results, for questions like those in the customer records example, the standard error of a sample of size s is proportional to 1/sqrt(s). Roughly speaking, this means that in estimating a proportion from the sample, the error would be expected to look like 1/sqrt(s). Therefore, looking at the voting intention of a subset of 1,000 voters produces an opinion poll whose error is approximately 3%--providing high confidence (but not certainty) that the true answer is within 3% of the result on the sample, assuming the sample was drawn randomly and the participants responded honestly. A common trick is to attach a random number to each record, then sort the data based on this random tag and take the first s records in the sorted order, as sketched below. One limitation is that the attribute of interest must be specified in advance of setting up the sketch, while a sample allows you to evaluate a query for any recorded attribute of the sampled items.
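
A minimal sketch of that random-tag sampling trick, in pure Python; the record format is a hypothetical stand-in:

import random

def uniform_sample(records, s):
    """Attach a random tag to each record, sort by tag, keep the first s.
    Equivalent to drawing a uniform random sample without replacement."""
    tagged = [(random.random(), rec) for rec in records]
    tagged.sort(key=lambda pair: pair[0])
    return [rec for _, rec in tagged[:s]]

# Standard-error rule of thumb: error ~ 1/sqrt(s).
s = 1000
print(f"expected error ~ {1 / s**0.5:.1%}")   # ~3.2% for s = 1,000

sample = uniform_sample(range(10**6), s)
# Any recorded attribute of the sampled items can now be queried after the fact.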


The Calculus of Service Availability

Communications of the ACM

As detailed in Site Reliability Engineering: How Google Runs Production Systems1 (hereafter referred to as the SRE book), Google products and services seek high-velocity feature development while maintaining aggressive service-level objectives (SLOs) for availability and responsiveness. Internally at Google, we use the following rule of thumb: critical dependencies must offer one additional 9 relative to your service--in the example case, 99.999% availability--because any service will have several critical dependencies, as well as its own idiosyncratic problems. In such a model, as shown in Figure 1, there are 10 unique first-order dependencies, 100 unique second-order dependencies, 1,000 unique third-order dependencies, and so on, leading to a total of 1,111 unique services even if the architecture is limited to four layers. An error budget is simply 1 minus a service's SLO, so the previously discussed 99.99% available service has a 0.01% "budget" for unavailability.
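
The arithmetic behind these rules of thumb is easy to sketch in pure Python; the SLO values are the ones used in the excerpt:

def error_budget(slo):
    """An error budget is simply 1 minus the service's SLO."""
    return 1 - slo

service_slo = 0.9999        # 99.99% ("four nines")
dependency_slo = 0.99999    # one additional nine for critical dependencies

print(f"service error budget:    {error_budget(service_slo):.4%}")    # 0.0100%
print(f"dependency error budget: {error_budget(dependency_slo):.4%}") # 0.0010%

# Fan-out in the layered model: 10 first-order, 100 second-order,
# 1,000 third-order dependencies, plus the service itself.
total_services = sum(10**k for k in range(4))   # 1 + 10 + 100 + 1000 = 1111
print(total_services)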



Communications of the ACM

On the other hand, some HPC systems run highly exotic hardware and software stacks. Aside from all of the normal reasons that any network-connected computer might be attacked, then, HPC computers have their own distinct systems, resources, and assets that an attacker might target, as well as distinctive attributes that make securing them somewhat different from securing other types of computing systems. Although I discuss confidentiality, a typical component of the "C-I-A" triad--even in open science, data leakage is certainly an issue and a threat--this article focuses more on integrity-related threats,31,32 including alteration of code or data and misuse of computing cycles, and on availability-related threats, including disruption or denial of service against HPC systems or the networks that connect them. An accompanying figure shows three typical HPC workflows: data analysis (top), modeling and simulation (middle), and a coupled, interactive compute-visualization workflow (bottom).
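
As one concrete illustration of the integrity concern--my own sketch, not a technique from the article--a minimal Python fragment that detects alteration of code or data by checking files against a trusted manifest of SHA-256 hashes:

import hashlib
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large HPC datasets don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest):
    """manifest: {path: expected_sha256}. Returns paths that were altered."""
    return [p for p, expected in manifest.items()
            if not Path(p).exists() or sha256_of(p) != expected]

# Hypothetical usage: check simulation inputs before a run.
# altered = verify({"input/params.dat": "ab12..."})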


Take Two Aspirin and Call Me in the Morning

Communications of the ACM

Security is much on my mind these days, along with safety and privacy in an increasingly online, programmed world. Our reactions in the public health world involve inoculation and quarantine, and we tolerate this because we recognize our health is at risk if other members of society fail to protect themselves from infection. Google acquired a company called VirusTotal a few years ago that maintains a library of viral profiles, allowing users to check whether particular URLs or files carry malware. It is tempting to imagine a home router/firewall that does sophisticated, machine-learned observation to protect programmable devices at home, but since our laptops, mobiles, and other programmed devices roam with us, they really need an on-board detection system (or logging system?).
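
In that spirit, a minimal sketch of looking up a file's hash against VirusTotal; the v3 REST endpoint, header, and response fields shown are my assumptions about the public API, and the API key is a placeholder:

import hashlib
import requests  # third-party HTTP library, assumed available

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder

def check_file(path):
    """Look up a file's SHA-256 in VirusTotal's library of malware profiles."""
    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{digest}",  # assumed v3 endpoint
        headers={"x-apikey": API_KEY},
    )
    if resp.status_code == 404:
        return "unknown to VirusTotal"
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]  # assumed field
    return f"{stats['malicious']} engines flag this file as malicious"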


Trust and Distrust in Online Fact-Checking Services

Communications of the ACM

While the internet has the potential to give people ready access to relevant and factual information, social media sites like Facebook and Twitter have made filtering and assessing online content increasingly difficult due to its rapid flow and enormous volume. To explore how social media users perceive the trustworthiness and usefulness of these services, we applied a research approach designed to take advantage of unstructured social media conversations (see Figure 3). While investigations of trust and usefulness often rely on structured data from questionnaire-based surveys, social media conversations represent a highly relevant data source for our purpose, as they arguably reflect the raw, authentic perceptions of social media users. To create a sufficient dataset for analysis, we removed all duplicates, as well as a small number of non-relevant posts lacking personal opinions about fact checkers.
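
A minimal sketch of that cleaning step; the post format and the relevance heuristic are hypothetical stand-ins, not the authors' actual coding procedure:

def clean_posts(posts):
    """Drop exact duplicates and posts lacking personal opinions."""
    seen = set()
    cleaned = []
    for post in posts:
        text = post.strip().lower()
        if text in seen:
            continue            # duplicate (e.g., retweets, cross-posts)
        seen.add(text)
        if is_relevant(text):
            cleaned.append(post)
    return cleaned

def is_relevant(text):
    # Illustrative heuristic only: keep posts mentioning a fact checker.
    return any(name in text for name in ("snopes", "politifact", "factcheck"))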


Moving Beyond the Turing Test with the Allen AI Science Challenge

Communications of the ACM

The competition aimed to assess the state of the art in AI systems utilizing natural language understanding and knowledge-based reasoning; how accurately the participants' models could answer the exam questions would serve as an indicator of how far the field has come in these areas. A week before the end of the competition, we provided the final test set of 21,298 questions (including the validation set) to participants to use to produce a final score for their models; only 2,583 of these were legitimate exam questions that counted toward the score. AI2 also generated a baseline score using a Lucene search over the Wikipedia corpus, producing scores of 40.2% on the training set and 40.7% on the final test set. The winning model achieved a final score of 59.31% correct on the test question set of 2,583 questions using a combination of 15 gradient-boosting models, each with a different subset of features.
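
For flavor, a sketch of the general shape of such an ensemble, using scikit-learn's GradientBoostingClassifier on synthetic stand-in features; the winner's actual features and models differed:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 40))           # stand-in features (e.g., retrieval scores)
y = rng.integers(0, 2, 500)         # 1 = correct answer option, 0 = not

models, subsets = [], []
for i in range(15):                 # 15 models, each on a different feature subset
    cols = rng.choice(X.shape[1], size=20, replace=False)
    models.append(GradientBoostingClassifier().fit(X[:, cols], y))
    subsets.append(cols)

# Average the predicted probabilities; pick the answer option scoring highest.
scores = np.mean([m.predict_proba(X[:, c])[:, 1]
                  for m, c in zip(models, subsets)], axis=0)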


Why GPS Spoofing Is a Threat to Companies, Countries

Communications of the ACM

In the bowels of the ship, Todd Humphreys, an associate professor in the Department of Aerospace Engineering and Engineering Mechanics at the University of Texas at Austin, worked with his team to feed the super-yacht's crew false navigation data using a few thousand dollars' worth of hardware and software. One proposed countermeasure, an authentication protocol called TESLA, is designed to complement location data with a cryptographic "signature": Galileo's satellites would send both navigation data and the cryptographic signature to the receiving client. Humphreys also points to the U.S. Department of Homeland Security's recent document on anti-spoofing, "Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure," as a sign that the right parties are taking GPS spoofing seriously.

U.S. Department of Homeland Security, National Cybersecurity & Communications Integration Center, National Coordinating Center for Communications. Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure; http://bit.ly/2oZewfz

Logan Kugler is a freelance technology writer based in Tampa, FL.
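
Returning to the TESLA protocol mentioned above: its core idea--MACs keyed from a one-way hash chain whose keys are disclosed only after a delay--can be sketched in a few lines of Python. This is a toy illustration, not Galileo's actual message format:

import hashlib, hmac

def H(b):
    return hashlib.sha256(b).digest()

# Sender: build a one-way key chain and publish its end as a commitment.
n = 100
chain = [H(b"secret seed")]
for _ in range(n):
    chain.append(H(chain[-1]))
chain.reverse()                 # chain[0] is the public commitment
commitment = chain[0]

# At interval i, MAC the navigation message with key i; disclose the key later.
i = 5
message = b"navigation data for interval 5"
tag = hmac.new(chain[i], message, hashlib.sha256).digest()

# Receiver, once the key is disclosed: hash it back to the commitment,
# then verify the MAC. A spoofer cannot forge tags before the key is revealed.
key = chain[i]
k = key
for _ in range(i):
    k = H(k)
assert k == commitment
assert hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())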


Turing Laureates Celebrate Award's 50th Anniversary

Communications of the ACM

Among the 22 Turing Laureates in attendance at the conference were: Front row, from left: Whitfield Diffie (2015), Martin Hellman (2015), Robert Tarjan (1986), Barbara Liskov (2008). Butler Lampson, the 1992 Turing Laureate ("for contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security, and document publishing"), said, "There's plenty of room at the top; there's room in software, algorithms, and hardware." A panel on Moore's Law was moderated by John Hennessy (left) and included Doug Burger, Norman Jouppi, Butler Lampson (1992), and Margaret Martonosi.