Government
I tried the new White House Facebook chatbot. Here's what happened.
Have you ever wished getting in touch with the president was as easy as IMing your friends? Well, you can now send President Obama private messages through the White House's Facebook page. The Facebook messages will be among those considered for the 10 personal letters the president reads each day. White House chief digital officer Jason Goldman pitched the new Facebook messaging feature Wednesday as a way for the administration to "meet people where they are" online. The new feature relies on a chatbot that simulates a conversation but doesn't directly respond to questions.
Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware
Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors. We present Rely, a programming language that enables developers to reason about the quantitative reliability of an application--namely, the probability that it produces the correct result when executed on unreliable hardware. Rely allows developers to specify the reliability requirements for each value that a function produces. We present a static quantitative reliability analysis that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering. The analysis takes a Rely program with a reliability specification and a hardware specification that characterizes the reliability of the underlying hardware components and verifies that the program satisfies its reliability specification when executed on the underlying unreliable hardware platform. We demonstrate the application of quantitative reliability analysis on six computations implemented in Rely. Reliability is a major concern in the design of computer systems. The current goal of delivering systems with negligible error rates restricts the available design space and imposes significant engineering costs.
Politicians Need to Understand This Computer Science Concept Better - Facts So Romantic
I have an idea that would keep 100 percent of foreign-born terrorists out of the United States. All we have to do is this: Never let anybody in. Most of us find this idea ludicrous, of course, and rightly so. Keeping out terrorists is not the only goal of border policy; it's essential that the vast majority of people can come and go freely, whether for pleasure, business, or survival. Yet many of our decisions are based on similarly shoddy reasoning: We often fail to consider that there are two sides to the accuracy seesaw. For example, we celebrate mammograms for detecting 84 percent of breast cancers and bemoan when law enforcement can't access a phone, but we overlook how often mammograms detect fictitious cancers, or how many hackers were foiled by encryption.
Viewpoint and Topic Modeling of Current Events
Zhang, Kerry, Karlgren, Jussi, Zhang, Cheng, Lagergren, Jens
There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned completely unsupervised and represented in a human interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent, and form consistently correct topic-viewpoint associations.
Dynamic Principal Component Analysis: Identifying the Relationship between Multiple Air Pollutants
Melnikov, Oleg, Raun, Loren H., Ensor, Katherine B.
The dynamic nature of air quality chemistry and transport makes it difficult to identify the mixture of air pollutants for a region. In this study of air quality in the Houston metropolitan area we apply dynamic principal component analysis (DPCA) to a normalized multivariate time series of daily concentration measurements of five pollutants (O3, CO, NO2, SO2, PM2.5) from January 1, 2009 through December 31, 2011 for each of the 24 hours in a day. The resulting dynamic components are examined by hour across days for the 3 year period. Diurnal and seasonal patterns are revealed underlining times when DPCA performs best and two principal components (PCs) explain most variability in the multivariate series. DPCA is shown to be superior to static principal component analysis (PCA) in discovery of linear relations among transformed pollutant measurements. DPCA captures the time-dependent correlation structure of the underlying pollutants recorded at up to 34 monitoring sites in the region. In winter mornings the first principal component (PC1) (mainly CO and NO2) explains up to 70% of variability. Augmenting with the second principal component (PC2) (mainly driven by SO2) the explained variability rises to 90%. In the afternoon, O3 gains prominence in the second principal component. The seasonal profile of PCs' contribution to variance loses its distinction in the afternoon, yet cumulatively PC1 and PC2 still explain up to 65% of variability in ambient air data. DPCA provides a strategy for identifying the changing air quality profile for the region studied.
Linear Regression with an Unknown Permutation: Statistical and Computational Limits
Pananjady, Ashwin, Wainwright, Martin J., Courtade, Thomas A.
Consider a noisy linear observation model with an unknown permutation, based on observing $y = \Pi^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $\Pi^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. from a standard Gaussian distribution, and establish sharp conditions on the SNR, sample size $n$, and dimension $d$ under which $\Pi^*$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $\Pi^*$ is NP-hard to compute, while also providing a polynomial time algorithm when $d =1$.
Efficient Dodgson-Score Calculation Using Heuristics and Parallel Computing
Recknagel, Arne, Besold, Tarek R.
Conflict of interest is the permanent companion of any population of agents (computational or biological). For that reason, the ability to compromise is of paramount importance, making voting a key element of societal mechanisms. One of the voting procedures most often discussed in the literature and, due to its intuitiveness, also conceptually quite appealing is Charles Dodgson's scoring rule, basically using the respective closeness to being a Condorcet winner for evaluating competing alternatives. In this paper, we offer insights on the practical limits of algorithms computing the exact Dodgson scores from a number of votes. While the problem itself is theoretically intractable, this work proposes and analyses five different solutions which try distinct approaches to practically solve the issue in an effective manner. Additionally, three of the discussed procedures can be run in parallel which has the potential of drastically reducing the problem size.
A Study of Proxies for Shapley Allocations of Transport Costs
Aziz, Haris, Cahan, Casey, Gretton, Charles, Kilby, Philip, Mattei, Nicholas, Walsh, Toby
We survey existing rules of thumb, propose novel methods, and comprehensively evaluate a number of solutions to the problem of calculating the cost to serve each location in a single-vehicle transport setting. Cost to serve analysis has applications both strategically and operationally in transportation settings. The problem is formally modeled as the traveling salesperson game (TSG), a cooperative transferable utility game in which agents correspond to locations in a traveling salesperson problem (TSP). The total cost to serve all locations in the TSP is the length of an optimal tour. An allocation divides the total cost among individual locations, thus providing the cost to serve each of them. As one of the most important normative division schemes in cooperative games, the Shapley value gives a principled and fair allocation for a broad variety of games including the TSG. We consider a number of direct and sampling-based procedures for calculating the Shapley value, and prove that approximating the Shapley value of the TSG within a constant factor is NP-hard. Treating the Shapley value as an ideal baseline allocation, we survey six proxies for it that are each relatively easy to compute. Some of these proxies are rules of thumb and some are procedures international delivery companies use(d) as cost allocation methods. We perform an experimental evaluation using synthetic Euclidean games as well as games derived from real-world tours calculated for scenarios involving fast-moving goods; where deliveries are made on a road network every day. We explore several computationally tractable allocation techniques that are good proxies for the Shapley value in problem instances of a size and complexity that is commercially relevant.
Community Detection in Political Twitter Networks using Nonnegative Matrix Factorization Methods
Ozer, Mert, Kim, Nyunsu, Davulcu, Hasan
Community detection is a fundamental task in social network analysis. In this paper, first we develop an endorsement filtered user connectivity network by utilizing Heider's structural balance theory and certain Twitter triad patterns. Next, we develop three Nonnegative Matrix Factorization frameworks to investigate the contributions of different types of user connectivity and content information in community detection. We show that user content and endorsement filtered connectivity information are complementary to each other in clustering politically motivated users into pure political communities. Word usage is the strongest indicator of users' political orientation among all content categories. Incorporating user-word matrix and word similarity regularizer provides the missing link in connectivity only methods which suffer from detection of artificially large number of clusters for Twitter networks.
Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance
Helfer, Brian S., Williamson, James R., Miller, Benjamin A., Perricone, Joseph, Quatieri, Thomas F.
Studies in recent years have demonstrated that neural organization and structure impact an individual's ability to perform a given task. Specifically, individuals with greater neural efficiency have been shown to outperform those with less organized functional structure. In this work, we compare the predictive ability of properties of neural connectivity on a working memory task. We provide two novel approaches for characterizing functional network connectivity from electroencephalography (EEG), and compare these features to the average power across frequency bands in EEG channels. Our first novel approach represents functional connectivity structure through the distribution of eigenvalues making up channel coherence matrices in multiple frequency bands. Our second approach creates a connectivity network at each frequency band, and assesses variability in average path lengths of connected components and degree across the network. Failures in digit and sentence recall on single trials are detected using a Gaussian classifier for each feature set, at each frequency band. The classifier results are then fused across frequency bands, with the resulting detection performance summarized using the area under the receiver operating characteristic curve (AUC) statistic.