Government
Estimating Vector Fields on Manifolds and the Embedding of Directed Graphs
Perrault-Joncas, Dominique, Meila, Marina
This paper considers the problem of embedding directed graphs in Euclidean space while retaining directional information. We model a directed graph as a finite set of observations from a diffusion on a manifold endowed with a vector field. This is the first generative model of its kind for directed graphs. We introduce a graph embedding algorithm that estimates all three features of this model: the low-dimensional embedding of the manifold, the data density and the vector field. In the process, we also obtain new theoretical results on the limits of "Laplacian type" matrices derived from directed graphs. The application of our method to both artificially constructed and real data highlights its strengths.
How Hard Is It to Control an Election by Breaking Ties?
Mattei, Nicholas, Narodytska, Nina, Walsh, Toby
We study the computational complexity of controlling the result of an election by breaking ties strategically. This problem is equivalent to the problem of deciding the winner of an election under parallel universes tie-breaking. When the chair of the election is only asked to break ties to choose between one of the co-winners, the problem is trivially easy. However, in multi-round elections, we prove that it can be NP-hard for the chair to compute how to break ties to ensure a given result. Additionally, we show that the form of the tie-breaking function can increase the opportunities for control. Indeed, we prove that it can be NP-hard to control an election by breaking ties even with a two-stage voting rule.
The Computational Impact of Partial Votes on Strategic Voting
In many real world elections, agents are not required to rank all candidates. We study three of the most common methods used to modify voting rules to deal with such partial votes. These methods modify scoring rules (like the Borda count), elimination style rules (like single transferable vote) and rules based on the tournament graph (like Copeland) respectively. We argue that with an elimination style voting rule like single transferable vote, partial voting does not change the situations where strategic voting is possible. However, with scoring rules and rules based on the tournament graph, partial voting can increase the situations where strategic voting is possible. As a consequence, the computational complexity of computing a strategic vote can change. For example, with Borda count, the complexity of computing a strategic vote can decrease or stay the same depending on how we score partial votes.
A Decision-Theoretic Model of Assistance
Fern, A., Natarajan, S., Judah, K., Tadepalli, P.
There is a growing interest in intelligent assistants for a variety of applications from sorting email to helping people with disabilities to do their daily chores. In this paper, we formulate the problem of intelligent assistance in a decision-theoretic framework, and present both theoretical and empirical results. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalizes the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection for HGMDPs is PSPACE-complete even for deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), which are sufficient for modeling many real-world problems. We show classes of HAMDPs for which efficient algorithms are possible. More interestingly, for general HAMDPs we show that a simple myopic policy achieves a near optimal regret, compared to an oracle assistant that knows the agent's goal. We then introduce more sophisticated versions of this policy for the general case of HGMDPs that we combine with a novel approach for quickly learning about the agent being assisted. We evaluate our approach in two game-like computer environments where human subjects perform tasks, and in a real-world domain of providing assistance during folder navigation in a computer desktop environment. The results show that in all three domains the framework results in an assistant that substantially reduces user effort with only modest computation.
Randomized Approximation of the Gram Matrix: Exact Computation and Probabilistic Bounds
Holodnak, John T., Ipsen, Ilse C. F.
Given a real matrix A with n columns, the problem is to approximate the Gram product AA^T by c << n weighted outer products of columns of A. Necessary and sufficient conditions for the exact computation of AA^T (in exact arithmetic) from c >= rank(A) columns depend on the right singular vector matrix of A. For a Monte-Carlo matrix multiplication algorithm by Drineas et al. that samples outer products, we present probabilistic bounds for the 2-norm relative error due to randomization. The bounds depend on the stable rank or the rank of A, but not on the matrix dimensions. Numerical experiments illustrate that the bounds are informative, even for stringent success probabilities and matrices of small dimension. We also derive bounds for the smallest singular value and the condition number of matrices obtained by sampling rows from orthonormal matrices.
Minimum Model Semantics for Extensional Higher-order Logic Programming with Negation
Charalambidis, Angelos, รsik, Zoltรกn, Rondogiannis, Panos
Extensional higher-order logic programming has been introduced as a generalization of classical logic programming. An important characteristic of this paradigm is that it preserves all the well-known properties of traditional logic programming. In this paper we consider the semantics of negation in the context of the new paradigm. Using some recent results from non-monotonic fixed-point theory, we demonstrate that every higher-order logic program with negation has a unique minimum infinite-valued model. In this way we obtain the first purely model-theoretic semantics for negation in extensional higher-order logic programming. Using our approach, we resolve an old paradox that was introduced by W. W. Wadge in order to demonstrate the semantic difficulties of higher-order logic programming.
Topic words analysis based on LDA model
Social network analysis (SNA), which is a research field describing and modeling the social connection of a certain group of people, is popular among network services. Our topic words analysis project is a SNA method to visualize the topic words among emails from Obama.com to accounts registered in Columbus, Ohio. Based on Latent Dirichlet Allocation (LDA) model, a popular topic model of SNA, our project characterizes the preference of senders for target group of receptors. Gibbs sampling is used to estimate topic and word distribution. Our training and testing data are emails from the carbon-free server Datagreening.com. We use parallel computing tool BashReduce for word processing and generate related words under each latent topic to discovers typical information of political news sending specially to local Columbus receptors. Running on two instances using paralleling tool BashReduce, our project contributes almost 30% speedup processing the raw contents, comparing with processing contents on one instance locally. Also, the experimental result shows that the LDA model applied in our project provides precision rate 53.96% higher than TF-IDF model finding target words, on the condition that appropriate size of topic words list is selected.
Nested Hierarchical Dirichlet Processes
Paisley, John, Wang, Chong, Blei, David M., Jordan, Michael I.
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.
Near-Optimal Adaptive Compressed Sensing
Malloy, Matthew L., Nowak, Robert D.
This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be near-optimal in that it succeeds at the lowest possible signal-to-noise-ratio (SNR) levels, improving on previous work in adaptive compressed sensing. Like traditional compressed sensing based on random non-adaptive design matrices, the CASS algorithm requires only k log n measurements to recover a k-sparse signal of dimension n. However, CASS succeeds at SNR levels that are a factor log n less than required by standard compressed sensing. From the point of view of constructing and implementing the sensing operation as well as computing the reconstruction, the proposed algorithm is substantially less computationally intensive than standard compressed sensing. CASS is also demonstrated to perform considerably better in practice through simulation. To the best of our knowledge, this is the first demonstration of an adaptive compressed sensing algorithm with near-optimal theoretical guarantees and excellent practical performance. This paper also shows that methods like compressed sensing, group testing, and pooling have an advantage beyond simply reducing the number of measurements or tests -- adaptive versions of such methods can also improve detection and estimation performance when compared to non-adaptive direct (uncompressed) sensing.
Model Based Clustering of High-Dimensional Binary Data
Tang, Yang, Browne, Ryan P., McNicholas, Paul D.
We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a $d$-dimensional Gaussian latent variable, is extended by incorporating common factor analyzers. Accordingly, our approach facilitates a low-dimensional visual representation of the clusters. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Our approach is demonstrated on real and simulated data.