Government
Dynamic Rank Factor Model for Text Streams
Han, Shaobo, Du, Lin, Salazar, Esther, Carin, Lawrence
We propose a semi-parametric and dynamic rank factor model for topic modeling, capable of (1) discovering topic prevalence over time, and (2) learning contemporary multi-scale dependence structures, providing topic and word correlations as a byproduct. The high-dimensional and time-evolving ordinal/rank observations (such as word counts), after an arbitrary monotone transformation, are well accommodated through an underlying dynamic sparse factor model. The framework naturally admits heavy-tailed innovations, capable of inferring abrupt temporal jumps in the importance of topics. Posterior inference is performed through straightforward Gibbs sampling, based on the forward-filtering backward-sampling algorithm. Moreover, an efficient data subsampling scheme is leveraged to speed up inference on massive datasets. The modeling framework is illustrated on two real datasets: the US State of the Union Address and the JSTOR collection from Science.
Feature Cross-Substitution in Adversarial Classification
The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.
Dependent nonparametric trees for dynamic hierarchical clustering
Dubey, Kumar Avinava, Ho, Qirong, Williamson, Sinead A., Xing, Eric P.
Hierarchical clustering methods offer an intuitive and powerful way to model a wide variety of data sets. However, the assumption of a fixed hierarchy is often overly restrictive when working with data generated over a period of time: We expect both the structure of our hierarchy, and the parameters of the clusters, to evolve with time. In this paper, we present a distribution over collections of time-dependent, infinite-dimensional trees that can be used to model evolving hierarchies, and present an efficient and scalable algorithm for performing approximate inference in such a model. We demonstrate the efficacy of our model and inference algorithm on both synthetic data and real-world document corpora.
Distributed Estimation, Information Loss and Exponential Families
Liu, Qiang, Ihler, Alexander T.
Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly. We study the statistical properties of this framework, showing that the loss of efficiency compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing connection to the theory of information loss by Fisher, Rao and Efron. We show that the full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by a more common linear combination method. We also study the empirical properties of the KL and linear combination methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions."
Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection
Oh, Sang, Dalal, Onkar, Khare, Kshitij, Rajaratnam, Bala
Sparse high dimensional graphical model selection is a popular topic in contemporary machine learning. To this end, various useful approaches have been proposed in the context of $\ell_1$ penalized estimation in the Gaussian framework. Though many of these approaches are demonstrably scalable and have leveraged recent advances in convex optimization, they still depend on the Gaussian functional form. To address this gap, a convex pseudo-likelihood based partial correlation graph estimation method (CONCORD) has been recently proposed. This method uses cyclic coordinate-wise minimization of a regression based pseudo-likelihood, and has been shown to have robust model selection properties in comparison with the Gaussian approach. In direct contrast to the parallel work in the Gaussian setting however, this new convex pseudo-likelihood framework has not leveraged the extensive array of methods that have been proposed in the machine learning literature for convex optimization. In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework. We present timing comparisons with coordinate-wise minimization and demonstrate that our approach yields tremendous pay offs for $\ell_1$-penalized partial correlation graph estimation outside the Gaussian setting, thus yielding the fastest and most scalable approach for such problems. We undertake a theoretical analysis of our approach and rigorously demonstrate convergence, and also derive rates thereof.
Just-In-Time Learning for Fast and Flexible Inference
Eslami, S. M. Ali, Tarlow, Daniel, Kohli, Pushmeet, Winn, John
Much of research in machine learning has centered around the search for inference algorithms that are both general-purpose and efficient. The problem is extremely challenging and general inference remains computationally expensive. We seek to address this problem by observing that in most specific applications of a model, we typically only need to perform a small subset of all possible inference computations. Motivated by this, we introduce just-in-time learning, a framework for fast and flexible inference that learns to speed up inference at run-time. Through a series of experiments, we show how this framework can allow us to combine the flexibility of sampling with the efficiency of deterministic message-passing.
Are We Ready for Driver-less Vehicles? Security vs. Privacy- A Social Perspective
At this moment Autonomous cars are probably the biggest and most talked about technology in the Robotics Research Community. In spite of great technological advances over past few years a full edged autonomous car is still far from reality. This article talks about the existing system and discusses the possibility of a Computer Vision enabled driving being superior than the LiDar based system. A detailed overview of privacy violations that might arise from autonomous driving has been discussed in detail both from a technical as well as legal perspective. It has been proved through evidence and arguments that efficient and accurate estimation and efficient solution of the constraint satisfaction problem addressed in the case of autonomous cars are negatively correlated with the preserving the privacy of the user. It is a very difficult trade-off since both are very important aspects and has to be taken into account. The fact that one cannot compromise with the safety issues of the car makes it inevitable to run into serious privacy concerns that might have adverse social and political effects.
The ROMES method for statistical modeling of reduced-order-model error
Drohmann, Martin, Carlberg, Kevin
This work presents a technique for statistically modeling errors introduced by reduced-order models. The method employs Gaussian-process regression to construct a mapping from a small number of computationally inexpensive `error indicators' to a distribution over the true error. The variance of this distribution can be interpreted as the (epistemic) uncertainty introduced by the reduced-order model. To model normed errors, the method employs existing rigorous error bounds and residual norms as indicators; numerical experiments show that the method leads to a near-optimal expected effectivity in contrast to typical error bounds. To model errors in general outputs, the method uses dual-weighted residuals---which are amenable to uncertainty control---as indicators. Experiments illustrate that correcting the reduced-order-model output with this surrogate can improve prediction accuracy by an order of magnitude; this contrasts with existing `multifidelity correction' approaches, which often fail for reduced-order models and suffer from the curse of dimensionality. The proposed error surrogates also lead to a notion of `probabilistic rigor', i.e., the surrogate bounds the error with specified probability.
Circumventing the Curse of Dimensionality in Prediction: Causal Rate-Distortion for Infinite-Order Markov Processes
Marzen, Sarah, Crutchfield, James P.
Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments show that algorithms which cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting causal rate-distortion theory substantially improves current predictive rate-distortion analyses.
Forecasting the Colorado River Discharge Using an Artificial Neural Network (ANN) Approach
Mehrkesh, Amirhossein, Ahmadi, Maryam
Artificial Neural Network (ANN) based model is a computational approach commonly used for modeling the complex relationships between input and output parameters. Prediction of the flow rate of a river is a requisite for any successful water resource management and river basin planning. In the current survey, the effectiveness of an Artificial Neural Network was examined to predict the Colorado River discharge. In this modeling process, an ANN model was used to relate the discharge of the Colorado River to such parameters as the amount of precipitation, ambient temperature and snowpack level at a specific time of the year. The model was able to precisely study the impact of climatic parameters on the flow rate of the Colorado River. Keywords: Artificial Neural Network, Discharge, Colorado River, River basin planning 1. Introduction The volumetric flow rate of a river, also called its discharge, at a particular point, is the volume of water passing through the cross section of the river at that point in a unit of time. As aforementioned, forecasting the flow rate of a river could be very useful in water resources management. Any seasonal river basin planning for designation of water between different consumers can not succeed without knowing/predicting the amount of water (i.e.