Goto

Collaborating Authors

 Government


Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

arXiv.org Machine Learning

We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also provides a nice interpretability for the factors; in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework on diverse real-world applications, such as \emph{multiway} topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set.


When Are Tree Structures Necessary for Deep Learning of Representations?

arXiv.org Artificial Intelligence

Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper we benchmark {\bf recursive} neural models against sequential {\bf recurrent} neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., {\em component-whole} between nouns). Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help understand the limitations of both classes of models, and suggest directions for improving recurrent models.


A model selection approach for clustering a multinomial sequence with non-negative factorization

arXiv.org Machine Learning

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC criteria but also extends the conventional criteria in a way that it can be applicable also to a sequence of sparse multinomial observations, where even within a same cluster, the number of multinomial trials may be different for different observations. In addition, as a preliminary estimation step to maximum likelihood estimation, and more generally, to maximum $L_{q}$ estimation, we propose to use reduced rank projection in combination with non-negative factorization. We motivate our approach by showing that our model selection criterion and preliminary estimation step yield consistent estimates under simplifying assumptions. We also illustrate our approach through numerical experiments using real and simulated data.


Universal Approximation of Edge Density in Large Graphs

arXiv.org Machine Learning

With the recent availability of much network data, such as world wide web, social networks, phone call networks, science collaboration graphs [1], [2], there is a renewed interest for the graph partitioning problem, especially for the automatic discovery of community structures in large networks [3], [4], [5]. Beyond clustering approaches, coclustering approaches aim at summarizing the relation between two entities in a many-to-many relationship. Such a relation can be represented as a graph, where the source and target vertices represent entities and the edges stand for relations between entities. A coclustering model provides a summary of a graph by grouping source vertices and target vertices. For example, in market analysis, the source vertices of the graph represent customers, the target vertices represent products and there is one edge each time a customer has purchased a product. A coclustering model summarizes the dataset by grouping customers that have purchased approximately the same products and grouping products that have been purchased by approximately the same customers. Coclustering models have been applied to many other domains, such as information retrieval (the entities are documents and their words in a text corpus), web log analysis (cookies and their visited web pages), web structure analysis (web pages with hyperlinks between them) or telecommunication network (the call detail records stand for the edges in a call graph between a caller and a called party). All these real-world graphs are directed multigraphs, meaning that two entities may be linked by multi-edges. We aim to summarize and discover insightful patterns in such graphs, using a method with the desired following properties: 1) Robustness, to avoid detecting spurious patterns in case of noisy data.


Appropriate Causal Models and the Stability of Causation

arXiv.org Artificial Intelligence

Causal models defined in terms of structural equations have proved to be quite a powerful way of representing knowledge regarding causality. However, a number of authors have given examples that seem to show that the Halpern-Pearl (HP) definition of causality gives intuitively unreasonable answers. Here it is shown that, for each of these examples, we can give two stories consistent with the description in the example, such that intuitions regarding causality are quite different for each story. By adding additional variables, we can disambiguate the stories. Moreover, in the resulting causal models, the HP definition of causality gives the intuitively correct answer. It is also shown that, by adding extra variables, a modification to the original HP definition made to deal with an example of Hopkins and Pearl may not be necessary. Given how much can be done by adding extra variables, there might be a concern that the notion of causality is somewhat unstable. Can adding extra variables in a "conservative" way (i.e., maintaining all the relations between the variables in the original model) cause the answer to the question "Is X=x a cause of Y=y" to alternate between "yes" and "no"? It is shown that we can have such alternation infinitely often, but if we take normality into consideration, we cannot. Indeed, under appropriate normality assumptions. adding an extra variable can change the answer from "yes" to "no", but after that, it cannot cannot change back to "yes".


A comparison of models for predicting early hospital readmissions

#artificialintelligence

We compare a variety of models for predicting early hospital readmissions. Performance of existing models is insufficient for practical applications. Random forests and deep neural networks perform best in terms of AUC. Models fit to homogeneous patient subgroups typically outperform global models. Risk sharing arrangements between hospitals and payers together with penalties imposed by the Centers for Medicare and Medicaid (CMS) are driving an interest in decreasing early readmissions.


Exploiting Binary Floating-Point Representations for Constraint Propagation: The Complete Unabridged Version

arXiv.org Artificial Intelligence

Floating-point computations are quickly finding their way in the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.


Bypassing Combinatorial Protections: Polynomial-Time Algorithms for Single-Peaked Electorates

Journal of Artificial Intelligence Research

For many election systems, bribery (and related) attacks have been shown NP-hard using constructions on combinatorially rich structures such as partitions and covers. This paper shows that for voters who follow the most central political-science model of electorates---single-peaked preferences---those hardness protections vanish. By using single-peaked preferences to simplify combinatorial covering challenges, we for the first time show that NP-hard bribery problems---including those for Kemeny and Llull elections---fall to polynomial time for single-peaked electorates. By using single-peaked preferences to simplify combinatorial partition challenges, we for the first time show that NP-hard partition-of-voters problems fall to polynomial time for single-peaked electorates. We show that for single-peaked electorates, the winner problems for Dodgson and Kemeny elections, though Theta-two-complete in the general case, fall to polynomial time. And we completely classify the complexity of weighted coalition manipulation for scoring protocols in single-peaked electorates.


Platys: From Position to Place-Oriented Mobile Computing

AI Magazine

However, what often matters for experience is the user's place A semantic model of user-centered places, the Platys ontology enables the mapping Research in context-aware computing (Schilit, Adams, of positions to places. In the model, places and and Want 1994) aims to enable computing systems that activities can be represented at different levels of acquire and maintain context data and use it to adapt granularity using subsumption hierarchies. It originated with Weiser's vision of to determine a user's place at any given time. Place ubiquitous computing (Weiser 1999) where human recognition has been addressed with standard activities are enhanced with devices that are all around machine-learning classifiers as well as a semisupervised but unnoticeable to the user and that provide services expectation-maximization algorithm. The that adapt to the circumstances in which they are used. Location is an 1994; Schilit et al. 1993) are early works in contextaware essential part of place and therefore place recognition computing and dealt with tracking a user's location relies on location sensing. Since frequent location and using it to provide better services or sharing it sensing by a mobile device depletes power, we have with others.


Architectures for Activity Recognition and Context-Aware Computing

AI Magazine

The last 10 years have seen the development of novel architectures and technologies for domainfocused, task-specific systems that know many things, such as who (identities, profile, history) they are with (social context) and in what role (responsibility, security, privacy); when and where (event, time, place); why (goals, shared or personal); how are they doing it (methods, applications); and using what resources (device, services, access, and ownership). Smart spaces and devices will increasingly use such contextual knowledge to help users move seamlessly between devices and applications, without having to explicitly carry, transfer, and exchange activity context. Such systems will qualitatively shift our lives both at work and play and significantly change our interactions both with our physical and virtual worlds. This dream of seamlessly interacting with our virtual environment has a long history as can be seen in Apple Inc.'s Knowledge Navigator 1987 concept video. However, the combination of dramatic progress in low-power mobile computing devices and sensors, with advances in artificial intelligence and human-computer interaction (HCI) in the last decade, have provided the kind of platforms and algorithms that are enabling context-aware virtual personal assistants that plan activities and recognize intent. This has lead to an increase in work designed to bring these ideas into real world application and address the final technical hurdles that will make such systems a reality.