Goto

Collaborating Authors

 Law


Nested Hierarchical Dirichlet Processes

arXiv.org Machine Learning

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.


Supersparse Linear Integer Models for Interpretable Classification

arXiv.org Machine Learning

Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we introduce an off-the-shelf tool to create scoring systems that both accurate and interpretable, known as a Supersparse Linear Integer Model (SLIM). SLIM is a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy, regularizes the L0-norm to encourage a high level of sparsity, and constrains coefficients to a set of interpretable values. We illustrate the practical and interpretable nature of SLIM scoring systems through applications in medicine and criminology, and show that they are are accurate and sparse in comparison to state-of-the-art classification models using numerical experiments.



Discovering Latent Network Structure in Point Process Data

arXiv.org Machine Learning

Networks play a central role in modern data analysis, enabling us to reason about systems by studying the relationships between their parts. Most often in network analysis, the edges are given. However, in many systems it is difficult or impossible to measure the network directly. Examples of latent networks include economic interactions linking financial instruments and patterns of reciprocity in gang violence. In these cases, we are limited to noisy observations of events associated with each node. To enable analysis of these implicit networks, we develop a probabilistic model that combines mutually-exciting point processes with random graph models. We show how the Poisson superposition principle enables an elegant auxiliary variable formulation and a fully-Bayesian, parallel inference algorithm. We evaluate this new model empirically on several datasets.


Lexical and Hierarchical Topic Regression

Neural Information Processing Systems

Inspired by a two-level theory that unifies agenda setting and ideological framing, we propose supervised hierarchical latent Dirichlet allocation (SHLDA) which jointly captures documents' multi-level topic structure and their polar response variables. Our model extends the nested Chinese restaurant process to discover a tree-structured topic hierarchy and uses both per-topic hierarchical and per-word lexical regression parameters to model the response variables. Experiments in a political domain and on sentiment analysis tasks show that SHLDA improves predictive accuracy while adding a new dimension of insight into how topics under discussion are framed.


Missing Value Imputation With Unsupervised Backpropagation

arXiv.org Machine Learning

Unfortunately, real-world datasets often include only samples of observed values mixed with many missing or unknown elements. Missing values may occur due to human impatience, human error during data entry, data loss, faulty sensory equipment, changes in data collection methods, inability to decipher handwriting, privacy issues, legal requirements, and a variety of other practical factors. Thus, improvements to methods for imputing missing values can have far-reaching impact on improving the effectiveness of existing learning algorithms for operating on real-world data. We present a method for imputation called Unsupervised Backpropagation (UBP), which trains a multilayer perceptron (MLP) to fit to the manifold represented by the known features in a dataset. We demonstrate this algorithm with the task of imputing missing values, and we show that it is significantly more effective than other methods for imputation. Backpropagation has long been a popular method for training neural networks (Rumelhart et al., 1986; Werbos, 1990).


Object-oriented Bayesian networks for a decision support system for antitrust enforcement

arXiv.org Artificial Intelligence

We study an economic decision problem where the actors are two firms and the Antitrust Authority whose main task is to monitor and prevent firms' potential anti-competitive behaviour and its effect on the market. The Antitrust Authority's decision process is modelled using a Bayesian network where both the relational structure and the parameters of the model are estimated from a data set provided by the Authority itself. A number of economic variables that influence this decision process are also included in the model. We analyse how monitoring by the Antitrust Authority affects firms' strategies about cooperation. Firms' strategies are modelled as a repeated prisoner's dilemma using object-oriented Bayesian networks. We show how the integration of firms' decision process and external market information can be modelled in this way. Various decision scenarios and strategies are illustrated.


Bayesian Optimization With Censored Response Data

arXiv.org Artificial Intelligence

Bayesian optimization (BO) aims to minimize a given blackbox function using a model that is updated whenever new evidence about the function becomes available. Here, we address the problem of BO under partially right-censored response data, where in some evaluations we only obtain a lower bound on the function value. The ability to handle such response data allows us to adaptively censor costly function evaluations in minimization problems where the cost of a function evaluation corresponds to the function value. One important application giving rise to such censored data is the runtime-minimizing variant of the algorithm configuration problem: finding settings of a given parametric algorithm that minimize the runtime required for solving problem instances from a given distribution. We demonstrate that terminating slow algorithm runs prematurely and handling the resulting right-censored observations can substantially improve the state of the art in model-based algorithm configuration.


Learning Hidden Structures with Relational Models by Adequately Involving Rich Information in A Network

arXiv.org Machine Learning

Effectively modelling hidden structures in a network is very practical but theoretically challenging. Existing relational models only involve very limited information, namely the binary directional link data, embedded in a network to learn hidden networking structures. There is other rich and meaningful information (e.g., various attributes of entities and more granular information than binary elements such as "like" or "dislike") missed, which play a critical role in forming and understanding relations in a network. In this work, we propose an informative relational model (InfRM) framework to adequately involve rich information and its granularity in a network, including metadata information about each entity and various forms of link data. Firstly, an effective metadata information incorporation method is employed on the prior information from relational models MMSB and LFRM. This is to encourage the entities with similar metadata information to have similar hidden structures. Secondly, we propose various solutions to cater for alternative forms of link data. Substantial efforts have been made towards modelling appropriateness and efficiency, for example, using conjugate priors. We evaluate our framework and its inference algorithms in different datasets, which shows the generality and effectiveness of our models in capturing implicit structures in networks.


Compact Representations of Extended Causal Models

arXiv.org Artificial Intelligence

One of Judea Pearl's many, many important contributions to the study of causality was the first attempt to use the mathematical tools of causal modeling to give an account of "actual causation", a notion that has been of considerable interest among philosophers and legal theorists (Pearl, 2000, Chapter 10). Pearl later revised his account of actual causation in joint work with Halpern (Halpern & Pearl, 2005). A number of authors (Hall, 2007; Halpern, 2008; Hitchcock, 2007; Menzies, 2004) have suggested that an account of actual causation must be sensitive to considerations of normality, as well as to causal structure. In (Halpern & Hitchcock, 2011), we suggest a way of incorporating considerations of normality into the Halpern-Pearl theory, and show how to extend the account to illuminate features of the psychology of causal judgment, as well as features of causal reasoning in the law. Our account of actual causation makes use of "extended causal models", which include both structural equations among a set of variables, and a partial preorder on possible worlds, which represents the relative "normality" of those worlds. We actually want to think of people as working with the structural equations and normality order to evaluate actual causation. However, consideration of even simple examples immediately suggests a problem. A direct representation of the equations and normality order is too cumbersome for cognitively limited agents to use effectively. If our account of actual causation is to be at all realistic as a model of human causal judgment, some form of compact representation will be needed.