Goto

Collaborating Authors

 Industry


Changepoint Detection over Graphs with the Spectral Scan Statistic

arXiv.org Machine Learning

In this article we are concerned with the basic but fundamental task of deciding whether a given graph, over which a noisy signal is observed, contains a cluster of anomalous or activated nodes comprising an induced connected subgraph. Such a problem is highly relevant in a variety of scientific areas, such as community detection in social networks, surveillance, disease outbreak detection, biomedical imaging, sensor network detection, gene network analysis, environmental monitoring and malware detection. Recent theoretical contributions in the statistical literature (see, e.g., Arias-Castro et al. [2005, 2008, 2011], Addario-Berry et al. [2010]) have detailed the inherent difficulty of such a testing problem in relatively simplified settings and under specific conditions on the graph topology. From a practical standpoint, the natural algorithm for detection of anomalous clusters of activity in graphs is the the generalized likelihood ratio test (GLRT) or scan statistic, a computationally intensive procedure that entails scanning all well connected clusters and testing individually for anomalous activation. Unfortunately, its performance over general graphs is not well understood, and little attention has been paid to determining alternative, computationally tractable, procedures.


The Causal Topography of Cognition

arXiv.org Artificial Intelligence

ABSTRACT: The causal structure of cognition can be simulated but not implemented computationally, just as the causal structure of a furnace can be simulated but not implemented computationally. It lacks the essential causal property of a furnace. This is obvious with computational furnaces. The only thing that allows us even to imagine that it is otherwise in the case of computational cognition is the fact that cognizing, unlike heating, is invisible (to everyone except the cognizer). Chalmers's "Dancing Qualia" Argument is hence invalid: Even if there could be a computational model of cognition that was behaviorally indistinguishable from a real, feeling cognizer, it would still be true that if, like heat, feeling is a dynamical property of the brain, a flip-flop from the presence to the absence of feeling would be undetectable anywhere along Chalmers's hypothetical component-swapping continuum from a human cognizer to a computational cognizer -- undetectable to everyone except the cognizer. But that would only be because the cognizer was locked into being incapable of doing anything to settle the matter simply because of Chalmers's premise of input/output indistinguishability. That is not a demonstration that cognition is computation; it is just the demonstation that you get out of a premise what you put into it.


Modeling Social Causality and Responsibility Judgment in Multi-Agent Interactions

Journal of Artificial Intelligence Research

Social causality is the inference an entity makes about the social behavior of other entities and self. Besides physical cause and effect, social causality involves reasoning about epistemic states of agents and coercive circumstances. Based on such inference, responsibility judgment is the process whereby one singles out individuals to assign responsibility, credit or blame for multi-agent activities. Social causality and responsibility judgment are a key aspect of social intelligence, and a model for them facilitates the design and development of a variety of multi-agent interactive systems. Based on psychological attribution theory, this paper presents a domain-independent computational model to automate social inference and judgment process according to an agents causal knowledge and observations of interaction. We conduct experimental studies to empirically validate the computational model. The experimental results show that our model predicts human judgments of social attributions and makes inferences consistent with what most people do in their judgments. Therefore, the proposed model can be generically incorporated into an intelligent system to augment its social and cognitive functionality.


Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

Journal of Artificial Intelligence Research

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between the languages X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian- >English using Malay and for Spanish -> English using Portuguese and pretending Spanish is resource-poor shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real'' training data by a factor of 2--5.


Gradient Computation In Linear-Chain Conditional Random Fields Using The Entropy Message Passing Algorithm

arXiv.org Artificial Intelligence

The paper proposes a numerically stable recursive algorithm for the exact computation of the linear-chain conditional random field gradient. It operates as a forward algorithm over the log-domain expectation semiring and has the purpose of enhancing memory efficiency when applied to long observation sequences. Unlike the traditional algorithm based on the forward-backward recursions, the memory complexity of our algorithm does not depend on the sequence length. The experiments on real data show that it can be useful for the problems which deal with long sequences.


Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

arXiv.org Machine Learning

Over the past decades, statisticians and machine-learning researchers have developed literally thousands of new tools for the reduction of high-dimensional data in order to identify the variables most responsible for a particular trait. These tools have applications in a plethora of settings, including data analysis in the fields of business, education, forensics, and biology (such as microarray, proteomics, brain imaging), to name a few. In the present work, we focus our investigation on the limitations and potential misuses of certain tools in the analysis of the benchmark colon cancer data (2,000 variables; Alon et al., 1999) and the prostate cancer data (6,033 variables; Efron, 2010, 2008). Our analysis demonstrates that models that produce 100% accuracy measures often select different sets of genes and cannot stand the scrutiny of parameter estimates and model stability. Furthermore, we created a host of simulation datasets and "artificial diseases" to evaluate the reliability of commonly used statistical and data mining tools. We found that certain widely used models can classify the data with 100% accuracy without using any of the variables responsible for the disease. With moderate sample size and suitable pre-screening, stochastic gradient boosting will be shown to be a superior model for gene selection and variable screening from high-dimensional datasets.


Sparse Approximation via Penalty Decomposition Methods

arXiv.org Machine Learning

In this paper we consider sparse approximation problems, that is, general $l_0$ minimization problems with the $l_0$-"norm" of a vector being a part of constraints or objective function. In particular, we first study the first-order optimality conditions for these problems. We then propose penalty decomposition (PD) methods for solving them in which a sequence of penalty subproblems are solved by a block coordinate descent (BCD) method. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD methods satisfies the first-order optimality conditions of the problems. Furthermore, for the problems in which the $l_0$ part is the only nonconvex part, we show that such an accumulation point is a local minimizer of the problems. In addition, we show that any accumulation point of the sequence generated by the BCD method is a saddle point of the penalty subproblem. Moreover, for the problems in which the $l_0$ part is the only nonconvex part, we establish that such an accumulation point is a local minimizer of the penalty subproblem. Finally, we test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.


Regularization for Cox's proportional hazards model with NP-dimensionality

arXiv.org Machine Learning

High throughput genetic sequencing arrays with thousands of measurements per sample and a great amount of related censored clinical data have increased demanding need for better measurement specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties are employed and both LASSO and SCAD are discussed specifically. We unveil the question under which dimensionality and correlation restrictions can an oracle estimator be constructed and grasped. It is demonstrated that nonconcave penalties lead to significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. The large deviation result for martingales, bearing interests of its own, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator, is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.


Stochastic Belief Propagation: A Low-Complexity Alternative to the Sum-Product Algorithm

arXiv.org Machine Learning

The sum-product or belief propagation (BP) algorithm is a widely-used message-passing algorithm for computing marginal distributions in graphical models with discrete variables. At the core of the BP message updates, when applied to a graphical model with pairwise interactions, lies a matrix-vector product with complexity that is quadratic in the state dimension $d$, and requires transmission of a $(d-1)$-dimensional vector of real numbers (messages) to its neighbors. Since various applications involve very large state dimensions, such computation and communication complexities can be prohibitively complex. In this paper, we propose a low-complexity variant of BP, referred to as stochastic belief propagation (SBP). As suggested by the name, it is an adaptively randomized version of the BP message updates in which each node passes randomly chosen information to each of its neighbors. The SBP message updates reduce the computational complexity (per iteration) from quadratic to linear in $d$, without assuming any particular structure of the potentials, and also reduce the communication complexity significantly, requiring only $\log{d}$ bits transmission per edge. Moreover, we establish a number of theoretical guarantees for the performance of SBP, showing that it converges almost surely to the BP fixed point for any tree-structured graph, and for graphs with cycles satisfying a contractivity condition. In addition, for these graphical models, we provide non-asymptotic upper bounds on the convergence rate, showing that the $\ell_{\infty}$ norm of the error vector decays no slower than $O(1/\sqrt{t})$ with the number of iterations $t$ on trees and the mean square error decays as $O(1/t)$ for general graphs. These analysis show that SBP can provably yield reductions in computational and communication complexities for various classes of graphical models.


Language-Constraint Reachability Learning in Probabilistic Graphs

arXiv.org Artificial Intelligence

Probabilistic graphs model uncertainty by means of probabilistic edges whose value quantifies the likelihood of the edge existence or the strength of the link it represents. One of the main issues in probabilistic graphs is how to compute the connectivity of the network. The network reliability problem [4] is a generalization of the pairwise reachability, in which the goal is to determine the probability that all pairs of nodes are reachable from one another. Unlike a deterministic graph in which the reachability function is a binary value function indicating whether or not there is a path connecting two nodes, in the case of probabilistic graphs the function assumes probabilistic values. The concept of reachability in probabilistic graphs is used, along with its specialization, as a tool to compute how two nodes in the graph are likely to be connected. Reachability plays an important role in wide range of applications, such as in peer-to-peer networks [3, 18], for probabilistic-routing problem [2, 10], in road network [11], and in trust analysis in social networks [22].As adopted in these works, reachability is quite similar to the general concept of link prediction [9], whose task may be formalized as follows. Given a networked structure (V,E) made up of a set of data instances V and set of observed links E among some nodes in V, the task corresponds to predict how likely should exist an unobserved link between two nodes in the network. The extension to probabilistic graphs adds an important ingredient that should be adequately exploited.