Goto

Collaborating Authors

 Asia


Predicting relevant empty spots in social interaction

arXiv.org Artificial Intelligence

Received: August 15, 2007 c 2006 Springer Science Business Media, Inc. Abstract An empty spot refers to an empty hard-to-fill space which can be found in the records of the social interaction, and is the clue to the persons in the underlying social network who do not appear in the records. This contribution addresses a problem to predict relevant empty spots in social interaction. Homogeneous and inhomogeneous networks are studied as a model underlying the social interaction. A heuristic predictor function method is presented as a new method to address the problem. Simulation experiment is demonstrated over a homogeneous network. A test data set in the form of market baskets is generated from the simulated communication. Precision to predict the empty spots is calculated to demonstrate the performance of the presented method.


On the Expressiveness of Levesque's Normal Form

Journal of Artificial Intelligence Research

Levesque proposed a generalization of a database called a proper knowledge base (KB), which is equivalent to a possibly infinite consistent set of ground literals. In contrast to databases, proper KBs do not make the closed-world assumption and hence the entailment problem becomes undecidable. Levesque then proposed a limited but efficient inference method V for proper KBs, which is sound and, when the query is in a certain normal form, also logically complete. He conjectured that for every first-order query there is an equivalent one in normal form. In this note, we show that this conjecture is false. In fact, we show that any class of formulas for which V is complete must be strictly less expressive than full first-order logic. Moreover, in the propositional case it is very unlikely that a formula always has a polynomial-size normal form.


Conjunctive Query Answering for the Description Logic SHIQ

Journal of Artificial Intelligence Research

Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP.


On the Scaling Window of Model RB

arXiv.org Artificial Intelligence

This paper analyzes the scaling window of a random CSP model (i.e. model RB) for which we can identify the threshold points exactly, denoted by $r_{cr}$ or $p_{cr}$. For this model, we establish the scaling window $W(n,\delta)=(r_{-}(n,\delta), r_{+}(n,\delta))$ such that the probability of a random instance being satisfiable is greater than $1-\delta$ for $rr_{+}(n,\delta)$. Specifically, we obtain the following result $$W(n,\delta)=(r_{cr}-\Theta(\frac{1}{n^{1-\epsilon}\ln n}), \ r_{cr}+\Theta(\frac{1}{n\ln n})),$$ where $0\leq\epsilon<1$ is a constant. A similar result with respect to the other parameter $p$ is also obtained. Since the instances generated by model RB have been shown to be hard at the threshold, this is the first attempt, as far as we know, to analyze the scaling window of such a model with hard instances.


From k-SAT to k-CSP: Two Generalized Algorithms

arXiv.org Artificial Intelligence

Partially supported by the National 973 Program of China (Grant No. 2005CB321901). Preprint submitted to Elsevier 29 January 2018 with bounded clause length k (k-SAT) can be classified into three styles: DPLL-like, PPSZ-like and Local Search [2], with local search algorithms having already been generalized to CSP with bounded constraint arity k (k-CSP) [5]. We generalize a DPLL-like algorithm in its simplest form and a PPSZlike algorithm [4] from k-SAT to k-CSP. As far as we know, this is the first attempt to use PPSZ-like strategy to solve k-CSP, and before little work has been focused on the DPLL-like or PPSZ-like strategies for k-CSP.


MLLE: Modified Locally Linear Embedding Using Multiple Weights

Neural Information Processing Systems

The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE.


Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Neural Information Processing Systems

Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas as humancomputer interaction, video surveillance, and Web data analysis. We propose a nonparametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what "factors" are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.


Differential Entropic Clustering of Multivariate Gaussians

Neural Information Processing Systems

Gaussian data is pervasive and many learning algorithms (e.g., k-means) model their inputs as a single sample drawn from a multivariate Gaussian. However, in many real-life settings, each input object is best described by multiple samples drawn from a multivariate Gaussian. Such data can arise, for example, in a movie review database where each movie is rated by several users, or in time-series domains such as sensor networks. Here, each input can be naturally described by both a mean vector and covariance matrix which parameterize the Gaussian distribution. In this paper, we consider the problem of clustering such input objects, each represented as a multivariate Gaussian. We formulate the problem using an information theoretic approach and draw several interesting theoretical connections to Bregman divergences and also Bregman matrix divergences. We evaluate our method across several domains, including synthetic data, sensor network data, and a statistical debugging application.


A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments

Neural Information Processing Systems

The additive clustering model is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function of common features. This paper develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides a straightforward way to obtain estimates of both the number of features used in producing similarity judgments and their importance.


Learnability and the doubling dimension

Neural Information Processing Systems

We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor.