Technology
Use of Rapid Probabilistic Argumentation for Ranking on Large Complex Networks
Abstract-- We introduce a family of novel ranking algorithms called ERank which run in linear/near linear time and build on explicitly modeling a network as uncertain evidence. The model uses Probabilistic Argumentation Systems (PAS) which are a combination of probability theory and propositional logic, and also a special case of Dempster-Shafer Theory of Evidence. ERank rapidly generates approximate results for the NPcomplete problem involved enabling the use of the technique in large networks. We use a previously introduced PAS model for citation networks generalizing it for all networks. We propose a statistical test to be used for comparing the performances of different ranking algorithms based on a clustering validity test. Our experimentation using this test on a real-world network shows ERank to have the best performance in comparison to well-known algorithms including PageRank, closeness, and betweenness. Ranking nodes in complex networks is an important challenge. Depending on the type of network and the application the meaning of a rank can be different. For the World Wide Web one is usually after popular and informative pages (e.g. For a citation network it is influential papers, for social networks (e.g. Facebook, LinkedIn) it is central/important persons. More recently, networks are tools for calculating trust and transitional trust [1]. Algorithms applied today to large networks often rely on an intuitive idea (e.g.
Self Organizing Map algorithm and distortion measure
We study the statistical meaning of the minimization of distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of distortion measure. If we assume that the observations and the map lie in an compact Euclidean space, we prove the strong consistency of the map which almost minimizes the empirical distortion. Moreover, after calculating the derivatives of the theoretical distortion measure, we show that the points minimizing this measure and the equilibria of the Kohonen map do not match in general. We illustrate, with a simple example, how this occurs.
Efficient Estimation of Multidimensional Regression Model with Multilayer Perceptron
This work concerns estimation of multidimensional nonlinear regression models using multilayer perceptron (MLP). The main problem with such model is that we have to know the covariance matrix of the noise to get optimal estimator. however we show that, if we choose as cost function the logarithm of the determinant of the empirical error covariance matrix, we get an asymptotically optimal estimator.
Testing the number of parameters with multidimensional MLP
This work concerns testing the number of parameters in one hidden layer multilayer perceptron (MLP). For this purpose we assume that we have identifiable models, up to a finite group of transformations on the weights, this is for example the case when the number of hidden units is know. In this framework, we show that we get a simple asymptotic distribution, if we use the logarithm of the determinant of the empirical error covariance matrix as cost function.
Classification Constrained Dimensionality Reduction
Raich, Raviv, Costa, Jose A., Damelin, Steven B., Hero, Alfred O. III
Dimensionality reduction is a topic of recent interest. In this paper, we present the classification constrained dimensionality reduction (CCDR) algorithm to account for label information. The algorithm can account for multiple classes as well as the semi-supervised setting. We present an out-of-sample expressions for both labeled and unlabeled data. For unlabeled data, we introduce a method of embedding a new point as preprocessing to a classifier. For labeled data, we introduce a method that improves the embedding during the training phase using the out-of-sample extension. We investigate classification performance using the CCDR algorithm on hyper-spectral satellite imagery data. We demonstrate the performance gain for both local and global classifiers and demonstrate a 10% improvement of the $k$-nearest neighbors algorithm performance. We present a connection between intrinsic dimension estimation and the optimal embedding dimension obtained using the CCDR algorithm.
Design and Implementation of Aggregate Functions in the DLV System
Faber, Wolfgang, Pfeifer, Gerald, Leone, Nicola, Dell'Armi, Tina, Ielpa, Giuseppe
Disjunctive Logic Programming (DLP) is a very expressive formalism: it allows for expressing every property of finite structures that is decidable in the complexity class SigmaP2 (= NP^NP). Despite this high expressiveness, there are some simple properties, often arising in real-world applications, which cannot be encoded in a simple and natural manner. Especially properties that require the use of arithmetic operators (like sum, times, or count) on a set or multiset of elements, which satisfy some conditions, cannot be naturally expressed in classic DLP. To overcome this deficiency, we extend DLP by aggregate functions in a conservative way. In particular, we avoid the introduction of constructs with disputed semantics, by requiring aggregates to be stratified. We formally define the semantics of the extended language (called DLP^A), and illustrate how it can be profitably used for representing knowledge. Furthermore, we analyze the computational complexity of DLP^A, showing that the addition of aggregates does not bring a higher cost in that respect. Finally, we provide an implementation of DLP^A in DLV -- a state-of-the-art DLP system -- and report on experiments which confirm the usefulness of the proposed extension also for the efficiency of computation.
Predicting relevant empty spots in social interaction
Maeno, Yoshiharu, Ohsawa, Yukio
Received: August 15, 2007 c 2006 Springer Science Business Media, Inc. Abstract An empty spot refers to an empty hard-to-fill space which can be found in the records of the social interaction, and is the clue to the persons in the underlying social network who do not appear in the records. This contribution addresses a problem to predict relevant empty spots in social interaction. Homogeneous and inhomogeneous networks are studied as a model underlying the social interaction. A heuristic predictor function method is presented as a new method to address the problem. Simulation experiment is demonstrated over a homogeneous network. A test data set in the form of market baskets is generated from the simulated communication. Precision to predict the empty spots is calculated to demonstrate the performance of the presented method.
On the Expressiveness of Levesque's Normal Form
Levesque proposed a generalization of a database called a proper knowledge base (KB), which is equivalent to a possibly infinite consistent set of ground literals. In contrast to databases, proper KBs do not make the closed-world assumption and hence the entailment problem becomes undecidable. Levesque then proposed a limited but efficient inference method V for proper KBs, which is sound and, when the query is in a certain normal form, also logically complete. He conjectured that for every first-order query there is an equivalent one in normal form. In this note, we show that this conjecture is false. In fact, we show that any class of formulas for which V is complete must be strictly less expressive than full first-order logic. Moreover, in the propositional case it is very unlikely that a formula always has a polynomial-size normal form.
Anisotropic selection in cellular genetic algorithms
Simoncini, David, Verel, Sébastien, Collard, Philippe, Clergue, Manuel
In this paper we introduce a new selection scheme in cellular genetic algorithms (cGAs). Anisotropic Selection (AS) promotes diversity and allows accurate control of the selective pressure. First we compare this new scheme with the classical rectangular grid shapes solution according to the selective pressure: we can obtain the same takeover time with the two techniques although the spreading of the best individual is different. We then give experimental results that show to what extent AS promotes the emergence of niches that support low coupling and high cohesion. Finally, using a cGA with anisotropic selection on a Quadratic Assignment Problem we show the existence of an anisotropic optimal value for which the best average performance is observed. Further work will focus on the selective pressure self-adjustment ability provided by this new selection scheme.
FINE: Fisher Information Non-parametric Embedding
Carter, Kevin M., Raich, Raviv, Finn, William G., Hero, Alfred O.
We consider the problems of clustering, classification, and visualization of high-dimensional data when no straightforward Euclidean representation exists. Typically, these tasks are performed by first reducing the high-dimensional data to some lower dimensional Euclidean space, as many manifold learning methods have been developed for this task. In many practical problems however, the assumption of a Euclidean manifold cannot be justified. In these cases, a more appropriate assumption would be that the data lies on a statistical manifold, or a manifold of probability density functions (PDFs). In this paper we propose using the properties of information geometry in order to define similarities between data sets using the Fisher information metric. We will show this metric can be approximated using entirely non-parametric methods, as the parameterization of the manifold is generally unknown. Furthermore, by using multi-dimensional scaling methods, we are able to embed the corresponding PDFs into a low-dimensional Euclidean space. This not only allows for classification of the data, but also visualization of the manifold. As a whole, we refer to our framework as Fisher Information Non-parametric Embedding (FINE), and illustrate its uses on a variety of practical problems, including bio-medical applications and document classification.