Missing Data using Decision Forest and Computational Intelligence
An autoencoder neural network is implemented to estimate the missing data, and a genetic algorithm is used to optimize the network and estimate the missing values. The missing data are treated as Missing At Random and handled with a maximum likelihood algorithm. Network performance is measured by the mean square error of its predictions. The network is further optimized by implementing a decision forest. The impact of missing data is then investigated, and decision forests are found to improve the results.
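A minimal sketch of this autoencoder-plus-genetic-algorithm scheme might look as follows; the toy data, network size, and GA settings are illustrative assumptions rather than the paper's configuration, and crossover is omitted for brevity:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-in for a real dataset: 500 complete 4-dimensional records in
# which the last variable depends on the first two, so it is recoverable.
X = rng.normal(size=(500, 4))
X[:, 3] = 0.5 * X[:, 0] - 0.2 * X[:, 1]

# Autoencoder: a small MLP trained to reproduce its own input.
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=3000, random_state=0)
ae.fit(X, X)

def fitness(record, missing_idx, candidates):
    """Reconstruction MSE of the record with candidate fills inserted."""
    trials = np.tile(record, (len(candidates), 1))
    trials[:, missing_idx] = candidates
    return ((ae.predict(trials) - trials) ** 2).mean(axis=1)

def impute_ga(record, missing_idx, pop=40, gens=60, sigma=0.5):
    """Genetic search over the missing entries (selection + mutation only)."""
    population = rng.normal(size=(pop, len(missing_idx)))
    for _ in range(gens):
        order = np.argsort(fitness(record, missing_idx, population))
        parents = population[order[: pop // 2]]          # keep the fittest half
        children = parents + rng.normal(scale=sigma, size=parents.shape)
        population = np.vstack([parents, children])
        sigma *= 0.95                                    # anneal mutation width
    best = np.argmin(fitness(record, missing_idx, population))
    return population[best]

x = X[0].copy()
x[3] = impute_ga(x, [3])                 # pretend entry 3 was missing
print("estimated:", round(x[3], 3), "true:", round(X[0, 3], 3))
```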
An analysis of a random algorithm for estimating all the matchings
Yan Huo transformed counting all the matchings of a bipartite graph into computing the permanent of a matrix obtained from the extended bipartite graph, and Rasmussen presented a simple approach (RM) to approximating the permanent that achieves a critical ratio of $O(n\omega(n))$ for almost all 0-1 matrices, making it a simple and promising practical way to attack this #P-complete problem. In this paper, we examine the performance of this method when it is applied to count all the matchings via that transformation. We prove that, with a certain probability, the critical ratio becomes very large, growing by a factor larger than any polynomial in $n$, even in the sense of almost all 0-1 matrices. Hence RM fails to work well when counting all the matchings via computing the permanent of the transformed matrix. In other words, the known methods for estimating the permanent must be applied with care when counting all the matchings through that transformation.
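For reference, Rasmussen's estimator is short enough to state in code. The following hedged sketch (the toy matrix is chosen only so the permanent can be checked by hand) shows the row-expansion step whose variance the analysis above concerns:

```python
import numpy as np

rng = np.random.default_rng(0)

def rasmussen_estimate(A):
    """One run of Rasmussen's unbiased permanent estimator (RM) on a 0-1 matrix.

    Expand along the first row: pick one of its 1-entries uniformly at random,
    recurse on the corresponding minor, and rescale by the number of choices.
    """
    A = np.asarray(A)
    if A.shape[0] == 0:
        return 1.0
    ones = np.flatnonzero(A[0])          # columns j with A[0, j] == 1
    if ones.size == 0:
        return 0.0                       # dead end: this run contributes zero
    j = rng.choice(ones)
    minor = np.delete(A[1:], j, axis=1)
    return ones.size * rasmussen_estimate(minor)

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]])               # permanent = 3 (enumerable by hand)
runs = [rasmussen_estimate(A) for _ in range(20000)]
print(np.mean(runs))                    # unbiased: the mean approaches per(A) = 3
```

Each run is an unbiased estimate of the permanent; the critical ratio measures how many runs are needed for a reliable average, which is exactly where the method breaks down on the transformed matrices.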
Adaptive Spam Detection Inspired by a Cross-Regulation Model of Immune Dynamics: A Study of Concept Drift
Abi-Haidar, Alaa, Rocha, Luis M.
This paper proposes a novel solution to spam detection inspired by a model of the adaptive immune system known as the cross-regulation model. We report on the testing of a preliminary algorithm on six e-mail corpora. We also compare our results, statically and dynamically, with those obtained by the Naive Bayes classifier and by another binary classification method we developed previously for biomedical text-mining applications. We show that the cross-regulation model is competitive against both and thus promising as a bio-inspired algorithm for spam detection in particular and binary classification in general.
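For context, the Naive Bayes baseline used for comparison can be reproduced in a few lines with scikit-learn; the tiny corpus below is an illustrative stand-in for the six e-mail corpora of the study:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training set: 1 = spam, 0 = ham (placeholder for a real corpus).
train_texts = ["cheap meds buy now", "meeting moved to friday",
               "win a free prize now", "draft of the paper attached"]
train_labels = [1, 0, 1, 0]

# Bag-of-words features fed to a multinomial Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["free prize meeting"]))
```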
Justifications for Logic Programs under Answer Set Semantics
Pontelli, Enrico, Son, Tran Cao, Elkhatib, Omar
The paper introduces the notion of off-line justification for Answer Set Programming (ASP). Justifications provide a graph-based explanation of the truth value of an atom with respect to a given answer set. The paper also extends this notion to justify atoms during the computation of an answer set (on-line justification) and presents an integration of on-line justifications within the computation model of Smodels. Off-line and on-line justifications provide useful tools to enhance understanding of ASP, and they offer a basic data structure to support methodologies and tools for debugging answer set programs. A preliminary implementation has been developed in ASP-PROLOG. (To appear in Theory and Practice of Logic Programming (TPLP).)
Probabilistic reasoning with answer sets
Baral, Chitta, Gelfond, Michael, Rushton, Nelson
This paper develops a declarative language, P-log, that combines logical and probabilistic arguments in its reasoning. Answer Set Prolog is used as the logical foundation, while causal Bayes nets serve as a probabilistic foundation. We give several non-trivial examples and illustrate the use of P-log for knowledge representation and updating of knowledge. We argue that our approach to updates is more appealing than existing approaches. We give sufficiency conditions for the coherency of P-log programs and show that Bayes nets can be easily mapped to coherent P-log programs.
The Future of Scientific Simulations: from Artificial Life to Artificial Cosmogenesis
This philosophical paper explores the relation between modern scientific simulations and the future of the universe. We argue that a simulation of an entire universe will result from future scientific activity. This requires us to tackle the challenge of simulating open-ended evolution at all levels in a single simulation. The simulation should encompass not only biological evolution, but also physical evolution (a level below) and cultural evolution (a level above). The simulation would allow us to probe what would happen if we "replayed the tape of the universe" with the same or different laws and initial conditions. We also distinguish between real-world and artificial-world modelling. Assuming that intelligent life could indeed simulate an entire universe leads to two tentative hypotheses. First, some authors have argued that we may already be in a simulation run by an intelligent entity. Second, if such a simulation could be made real, it would lead to the production of a new universe. This last direction is argued with a careful, speculative philosophical approach, emphasizing the imperative to find a solution to the heat death problem in cosmology. The reader is invited to consult Annex 1 for an overview of the logical structure of this paper. -- Keywords: far future, future of science, ALife, simulation, realization, cosmology, heat death, fine-tuning, physical eschatology, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, selfish biocosm hypothesis, meduso-anthropic principle, developmental singularity hypothesis, role of intelligent life.
Kernel Regression by Mode Calculation of the Conditional Probability Distribution
The most direct way to express arbitrary dependencies in datasets is to estimate the joint distribution and then apply the argmax function to obtain the mode of the corresponding conditional distribution. In practice this method is difficult because it requires global optimization of a complicated function: the joint distribution with the input variables held fixed. This article proposes a method for finding the global maxima when the joint distribution is modeled by a kernel density estimate. Experiments show the advantages and shortcomings of the resulting regression method in comparison to the standard Nadaraya-Watson regression technique, which approximates the optimum by the expectation value.
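A minimal sketch of the contrast, assuming Gaussian product kernels and a simple grid search in place of the article's global optimization (the bimodal toy data are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-1, 1, n)
# Bimodal noise: the conditional mean falls between two branches the mode can track.
y = np.sign(rng.uniform(-1, 1, n)) * (1 + 0.5 * x) + rng.normal(0, 0.1, n)

def gauss(u, h):
    return np.exp(-0.5 * (u / h) ** 2)

def nadaraya_watson(x0, h=0.2):
    """Kernel-weighted conditional mean (the standard NW estimator)."""
    w = gauss(x0 - x, h)
    return np.sum(w * y) / np.sum(w)

def conditional_mode(x0, hx=0.2, hy=0.2):
    """Mode of the KDE-based conditional density, maximised on a grid."""
    w = gauss(x0 - x, hx)
    grid = np.linspace(y.min(), y.max(), 1000)
    # p(y | x0) up to a constant: a weighted KDE over y.
    dens = (w[None, :] * gauss(grid[:, None] - y[None, :], hy)).sum(axis=1)
    return grid[np.argmax(dens)]

print("NW mean  at x=0:", nadaraya_watson(0.0))   # near 0, between the branches
print("KDE mode at x=0:", conditional_mode(0.0))  # near +1 or -1, on a branch
```

On a bimodal conditional distribution like this one, the Nadaraya-Watson estimate falls between the two branches while the conditional mode stays on one of them, which is the kind of advantage and shortcoming the experiments weigh.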
Random Forests: some methodological insights
Genuer, Robin, Poggi, Jean-Michel, Tuleau, Christine
This paper examines random forests from an experimental perspective; introduced by Leo Breiman in 2001, they are an increasingly used statistical method for classification and regression problems. It first aims at confirming known but scattered advice for using random forests and at proposing complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables hugely exceeds the sample size. The main contribution of this paper is twofold: to provide insights into the behavior of the variable importance index based on random forests, and to investigate two classical issues of variable selection. The first is to find important variables for interpretation; the second is more restrictive and tries to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy, as sketched below.
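A hedged sketch of that strategy, using scikit-learn's importance scores and out-of-bag error as the selection criterion (the data, the number of candidates examined, and the improvement threshold are illustrative assumptions, not the authors' exact procedure):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 200, 50                          # toy problem with many irrelevant variables
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 3] + 0.5 * X[:, 7] + rng.normal(0, 0.5, n)

# Step 1: rank variables by random-forest importance.
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]

# Step 2: stepwise ascending introduction; keep a variable only if the
# out-of-bag R^2 improves.
selected, best_oob = [], -np.inf
for j in ranking[:15]:
    trial = selected + [j]
    m = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
    m.fit(X[:, trial], y)
    if m.oob_score_ > best_oob + 1e-3:
        selected, best_oob = trial, m.oob_score_

print("selected variables:", selected, "OOB R^2:", round(best_oob, 3))
```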
High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence
Ravikumar, Pradeep, Wainwright, Martin J., Raskutti, Garvesh, Yu, Bin
Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$ and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. We estimate $\Theta^*$ by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$, and the maximum node degree $d$ are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities that control rates: (a) the $\ell_\infty$-operator norm of the true covariance matrix $\Sigma^*$; (b) the $\ell_\infty$-operator norm of the sub-matrix $\Gamma^*_{SS}$, where $S$ indexes the graph edges and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$; and (d) the rate of decay $1/f(n,\delta)$ of the probabilities $\{|\hat{\Sigma}^n_{ij} - \Sigma^*_{ij}| > \delta\}$, where $\hat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\hat{\Theta}$ in the elementwise maximum norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degree $d = o(\sqrt{s})$. In our second result, we show that, with probability converging to one, the estimate $\hat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$.
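In the Gaussian case, the estimator studied here coincides with $\ell_1$-penalized maximum likelihood, for which scikit-learn's graphical lasso is a standard off-the-shelf implementation; the sparse tridiagonal model below is an illustrative assumption, not the paper's setup:

```python
import numpy as np
from scipy.linalg import toeplitz
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
p, n = 10, 500
# Sparse true concentration matrix: tridiagonal, hence a chain graph.
Theta = toeplitz([1.0, 0.4] + [0.0] * (p - 2))
Sigma = np.linalg.inv(Theta)             # corresponding true covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

model = GraphicalLassoCV().fit(X)        # penalty level chosen by cross-validation
Theta_hat = model.precision_

# Compare the recovered zero pattern with the truth (elementwise threshold).
print((np.abs(Theta_hat) > 0.05).astype(int))
print((np.abs(Theta) > 0).astype(int))
```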
Exact phase transition of backtrack-free search with implications on the power of greedy algorithms
Backtracking is a basic strategy for solving constraint satisfaction problems (CSPs). A satisfiable CSP instance is backtrack-free if a solution can be found without encountering any dead-end during a backtracking search, implying that the instance is easy to solve. We prove an exact phase transition of backtrack-free search in some random CSPs, namely in Model RB and in Model RD. This is the first time an exact phase transition of backtrack-free search has been identified in random CSPs. Our technical results also have interesting implications for the power of greedy algorithms, for the width of random hypergraphs, and for the exact satisfiability threshold of random CSPs.
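To make the notion concrete, the following sketch counts dead-ends during a plain backtracking search; an instance is backtrack-free precisely when the counter stays at zero, in which case the corresponding greedy (never-undo) algorithm already finds a solution. The binary-constraint encoding and the toy instance are illustrative assumptions:

```python
def backtrack(assignment, variables, domains, constraints, stats):
    """Depth-first search; constraints maps (u, v) -> set of allowed value pairs."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]          # static variable order
    for val in domains[var]:
        consistent = all(((u, var) not in constraints) or
                         ((assignment[u], val) in constraints[(u, var)])
                         for u in assignment)
        if consistent:
            assignment[var] = val
            sol = backtrack(assignment, variables, domains, constraints, stats)
            if sol:
                return sol
            del assignment[var]
    stats["dead_ends"] += 1                   # every value failed: a dead-end
    return None

# Tiny instance: x < y < z over {0, 1, 2}; solvable with zero dead-ends,
# so a greedy left-to-right assignment already succeeds.
variables = ["x", "y", "z"]
domains = {v: [0, 1, 2] for v in variables}
less = {(a, b) for a in range(3) for b in range(3) if a < b}
constraints = {("x", "y"): less, ("y", "z"): less}

stats = {"dead_ends": 0}
print(backtrack({}, variables, domains, constraints, stats), stats)
```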