The Deterministic Dendritic Cell Algorithm
Greensmith, Julie, Aickelin, Uwe
The Dendritic Cell Algorithm is an immune-inspired algorithm originally based on the function of natural dendritic cells. The original instantiation of the algorithm is a highly stochastic algorithm. While the performance of the algorithm is good when applied to large real-time datasets, it is difficult to analyse due to the number of random-based elements. In this paper a deterministic version of the algorithm is proposed, implemented and tested using a port scan dataset to provide a controllable system. This version consists of a controllable number of parameters, which are experimented with in this paper. In addition, the effects of using time windows and of varying the number of cells are examined, both of which are shown to influence the algorithm. Finally, a novel metric for the assessment of the algorithm's output is introduced and proves to be more sensitive than the metric used with the original Dendritic Cell Algorithm.
Segmentation and Nodal Points in Narrative: Study of Multiple Variations of a Ballad
The Lady Maisry ballads afford us a framework within which to segment a storyline into its major components. Segments and, as a consequence, nodal points are discussed for nine different variants of the Lady Maisry story of a (young) woman being burnt to death by her family on account of her becoming pregnant by a foreign personage. We motivate the importance of nodal points in textual and literary analysis. We also show how the openings of the nine variants, as well as the conclusions of the ballads, can be analyzed comparatively.
Computing by Means of Physics-Based Optical Neural Networks
Younger, A. Steven, Redd, Emmett
We report recent research on computing with biology-based neural network models by means of physics-based opto-electronic hardware. New technology provides opportunities for very-high-speed computation and uncovers problems obstructing the widespread use of this new capability. The Computation Modeling community may be able to offer solutions to these cross-boundary research problems.
Begin, After, and Later: a Maximal Decidable Interval Temporal Logic
Bresolin, Davide, Sala, Pietro, Sciavicco, Guido
Interval temporal logics (ITLs) are logics for reasoning about temporal statements expressed over intervals, i.e., periods of time. The most famous ITL studied so far is Halpern and Shoham's HS, which is the logic of Allen's thirteen interval relations. Unfortunately, HS and most of its fragments have an undecidable satisfiability problem. This discouraged research in this area until recently, when a number of non-trivial decidable ITLs were discovered. This paper is a contribution towards the complete classification of all different fragments of HS. We consider different combinations of the interval relations Begins, After, Later and their inverses Abar, Bbar, and Lbar. We know from previous works that the combination ABBbarAbar is decidable only when finite domains are considered (and undecidable elsewhere), and that ABBbar is decidable over the natural numbers. We extend these results by showing that decidability of ABBbar can be further extended to capture the language ABBbarLbar, which lies in between ABBbar and ABBbarAbar, and which turns out to be maximal w.r.t. decidability over strongly discrete linear orders (e.g. finite orders, the naturals, the integers). We also prove that the proposed decision procedure is optimal with respect to the complexity class.
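As a rough illustration of the three relations named in this abstract, the sketch below encodes Begins, After, and Later over integer endpoints. It assumes a common reading of HS in which an interval (a, b) has a < b, After means the second interval starts exactly where the first ends, Begins means the second interval is a proper prefix of the first, and Later means the second interval starts strictly after the first ends. The function names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Tuple

# An interval is a pair (start, end) with start < end (assumed convention).
Interval = Tuple[int, int]
Relation = Callable[[Interval, Interval], bool]


def after(i: Interval, j: Interval) -> bool:
    """A: j starts exactly where i ends."""
    return i[1] == j[0]


def begins(i: Interval, j: Interval) -> bool:
    """B: j is a proper prefix of i (same start, earlier end)."""
    return i[0] == j[0] and j[1] < i[1]


def later(i: Interval, j: Interval) -> bool:
    """L: j starts strictly after i ends."""
    return i[1] < j[0]


def inverse(rel: Relation) -> Relation:
    """Build the inverse relation (Abar, Bbar, Lbar) by swapping arguments."""
    return lambda i, j: rel(j, i)


after_bar = inverse(after)
begins_bar = inverse(begins)
later_bar = inverse(later)
```

For example, `after((0, 2), (2, 5))` and `begins_bar((0, 2), (0, 5))` both hold under these conventions, while `later((0, 2), (2, 5))` does not because the two intervals share an endpoint.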
Uncovering the Riffled Independence Structure of Rankings
Huang, Jonathan, Guestrin, Carlos
Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of $n$ objects scales factorially in $n$. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called \emph{riffled independence}, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the \emph{riffle shuffle}, common in card games, to combine the two permutations to form a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. Additionally, we propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
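The generative view described in this abstract (rank two disjoint item sets independently, then interleave the two rankings) can be sketched as follows. This is a minimal illustration only: it assumes uniform distributions for both rankings and for the interleaving, whereas the paper considers general distributions, and the names `interleave` and `sample_riffled` are hypothetical.

```python
import random
from typing import List, Sequence


def interleave(rank_a: Sequence[str], rank_b: Sequence[str],
               slots_for_a: Sequence[int]) -> List[str]:
    """Merge two rankings, keeping each ranking's internal order.

    slots_for_a lists the output positions that items of A occupy;
    the remaining positions are filled by items of B in order.
    """
    if len(set(slots_for_a)) != len(rank_a):
        raise ValueError("need exactly one distinct slot per item of A")
    total = len(rank_a) + len(rank_b)
    slots = set(slots_for_a)
    it_a, it_b = iter(rank_a), iter(rank_b)
    return [next(it_a) if pos in slots else next(it_b)
            for pos in range(total)]


def sample_riffled(items_a: Sequence[str], items_b: Sequence[str],
                   rng: random.Random) -> List[str]:
    """Draw one ranking under the (assumed uniform) riffled model."""
    rank_a = rng.sample(list(items_a), len(items_a))  # rank A independently
    rank_b = rng.sample(list(items_b), len(items_b))  # rank B independently
    total = len(items_a) + len(items_b)
    slots = rng.sample(range(total), len(items_a))    # pick an interleaving
    return interleave(rank_a, rank_b, slots)
```

Calling `interleave(["a1", "a2"], ["b1", "b2", "b3"], [0, 3])` places the A items at positions 0 and 3 and fills the rest with the B items, so the relative order within each set is preserved in the combined ranking.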
General-Purpose Computing on a Semantic Network Substrate
A semantic network is a directed labeled graph (Sowa, 1991). The thesis of this article is that the state of a computing machine, its low-level instructions, and the executing program can be represented as a semantic network. The computational model that is presented can be instantiated using any semantic network representation. However, given the existence of the Resource Description Framework (RDF) (Manola & Miller, 2004) and the popular Web Ontology Language (OWL) (McGuinness & Harmelen, 2004), this article presents the theory and the application in terms of these constructs. The computing model that is proposed is perhaps simple in theory, but in application, requires a relatively strong background in the computer sciences.
The Dilated Triple
Rodriguez, Marko A., Pepe, Alberto, Shinavier, Joshua
The basic unit of meaning on the Semantic Web is the RDF statement, or triple, which combines a distinct subject, predicate and object to make a definite assertion about the world. A set of triples constitutes a graph, to which they give a collective meaning. It is upon this simple foundation that the rich, complex knowledge structures of the Semantic Web are built. Yet the very expressiveness of RDF, by inviting comparison with real-world knowledge, highlights a fundamental shortcoming, in that RDF is limited to statements of absolute fact, independent of the context in which a statement is asserted. This is in stark contrast with the thoroughly context-sensitive nature of human thought. The model presented here provides a particularly simple means of contextualizing an RDF triple by associating it with related statements in the same graph. This approach, in combination with a notion of graph similarity, is sufficient to select only those statements from an RDF graph which are subjectively most relevant to the context of the requesting process.
Tree-Structured Stick Breaking Processes for Hierarchical Data
Adams, Ryan Prescott, Ghahramani, Zoubin, Jordan, Michael I.
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.
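For readers unfamiliar with stick breaking, the sketch below shows the plain, non-nested construction that the nested processes in this abstract build on. It is not the tree-structured prior itself. It assumes the standard form in which each break fraction is drawn from Beta(1, alpha) and the construction is truncated after a fixed number of sticks; the function name and the truncation are illustrative choices.

```python
import random
from typing import List, Tuple


def stick_breaking_weights(alpha: float, num_sticks: int,
                           rng: random.Random) -> Tuple[List[float], float]:
    """Break a unit-length stick num_sticks times.

    Returns the piece lengths (mixture weights) and the length of the
    stick that is still unbroken after truncation.
    """
    weights: List[float] = []
    remaining = 1.0
    for _ in range(num_sticks):
        frac = rng.betavariate(1.0, alpha)  # fraction of what is left
        weights.append(remaining * frac)    # length of this piece
        remaining *= 1.0 - frac             # shrink the leftover stick
    return weights, remaining
```

The weights and the leftover always sum to one, so a truncated draw can be used as approximate mixture weights by assigning the leftover mass to a final catch-all component.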
Chi-square-based scoring function for categorization of MEDLINE citations
Kastrin, Andrej, Peterlin, Borut, Hristovski, Dimitar
Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE citations containing genetically relevant topics. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
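The abstract does not give the exact scoring formula, so the following is only a sketch of one plausible reading. It assumes each MeSH descriptor gets the Pearson chi-square statistic of its 2x2 corpus contingency table, signed positive when the descriptor is relatively more frequent in the genetic corpus and negative otherwise, and that a citation's score is the sum of the weights of its descriptors. The negative sign, the summation, and all function names are assumptions, not the authors' stated method.

```python
from typing import Dict, Iterable


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weight(gen_with: int, gen_total: int,
                      non_with: int, non_total: int) -> float:
    """Signed chi-square weight of one descriptor (assumed scheme).

    gen_with / non_with count citations carrying the descriptor in the
    genetic / nongenetic corpus; both corpus totals must be positive.
    """
    stat = chi_square_2x2(gen_with, gen_total - gen_with,
                          non_with, non_total - non_with)
    positive = gen_with / gen_total > non_with / non_total
    return stat if positive else -stat


def citation_score(descriptors: Iterable[str],
                   weights: Dict[str, float]) -> float:
    """Sum the weights of a citation's descriptors; unknown ones count 0."""
    return sum(weights.get(name, 0.0) for name in descriptors)
```

Under this scheme, a descriptor that appears in every genetic citation and in no nongenetic citation receives a large positive weight, and citations are ranked by their summed weights.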
Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data
Kastrin, Andrej, Peterlin, Borut
Class prediction is an important application of microarray gene expression data analysis. The high dimensionality of microarray data, where the number of genes (variables) is very large compared to the number of samples (observations), makes the application of many prediction techniques (e.g., logistic regression, discriminant analysis) difficult. An efficient way to solve this problem is by using statistical dimension reduction techniques. Increasingly used in psychology-related applications, the Rasch model (RM) provides an appealing framework for handling high-dimensional microarray data. In this paper, we study the potential of RM-based modeling in dimensionality reduction with binarized microarray gene expression data and investigate its prediction accuracy in the context of class prediction using linear discriminant analysis. Two different publicly available microarray data sets are used to illustrate a general framework of the approach. Performance of the proposed method is assessed by a re-randomization scheme using principal component analysis (PCA) as a benchmark method. Our results show that RM-based dimension reduction is as effective as PCA-based dimension reduction. The method is general and can be applied to other high-dimensional data problems.