Detecting Danger: The Dendritic Cell Algorithm
Greensmith, Julie, Aickelin, Uwe, Cayzer, Steve
The Dendritic Cell Algorithm (DCA) is inspired by the function of the dendritic cells of the human immune system. In nature, dendritic cells are the intrusion detection agents of the human body, policing the tissue and organs for potential invaders in the form of pathogens. In this research, an abstract model of DC behaviour is developed and subsequently used to form an algorithm, the DCA. The abstraction process was facilitated through close collaboration with laboratory-based immunologists, who performed bespoke experiments, the results of which are used as an integral part of this algorithm. The DCA is a population based algorithm, with each agent in the system represented as an 'artificial DC'. Each DC has the ability to combine multiple data streams and can add context to data suspected as anomalous. In this chapter the abstraction process and details of the resultant algorithm are given. The algorithm is applied to numerous intrusion detection problems in computer security including the detection of port scans and botnets, where it has produced impressive results with relatively low rates of false positives.
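The idea of a population of artificial DCs that combine several data streams and attach a context to suspect data can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the two signal streams (`danger`, `safe`), the weighting `danger - 2 * safe`, the round-robin assignment of samples to cells, and the names `ArtificialDC` and `run_dca`.

```python
from collections import defaultdict


class ArtificialDC:
    """Toy artificial DC: collects data items and accumulates signal evidence."""

    def __init__(self, migration_threshold):
        self.migration_threshold = migration_threshold
        self.exposure = 0.0   # total signal seen so far (decides when to report)
        self.context = 0.0    # > 0 leans "anomalous", <= 0 leans "normal"
        self.collected = []   # data items sampled during this cell's lifetime

    def sample(self, item, danger, safe):
        """Record one item with its signals; return True once the cell should report."""
        self.collected.append(item)
        self.exposure += danger + safe
        self.context += danger - 2.0 * safe  # assumed weighting, for illustration only
        return self.exposure >= self.migration_threshold


def run_dca(stream, thresholds):
    """Feed (item, danger, safe) tuples to a DC population; score each item type.

    The score of an item is the fraction of its presentations that came from
    cells whose accumulated context was anomalous.
    """
    cells = [ArtificialDC(t) for t in thresholds]
    presented = defaultdict(lambda: [0, 0])  # item -> [anomalous count, total count]
    for i, (item, danger, safe) in enumerate(stream):
        idx = i % len(cells)                 # round-robin assignment (assumption)
        cell = cells[idx]
        if cell.sample(item, danger, safe):
            for collected_item in cell.collected:
                presented[collected_item][1] += 1
                if cell.context > 0:
                    presented[collected_item][0] += 1
            cells[idx] = ArtificialDC(cell.migration_threshold)  # replace reporting cell
    return {item: anomalous / total for item, (anomalous, total) in presented.items()}


stream = [("scan", 5, 0)] * 4 + [("web", 0, 5)] * 4
scores = run_dca(stream, thresholds=[10, 10])
```

In this toy run the items labelled `"scan"` arrive with high danger signals and the items labelled `"web"` with high safe signals, so the two item types end up at opposite ends of the score range. The point is only that the verdict on an item comes from the context its collecting cells accumulated, not from the item itself.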
Artificial Immune Systems (2010)
Greensmith, Julie, Whitbrook, Amanda, Aickelin, Uwe
The human immune system has numerous properties that make it ripe for exploitation in the computational domain, such as robustness and fault tolerance, and many different algorithms, collectively termed Artificial Immune Systems (AIS), have been inspired by it. Two generations of AIS are currently in use, with the first generation relying on simplified immune models and the second generation utilising interdisciplinary collaboration to develop a deeper understanding of the immune system and hence produce more complex models. Both generations of algorithms have been successfully applied to a variety of problems, including anomaly detection, pattern recognition, optimisation and robotics. In this chapter an overview of AIS is presented, its evolution is discussed, and it is shown that the diversification of the field is linked to the diversity of the immune system itself, leading to a number of algorithms as opposed to one archetypal system. Two case studies are also presented to help provide insight into the mechanisms of AIS; these are the idiotypic network approach and the Dendritic Cell Algorithm.
Heavy-Tailed Processes for Selective Shrinkage
Wauthier, Fabian L., Jordan, Michael I.
Heavy-tailed distributions are frequently used to enhance the robustness of regression and classification methods to outliers in output space. Often, however, we are confronted with "outliers" in input space, which are isolated observations in sparsely populated regions. We show that heavy-tailed stochastic processes (which we construct from Gaussian processes via a copula) can be used to improve robustness of regression and classification estimators to such outliers by selectively shrinking them more strongly in sparse regions than in dense regions. We carry out a theoretical analysis to show that selective shrinkage occurs, provided the marginals of the heavy-tailed process have sufficiently heavy tails. The analysis is complemented by experiments on biological data which indicate significant improvements of estimates in sparse regions while producing competitive results in dense regions.
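The copula construction mentioned above can be sketched at the level of a single marginal: push a Gaussian value through the standard normal CDF, then through the quantile function of a heavy-tailed distribution. The choice of a Cauchy marginal and the function name `gaussian_to_heavy_tailed` are assumptions for illustration; the abstract does not say which heavy-tailed marginals the authors use.

```python
import math
from statistics import NormalDist


def gaussian_to_heavy_tailed(z, scale=1.0):
    """Map a standard-normal value to the Cauchy value at the same quantile.

    Applied pointwise to draws from a Gaussian process, this keeps the
    Gaussian dependence structure (the copula) but replaces each marginal
    with a heavy-tailed one.
    """
    u = NormalDist().cdf(z)                        # uniform quantile in (0, 1)
    return scale * math.tan(math.pi * (u - 0.5))   # Cauchy inverse CDF


# Moderate Gaussian values stay moderate, while tail values are stretched far out.
transformed = [gaussian_to_heavy_tailed(z) for z in (-3.0, -1.0, 0.0, 1.0, 3.0)]
```

The transform is monotone, so the ordering of function values is preserved; only their magnitudes in the tails change.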
SPOT: An R Package For Automatic and Interactive Tuning of Optimization Algorithms by Sequential Parameter Optimization
The sequential parameter optimization (SPOT) package for R is a toolbox for tuning and understanding simulation and optimization algorithms. Model-based investigations are common approaches in simulation and optimization. Sequential parameter optimization was developed because there is a strong need for sound statistical analysis of simulation and optimization algorithms. SPOT includes methods for tuning based on classical regression and analysis of variance techniques; tree-based models such as CART and random forest; Gaussian process models (Kriging); and combinations of different meta-modeling approaches. This article exemplifies how SPOT can be used for automatic and interactive tuning.
Understanding Semantic Web and Ontologies: Theory and Applications
One of the most interesting inventions of recent decades is that of Web Services [36]. These are computer program "applications": self-describing, self-contained applications whose function is to automatically share information over the Internet with other applications. Some weaknesses, such as browsing information without taking its meaning into account, have recently appeared in Web Services. This creates a need for a new Web with more relevance to the user. The Semantic Web is an extension of the current one in that it represents information more meaningfully for humans and computers alike. It enables the description of contents and services in machine-readable form, and enables annotating, discovering, publishing, advertising and composing services to be automated. It was developed based on Ontology, which is considered the backbone of the Semantic Web. In other words, the current Web is transformed from being machine-readable to machine-understandable. One function of the Web is to build a source of reference for information on several subjects, while the Semantic Web is designed to build a web of meaning.
The State of the Art: Ontology Web-Based Languages: XML Based
Many formal languages have been proposed to express or represent Ontologies, including RDF, RDFS, DAML+OIL and OWL. Most of these languages are based on XML syntax, but differ in terminology and expressiveness. Choosing a language is therefore a key step in building an Ontology. The choice depends mainly on what the Ontology will represent or be used for. The language should offer a range of quality support features, such as ease of use, expressive power, compatibility, sharing and versioning, and internationalisation, because different kinds of knowledge-based applications need different language features. The main objective of these languages is to add semantics to the existing information on the web. The aim of this paper is to provide a good understanding of the existing languages and of how they can be used.
sTeX+ - a System for Flexible Formalization of Linked Data
Kohlhase, Andrea, Kohlhase, Michael, Lange, Christoph
We present the sTeX+ system, a user-driven advancement of sTeX - a semantic extension of LaTeX that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc documents for the Web or further processing. Originally sTeX had been created as an invasive, semantic frontend for authoring XML documents. Here, we used sTeX in a Software Engineering case study as a formalization tool. In order to deal with modular pre-semantic vocabularies and relations, we upgraded it to sTeX+ in a participatory design process. We present a tool chain that starts with an sTeX+ editor and ultimately serves the generated documents as XHTML+RDFa Linked Data via an OMDoc-enabled, versioned XML database. In the final output, all structural annotations are preserved in order to enable semantic information retrieval services.
Towards the Development of a Simulator for Investigating the Impact of People Management Practices on Retail Performance
Siebers, Peer-Olaf, Aickelin, Uwe, Celia, Helen, Clegg, Chris
Often models for understanding the impact of management practices on retail performance are developed under the assumption of stability, equilibrium and linearity, whereas retail operations are considered in reality to be dynamic, non-linear and complex. Alternatively, discrete event and agent-based modelling are approaches that allow the development of simulation models of heterogeneous non-equilibrium systems for testing out different scenarios. When developing simulation models one has to abstract and simplify from the real world, which means that one has to try and capture the 'essence' of the system required for developing a representation of the mechanisms that drive the progression in the real system. Simulation models can be developed at different levels of abstraction. To know the appropriate level of abstraction for a specific application is often more of an art than a science. We have developed a retail branch simulation model to investigate which level of model accuracy is required for such a model to obtain meaningful results for practitioners.
Graph-Valued Regression
Liu, Han, Chen, Xi, Lafferty, John, Wasserman, Larry
Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as ``graph-valued regression.'' In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method ``Graph-optimized CART,'' or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.
Developing Approaches for Solving a Telecommunications Feature Subscription Problem
Lesaint, D., Mehta, D., O'Sullivan, B., Quesada, L., Wilson, N.
Call control features (e.g., call-divert, voice-mail) are primitive options to which users can subscribe off-line to personalise their service. The configuration of a feature subscription involves choosing and sequencing features from a catalogue and is subject to constraints that prevent undesirable feature interactions at run-time. When the subscription requested by a user is inconsistent, one problem is to find an optimal relaxation, which is a generalisation of the feedback vertex set problem on directed graphs, and thus it is an NP-hard task. We present several constraint programming formulations of the problem. We also present formulations using partial weighted maximum Boolean satisfiability and mixed integer linear programming. We study all these formulations by experimentally comparing them on a variety of randomly generated instances of the feature subscription problem.
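The link to the feedback vertex set problem can be made concrete with a small exhaustive sketch: treat precedence constraints between features as directed edges, call a subscription consistent when the induced graph is acyclic (so the features can be sequenced), and search for the maximum-weight consistent subset. This is a simplified illustration, not one of the formulations studied in the paper: it only relaxes by dropping features, it enumerates all subsets, and the feature `call-screening`, the weights, and the function names are hypothetical.

```python
from itertools import combinations


def is_acyclic(nodes, edges):
    """Return True if the directed graph induced on `nodes` has no cycle (Kahn's algorithm)."""
    nodes = set(nodes)
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for before, after in edges:
        if before in nodes and after in nodes:
            successors[before].append(after)
            indegree[after] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    ordered = 0
    while ready:
        n = ready.pop()
        ordered += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return ordered == len(nodes)  # every node was sequenced, so no cycle exists


def optimal_relaxation(weights, precedences):
    """Exhaustively find a maximum-weight subset of features that can be sequenced.

    Exponential in the number of features; only meant for tiny examples.
    """
    features = sorted(weights)
    best, best_value = frozenset(), 0
    for size in range(len(features), 0, -1):
        for subset in combinations(features, size):
            if is_acyclic(subset, precedences):
                value = sum(weights[f] for f in subset)
                if value > best_value:
                    best, best_value = frozenset(subset), value
    return best, best_value


# Hypothetical user weights and a cyclic (inconsistent) set of precedence constraints.
weights = {"call-divert": 3, "voice-mail": 2, "call-screening": 1}
precedences = [
    ("call-divert", "voice-mail"),
    ("voice-mail", "call-screening"),
    ("call-screening", "call-divert"),
]
kept, value = optimal_relaxation(weights, precedences)
```

Here the three constraints form a cycle, so the full subscription cannot be sequenced; the cheapest repair drops the lowest-weight feature. The exponential enumeration is the reason the paper turns to constraint programming, Boolean satisfiability and mixed integer linear programming formulations.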