Goto

Collaborating Authors

 Country


Detecting Danger: Applying a Novel Immunological Concept to Intrusion Detection Systems

arXiv.org Artificial Intelligence

In recent years computer systems have become increasingly complex and consequently the challenge of protecting these systems has become increasingly difficult. Various techniques have been implemented to counteract the misuse of computer systems in the form of firewalls, anti-virus software and intrusion detection systems. The complexity of networks and dynamic nature of computer systems leaves current methods with significant room for improvement. Computer scientists have recently drawn inspiration from mechanisms found in biological systems and, in the context of computer security, have focused on the human immune system (HIS). The human immune system provides a high level of protection from constant attacks. By examining the precise mechanisms of the human immune system, it is hoped the paradigm will improve the performance of real intrusion detection systems. This paper presents an introduction to recent developments in the field of immunology. It discusses the incorporation of a novel immunological paradigm, Danger Theory, and how this concept is inspiring artificial immune systems (AIS). Applications within the context of computer security are outlined drawing direct reference to the underlying principles of Danger Theory and finally, the current state of intrusion detection systems is discussed and improvements suggested.


Feature Level Fusion of Biometrics Cues: Human Identification with Doddingtons Caricature

arXiv.org Artificial Intelligence

This paper presents a multimodal biometric system of fingerprint and ear biometrics. Scale Invariant Feature Transform (SIFT) descriptor based feature sets extracted from fingerprint and ear are fused. The fused set is encoded by K-medoids partitioning approach with less number of feature points in the set. K-medoids partition the whole dataset into clusters to minimize the error between data points belonging to the clusters and its center. Reduced feature set is used to match between two biometric sets. Matching scores are generated using wolf-lamb user-dependent feature weighting scheme introduced by Doddington. The technique is tested to exhibit its robust performance.


Face Identification by SIFT-based Complete Graph Topology

arXiv.org Artificial Intelligence

This paper presents a new face identification system based on Graph Matching Technique on SIFT features extracted from face images. Although SIFT features have been successfully used for general object detection and recognition, only recently they were applied to face recognition. This paper further investigates the performance of identification techniques based on Graph matching topology drawn on SIFT features which are invariant to rotation, scaling and translation. Face projections on images, represented by a graph, can be matched onto new images by maximizing a similarity function taking into account spatial distortions and the similarities of the local features. Two graph based matching techniques have been investigated to deal with false pair assignment and reducing the number of features to find the optimal feature set between database and query face SIFT features. The experimental results, performed on the BANCA database, demonstrate the effectiveness of the proposed system for automatic face identification.


Detecting Motifs in System Call Sequences

arXiv.org Artificial Intelligence

The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A higher level call system language for measuring similarity between patterns of such calls is also suggested.


Face Recognition by Fusion of Local and Global Matching Scores using DS Theory: An Evaluation with Uni-classifier and Multi-classifier Paradigm

arXiv.org Artificial Intelligence

Faces are highly deformable objects which may easily change their appearance over time. Not all face areas are subject to the same variability. Therefore decoupling the information from independent areas of the face is of paramount importance to improve the robustness of any face recognition technique. This paper presents a robust face recognition technique based on the extraction and matching of SIFT features related to independent face areas. Both a global and local (as recognition from parts) matching strategy is proposed. The local strategy is based on matching individual salient facial SIFT features as connected to facial landmarks such as the eyes and the mouth. As for the global matching strategy, all SIFT features are combined together to form a single feature. In order to reduce the identification errors, the Dempster-Shafer decision theory is applied to fuse the two matching techniques. The proposed algorithms are evaluated with the ORL and the IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition techniques also in the case of partially occluded faces or with missing information.


Confidence Sets Based on Penalized Maximum Likelihood Estimators in Gaussian Regression

arXiv.org Machine Learning

Confidence intervals based on penalized maximum likelihood estimators such as the LASSO, adaptive LASSO, and hard-thresholding are analyzed. In the known-variance case, the finite-sample coverage properties of such intervals are determined and it is shown that symmetric intervals are the shortest. The length of the shortest intervals based on the hard-thresholding estimator is larger than the length of the shortest interval based on the adaptive LASSO, which is larger than the length of the shortest interval based on the LASSO, which in turn is larger than the standard interval based on the maximum likelihood estimator. In the case where the penalized estimators are tuned to possess the `sparsity property', the intervals based on these estimators are larger than the standard interval by an order of magnitude. Furthermore, a simple asymptotic confidence interval construction in the `sparse' case, that also applies to the smoothly clipped absolute deviation estimator, is discussed. The results for the known-variance case are shown to carry over to the unknown-variance case in an appropriate asymptotic sense.


Random Indexing K-tree

arXiv.org Artificial Intelligence

The purpose of this paper is to present and analyse the combination of Random Indexing (RI) with the K-tree algorithm. Both RI and K-tree adapt to changing data and decrease the cost of computationally intensive vector based applications. This combination is particularly suitable to the representation and clustering of very large document collections. Documents are typically represented in vector space as very sparse high dimensional vectors. RI can reduce the dimensionality and sparsity of this representation. In turn, the condensed representation is highly effective when working with K-tree. The paper is focused on determining the effectiveness of using RI with K-tree through experiments and comparative analysis of results. Sections 2 to 6 discuss K-tree, Random Indexing, Document Representation, Experimental Setup and Experimental results respectively. The paper ends with a conclusion in Section 7.


Dendritic Cells for SYN Scan Detection

arXiv.org Artificial Intelligence

Artificial immune systems have previously been applied to the problem of intrusion detection. The aim of this research is to develop an intrusion detection system based on the function of Dendritic Cells (DCs). DCs are antigen presenting cells and key to activation of the human immune system, behaviour which has been abstracted to form the Dendritic Cell Algorithm (DCA). In algorithmic terms, individual DCs perform multi-sensor data fusion, asynchronously correlating the the fused data signals with a secondary data stream. Aggregate output of a population of cells, is analysed and forms the basis of an anomaly detection system. In this paper the DCA is applied to the detection of outgoing port scans using TCP SYN packets. Results show that detection can be achieved with the DCA, yet some false positives can be encountered when simultaneously scanning and using other network services. Suggestions are made for using adaptive signals to alleviate this uncovered problem.


Classifying the typefaces of the Gutenberg 42-line bible

arXiv.org Machine Learning

We have measured the dissimilarities among several printed characters of a single page in the Gutenberg 42-line bible and we prove statistically the existence of several different matrices from which the metal types where constructed. This is in contrast with the prevailing theory, which states that only one matrix per character was used in the printing process of Gutenberg's greatest work. The main mathematical tool for this purpose is cluster analysis, combined with a statistical test for outliers. We carry out the research with two letters, i and a. In the first case, an exact clustering method is employed; in the second, with more specimens to be classified, we resort to an approximate agglomerative clustering method. The results show that the letters form clusters according to their shape, with significant shape differences among clusters, and allow to conclude, with a very small probability of error, that indeed the metal types used to print them were cast from several different matrices. Mathematics Subject Classification: 62H30


Hilbert space embeddings and metrics on probability measures

arXiv.org Machine Learning

A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as $\gamma_k$, indexed by the kernel function $k$ that defines the inner product in the RKHS. We present three theoretical properties of $\gamma_k$. First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted {\em characteristic kernels}. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g. on compact domains), and are difficult to check, our conditions are straightforward and intuitive: bounded continuous strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on $\bb{R}^d$, then it is characteristic if and only if the support of its Fourier transform is the entire $\bb{R}^d$. Second, we show that there exist distinct distributions that are arbitrarily close in $\gamma_k$. Third, to understand the nature of the topology induced by $\gamma_k$, we relate $\gamma_k$ to other popular metrics on probability measures, and present conditions on the kernel $k$ under which $\gamma_k$ metrizes the weak topology.