Performance Analysis
Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection
Farid, Dewan Md., Harbi, Nouria, Rahman, Mohammad Zahidur
In this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented. It performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and it eliminates redundant attributes as well as contradictory examples from the training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining, such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data in recent decades. However, various issues in current intrusion detection systems (IDS) remain to be examined. We compared the performance of our proposed algorithm with existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.
Widespread Worry and the Stock Market
Gilbert, Eric (University of Illinois at Urbana-Champaign) | Karahalios, Karrie (University of Illinois at Urbana-Champaign)
Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
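The Granger-causal idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give their lag structure, control variables, or anxiety index, so the single lag, the synthetic series, and the names `solve`, `rss`, and `granger_f` below are all assumptions made for illustration. The test asks whether past values of x improve a prediction of y beyond what past values of y already provide.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (t - sum(c * v for c, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag (an assumption)."""
    targets = y[1:]
    # Restricted model: y_t explained by a constant and its own past value.
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    # Unrestricted model: additionally uses the past value of x.
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = rss(restricted, targets)
    rss_u = rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic data (hypothetical): rises in x push y down one step later.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.3 * y[t - 1] - 0.8 * x[t - 1] + rng.gauss(0.0, 0.5))

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f"x -> y: F = {f_xy:.1f}; y -> x: F = {f_yx:.2f}")
```

On this synthetic data the F statistic should be large in the x-to-y direction and small in the reverse direction, which is the asymmetry a Granger test looks for. A real analysis would choose the number of lags deliberately and compare the statistic against the F distribution to obtain a p-value.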
Study of Static Classification of Social Spam Profiles in MySpace
Irani, Danesh (Georgia Institute of Technology) | Webb, Steve (Georgia Institute of Technology) | Pu, Calton (Georgia Institute of Technology)
Reaching hundreds of millions of users, major social networks have become important target media for spammers. Although practical techniques such as collaborative filters and behavioral analysis are able to reduce spam, they have an inherent lag (to collect sufficient data on the spammer) that also limits their effectiveness. Through an experimental study of over 1.9 million MySpace profiles, we make a case for analysis of static user profile content, possibly as soon as such profiles are created. We compare several machine learning algorithms in their ability to distinguish spam profiles from legitimate profiles. We found that a C4.5 decision tree algorithm achieves the highest accuracy (99.4%) of finding rogue profiles, while naïve Bayes achieves a lower accuracy (92.6%). We also conducted a sensitivity analysis of the algorithms w.r.t. features which may be easily removed by spammers.
A Unifying View of Multiple Kernel Learning
Kloft, Marius, Rückert, Ulrich, Bartlett, Peter L.
Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments.
The Application of a Dendritic Cell Algorithm to a Robotic Classifier
Oates, Robert, Greensmith, Julie, Aickelin, Uwe, Garibaldi, Jonathan M., Kendall, Graham
The dendritic cell algorithm is an immune-inspired technique for processing time-dependent data. Here we propose it as a possible solution for a robotic classification problem. The dendritic cell algorithm is implemented on a real robot and an investigation is performed into the effects of varying the migration threshold median for the cell population. The algorithm performs well on a classification task with very little tuning. Ways of extending the implementation to allow it to be used as a classifier within the field of robotic security are suggested.
PCA 4 DCA: The Application Of Principal Component Analysis To The Dendritic Cell Algorithm
Gu, Feng, Greensmith, Julie, Oates, Robert, Aickelin, Uwe
As one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data; instead, domain or expert knowledge is required to predetermine the mapping between input signals from a particular instance to the three categories used by the DCA. This data preprocessing phase has received the criticism of having manually over-fitted the data to the algorithm, which is undesirable. Therefore, in this paper we have attempted to ascertain if it is possible to use principal component analysis (PCA) techniques to automatically categorise input data while still generating useful and accurate classification results. The integrated system is tested with a biometrics dataset for the stress recognition of automobile drivers. The experimental results have shown the application of PCA to the DCA for the purpose of automated data preprocessing is successful.
Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for Anomaly Detection
Greensmith, Julie, Aickelin, Uwe, Cayzer, Steve
Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating T-cell based immune responses, both reactive and for generating tolerance. We have derived an algorithm based on the functionality of these cells, and have used the signals and differentiation pathways to build a control mechanism for an artificial immune system. We present our algorithmic details in addition to some preliminary results, where the algorithm was applied for the purpose of anomaly detection. We hope that this algorithm will eventually become the key component within a large, distributed immune system, based on sound immunological concepts.
Experimenting with Innate Immunity
Twycross, Jamie, Aickelin, Uwe
In a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step - the libtissue system.
Algebraic Comparison of Partial Lists in Bioinformatics
Jurman, Giuseppe, Riccadonna, Samantha, Visintainer, Roberto, Furlanello, Cesare
The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
Case for Automated Detection of Diabetic Retinopathy
Silberman, Nathan (New York University) | Ahrlich, Kristy (New York University) | Fergus, Rob (New York University) | Subramanian, Lakshminarayanan
Diabetic retinopathy, an eye disorder caused by diabetes, is the primary cause of blindness in America and over 99% of cases in India. India and China currently account for over 90 million diabetic patients and are on the verge of an explosion of diabetic populations. This may result in an unprecedented number of persons becoming blind unless diabetic retinopathy can be detected early. Aravind Eye Hospitals is the largest eye care facility in the world, handling over 2 million patients per year. The hospital is on a massive drive throughout southern India to detect diabetic retinopathy at an early stage. To that end, a group of 10-15 physicians are responsible for manually diagnosing over 2 million retinal images per year to detect diabetic retinopathy. While the task is extremely laborious, a large fraction of cases turn out to be normal indicating that much of this time is spent diagnosing completely normal cases. This paper describes our early experiences working with Aravind Eye Hospitals to develop an automated system to detect diabetic retinopathy from retinal images. The automated diabetic retinopathy problem is a hard computer vision problem whose goal is to detect features of retinopathy, such as hemorrhages and exudates, in retinal color fundus images. We describe our initial efforts towards building such a system using a range of computer vision techniques and discuss the potential impact on early detection of diabetic retinopathy.