Data Mining
A Linear Programming Approach to Novelty Detection
Campbell, Colin, Bennett, Kristin P.
Novelty detection involves modeling the normal behaviour of a system, hence enabling detection of any divergence from normality. It has potential applications in many areas, such as detection of machine damage or highlighting abnormal features in medical data. One approach is to build a hypothesis estimating the support of the normal data, i.e. constructing a function which is positive in the region where the data is located and negative elsewhere. Recently, kernel methods have been proposed for estimating the support of a distribution and they have performed well in practice; training involves solution of a quadratic programming problem. In this paper we propose a simpler kernel method for estimating the support based on linear programming. The method is easy to implement and can learn large datasets rapidly. We demonstrate the method on medical and fault detection datasets.
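The idea of a function that is positive where the normal data lies and negative elsewhere can be illustrated with a minimal Python sketch. This is not the paper's linear program: it assumes a Gaussian kernel, fixes the kernel weights to uniform values instead of optimising them, and sets the offset so that the lowest-scoring training point lies on the boundary. The names `rbf`, `fit_support`, and `decision`, and the sample data, are hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def fit_support(train, gamma=1.0):
    """Return (alpha, b) for f(x) = sum_j alpha_j * K(x, x_j) + b.

    Assumption: alpha is fixed to uniform weights rather than being
    optimised by a linear program; b is chosen so that the lowest-scoring
    training point sits exactly on the boundary f(x) = 0.
    """
    n = len(train)
    alpha = [1.0 / n] * n
    scores = [sum(a * rbf(x, xj, gamma) for a, xj in zip(alpha, train))
              for x in train]
    return alpha, -min(scores)

def decision(x, train, alpha, b, gamma=1.0):
    """Evaluate f(x); positive suggests 'normal', negative suggests 'novel'."""
    return sum(a * rbf(x, xj, gamma) for a, xj in zip(alpha, train)) + b

# Hypothetical 'normal' data clustered around the origin.
normal = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.3), (0.1, 0.1), (-0.2, -0.2)]
alpha, b = fit_support(normal)
inside = decision((0.05, 0.0), normal, alpha, b)   # close to the cluster
outside = decision((3.0, 3.0), normal, alpha, b)   # far from the cluster
```

With these choices every training point scores at least zero, and a point far from the data scores close to `b`, which is negative. A trained method would instead choose the weights and offset by optimisation.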
Building Intelligent Learning Database Systems
Induction and deduction are two opposite operations in data-mining applications. Induction extracts knowledge in the form of, say, rules or decision trees from existing data, and deduction applies induction results to interpret new data. An intelligent learning database (ILDB) system integrates machine-learning techniques with database and knowledge base technology. It starts with existing database technology and performs both induction and deduction. The integration of database technology, induction (from machine learning), and deduction (from knowledge-based systems) plays a key role in the construction of ILDB systems, as does the design of efficient induction and deduction algorithms. This article presents a system structure for ILDB systems and discusses practical issues for ILDB applications, such as instance selection and structured induction.
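The split between induction and deduction can be sketched in a few lines of Python. This is only an illustration, not the article's system: it assumes a very simple rule learner that maps each value of one attribute to its majority class, and the function names, attribute names, and records are hypothetical.

```python
from collections import Counter, defaultdict

def induce_rules(records, attribute, target):
    """Induction: learn one 'if attribute == value then class' rule per value."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[attribute]][row[target]] += 1
    # Keep the majority class observed for each attribute value.
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def deduce(rules, attribute, row, default=None):
    """Deduction: apply the induced rules to interpret a new record."""
    return rules.get(row[attribute], default)

# Hypothetical records already stored in a database table.
stored = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
rules = induce_rules(stored, "outlook", "play")
prediction = deduce(rules, "outlook", {"outlook": "sunny"})
```

Here `rules` plays the role of the knowledge base produced by induction, and `deduce` interprets a record that was not part of the stored data.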
Familiarity Discrimination of Radar Pulses
Granger, Eric, Grossberg, Stephen, Rubin, Mark A., Streilein, William W.
H3C 3A7, Canada; Department of Cognitive and Neural Systems, Boston University, Boston, MA 02215, USA. Abstract: The ARTMAP-FD neural network performs both identification (placing test patterns in classes encountered during training) and familiarity discrimination (judging whether a test pattern belongs to any of the classes encountered during training). The performance of ARTMAP-FD is tested on radar pulse data obtained in the field, and compared to that of the nearest-neighbor-based NEN algorithm and to a k > 1 extension of NEN. 1 Introduction: The recognition process involves both identification and familiarity discrimination. Consider, for example, a neural network designed to identify aircraft based on their radar reflections and trained on sample reflections from ten types of aircraft, A ... J. After training, the network should correctly classify radar reflections belonging to the familiar classes A ... J, but it should also abstain from making a meaningless guess when presented with a radar reflection from an object belonging to a different, unfamiliar class. Familiarity discrimination is also referred to as "novelty detection," a "reject option," and "recognition in partially exposed environments."
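Combining identification with a reject option can be sketched with a plain nearest-neighbor rule in Python. This is neither ARTMAP-FD nor the NEN algorithm as published; it assumes Euclidean distance and a fixed, hypothetical distance threshold beyond which a pattern is treated as unfamiliar. The function name and sample data are also hypothetical.

```python
import math

def classify_or_reject(x, training, threshold):
    """Return the label of the nearest training pattern, or None if unfamiliar.

    training:  list of (feature_vector, label) pairs.
    threshold: assumed maximum distance at which a pattern still counts
               as familiar (a free parameter in this sketch).
    """
    nearest_vector, nearest_label = min(
        training, key=lambda item: math.dist(x, item[0])
    )
    if math.dist(x, nearest_vector) > threshold:
        return None  # familiarity discrimination: abstain from guessing
    return nearest_label  # identification: report the familiar class

# Hypothetical two-class training set.
training = [
    ((0.0, 0.0), "A"),
    ((0.1, 0.2), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
]
familiar_a = classify_or_reject((0.1, 0.1), training, threshold=1.0)
familiar_b = classify_or_reject((4.9, 5.1), training, threshold=1.0)
unfamiliar = classify_or_reject((10.0, -10.0), training, threshold=1.0)
```

The threshold trades off the two kinds of error: a small value rejects more familiar patterns, while a large value accepts more unfamiliar ones.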
The NASD Regulation Advanced-Detection System (ADS)
Kirkland, J. Dale, Senator, Ted E., Hayden, James J., Dybala, Tomasz, Goldberg, Henry G., Shyr, Ping
The National Association of Securities Dealers, Inc., regulation advanced-detection system (ADS) monitors trades and quotations in The Nasdaq Stock Market to identify patterns and practices of behavior of potential regulatory interest. ADS has been in operational use at NASD Regulation since the summer of 1997 by several groups of analysts, processing approximately 2 million transactions a day, generating over 10,000 breaks. More important, it has greatly expanded surveillance coverage to new areas of the market and to many new types of behavior of regulatory concern. ADS combines detection and discovery components in a single system that supports multiple regulatory domains and shares the same market data. ADS makes use of a variety of AI techniques, including visualization, pattern recognition, and data mining, in support of the activities of regulatory analysis, alert and pattern detection, and knowledge discovery.
The Distributed Data-Mining Workshop
Kargupta, Hillol, Chan, Philip
Victor Lesser (University of Massachusetts at Amherst) gave an invited talk on distributed interpretation and its of Hong Kong Polytechnic University, possible implication in DDM. Mining, brought interested researchers (Brigham Young University) and Salvatore The paper sessions ended with two and practitioners together and created Stolfo (Columbia University) working paper presentations by Billy an environment for crystallizing the presented the effects of class distribution Wallace and Juan Botia, Marcedes Garijo, fast-growing field of DDM. The concluding session was the panel Lawrence Hall, Nitesh Chawla, and 40 participants attended the workshop. Stolfo, George Cybenko Kevin W. Bowyer (all of University of The workshop had 13 presentations, Stolfo stressed suggested different techniques for Cybenko of Dartmouth University. Organizers sincerely hope that the session.
Hybrid NN/HMM-Based Speech Recognition with a Discriminant Neural Feature Extraction
Willett, Daniel, Rigoll, Gerhard
In this paper, we present a novel hybrid architecture for continuous speech recognition systems. It consists of a continuous HMM system extended by an arbitrary neural network that is used as a preprocessor that takes several frames of the feature vector as input to produce more discriminative feature vectors with respect to the underlying HMM system. This hybrid system is an extension of a state-of-the-art continuous HMM system, and in fact, it is the first hybrid system that really is capable of outperforming these standard systems with respect to the recognition accuracy. Experimental results show a relative error reduction of about 10% that we achieved on a remarkably good recognition system based on continuous HMMs for the Resource Management 1000-word continuous speech recognition task.
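The preprocessing step, in which several neighbouring frames are fed to a network that emits a new feature vector per frame, can be sketched as follows in Python. This is an illustration only: it assumes a context window of one frame on each side, edge padding by repetition, and a single tanh layer with untrained, hypothetical weights. The paper's actual network, window size, and training criterion are not reproduced here.

```python
import math

def stack_context(frames, left=1, right=1):
    """Concatenate each frame with its neighbours, repeating edge frames as padding."""
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp at sequence edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked

def transform(stacked, weights, biases):
    """Apply one tanh layer: each output unit is tanh(w . window + bias)."""
    return [
        [math.tanh(sum(w * v for w, v in zip(row, window)) + b)
         for row, b in zip(weights, biases)]
        for window in stacked
    ]

# Hypothetical 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stacked = stack_context(frames)          # each window has 3 * 2 = 6 values
weights = [[0.1] * 6, [-0.1] * 6]        # hypothetical, untrained weights
biases = [0.0, 0.0]
features = transform(stacked, weights, biases)  # one new vector per frame
```

In a hybrid system of this kind, the transformed vectors in `features` would replace the original frames as input to the HMM.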