Stahl, Ken (State University of New York - University at Albany) | Shaikh, Samira (State University of New York - University at Albany) | Strzalkowski, Tomek (State University of New York - University at Albany)
Semantic and syntactic features found in text can be used in combination to statistically predict linguistic devices such as hedges in online chat. Some features are better indicators than others, and in some cases multiple features must be considered together to be useful. Once the features are identified, finding the best division of the data becomes an optimization problem. We have devised a genetic algorithm approach to detecting hedges in online multi-party chat discourse. The system assigns rewards and penalties for matching features in tokenized text, so optimizing the reward and penalty amounts is the main challenge. Genetic algorithms, a subset of Evolutionary Algorithms, are well suited to optimization: they are massively parallel directed searches and are therefore suited to finding the best ratio of integer rewards and penalties. “Evolutionary algorithms (EAs) utilize principles of natural selection and are robust adaptive search schemes suitable for searching nonlinear, discontinuous, and high-dimensional spaces. This class of algorithms is being increasingly applied to obtain optimal or near-optimal solutions to many complex real-world optimization problems” (Bonissone et al. 2006). We show results using 10-fold cross validation, as commonly used in traditional machine learning. The best performance without further fine-tuning is 79% in classifying whether an utterance in chat contains a hedge or not.
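The reward-and-penalty search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the toy utterances, the threshold of zero, and the selection, crossover, and mutation settings are hypothetical and are not taken from the authors' system.

```python
import random

# Hypothetical features and toy labelled utterances (feature set, contains_hedge).
FEATURES = ["modal", "first_person", "approximator", "imperative"]
DATA = [
    ({"modal", "first_person"}, True),
    ({"modal"}, True),
    ({"approximator"}, True),
    ({"approximator", "first_person"}, True),
    ({"first_person"}, False),
    ({"imperative"}, False),
    ({"imperative", "first_person"}, False),
    ({"modal", "imperative"}, False),
]

def score(weights, feats):
    """Sum the integer reward (positive) or penalty (negative) of each matched feature."""
    return sum(w for f, w in zip(FEATURES, weights) if f in feats)

def accuracy(weights):
    """Fraction of utterances classified correctly; a positive score means 'hedge'."""
    correct = sum((score(weights, feats) > 0) == label for feats, label in DATA)
    return correct / len(DATA)

def evolve(pop_size=30, generations=40, low=-5, high=5, mutation_rate=0.2, seed=0):
    """Evolve integer weight vectors with truncation selection, one-point crossover, and mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(low, high) for _ in FEATURES] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=accuracy, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(FEATURES))  # one-point crossover position
            child = a[:cut] + b[cut:]
            # Replace each gene with a fresh random integer with probability mutation_rate.
            child = [rng.randint(low, high) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=accuracy)

best = evolve()
print(best, accuracy(best))
```

In a real evaluation, fitness would be measured on held-out folds rather than on the data used for the search, in line with the 10-fold cross validation the abstract mentions.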
Developing methods that can identify important patterns in complex large-scale temporal datasets is one of the key challenges in machine learning and data mining research. Our work focuses on the development of methods that can, based on past data, identify unusual patient-management actions in the Electronic Medical Record (EMR) of the current patient and raise alerts if such actions are encountered. We developed and evaluated a conditional-outlier detection approach for identifying clinical actions such as omissions of medication orders or laboratory orders in the intensive care unit (ICU) that are unusual with respect to past patient care. We used data from 24,658 ICU patient admissions to first learn the outlier models and then to generate 240 medication and laboratory omission alerts. The alerts were evaluated by a group of 18 intensive care physicians. The results show the true positive alert rate for all study alerts ranged from 0.42 to 0.53, which is promising and compares favorably to the positive alert rates of existing clinical alerting systems.
Distributed Denial of Service (DDoS) attacks are a major threat to Internet security, and their use continues to grow. Attackers are finding more sophisticated methods to attack servers. Many defense mechanisms have been proposed for DDoS attacks at the IP and TCP layers. These methods do not work well against application layer DDoS attacks, which use legitimate application layer requests to overwhelm a web server. Such attacks look legitimate in both packet and protocol characteristics, which makes them harder to detect. In this paper, we propose an anomaly detection method to detect application layer DDoS attacks. We take a text mining approach, using bigrams to extract features that represent a user’s HTTP request sequence. We apply the one-class Support Vector Machine (SVM) algorithm to the features extracted from normal users’ HTTP request sequences. The one-class SVM labels any newly seen instance that deviates from the trained model of normal behavior as an application layer DDoS instance. We conduct our experimental analysis on real web server logs collected from a student resource website. Three different variants of HTTP GET flood attacks, generated via penetration testing, are implemented on our server. Our results show that the proposed method detects application layer DDoS attacks with very good performance.
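The bigram feature extraction step can be sketched as follows. This is an illustration under stated assumptions only: the request paths, the two sessions, and the normalisation are hypothetical, and the one-class SVM training step is not shown.

```python
from collections import Counter

def request_bigrams(requests):
    """Count consecutive request pairs (bigrams) in one user's request sequence."""
    return Counter(zip(requests, requests[1:]))

def to_vector(bigram_counts, vocabulary):
    """Order bigram counts by a fixed vocabulary, divided by the session's total bigram count."""
    total = sum(bigram_counts.values()) or 1  # avoid division by zero for short sessions
    return [bigram_counts.get(bg, 0) / total for bg in vocabulary]

# Hypothetical sessions: one browsing pattern and one repeated GET on a single page.
normal_session = ["/index", "/courses", "/courses/cs101", "/index", "/courses"]
flood_session = ["/index"] * 6

vocabulary = sorted(set(request_bigrams(normal_session)) | set(request_bigrams(flood_session)))
normal_vec = to_vector(request_bigrams(normal_session), vocabulary)
flood_vec = to_vector(request_bigrams(flood_session), vocabulary)

print(vocabulary)
print(normal_vec)
print(flood_vec)
```

Vectors built this way from normal sessions would serve as training input for a one-class classifier, and vectors from new sessions would then be scored against the trained model.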
Every year, billions of dollars are lost due to fraud in the U.S. health care system. Health care claims are complex because they involve multiple parties, including service providers, insurance subscribers, and insurance carriers. Medicare is susceptible to fraud because of this complexity. To build a comprehensive fraud detection system, one must take into consideration all of the financial practices involved among the associated parties. This paper focuses on graph-based analysis of CMS-provided Medicare claims data to look for anomalies in the relationships and transactions among patients, service providers, claims, physicians, diagnoses, and procedures. In our experiments, we create graphs from the inpatient, outpatient, and carrier claims data of the beneficiary. We then demonstrate the potential effectiveness of applying graph-based anomaly detection to the problem of discovering anomalies and potential fraud scenarios.
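A rough sketch of how claim records might be turned into a graph is given below. The record fields and values are hypothetical and do not follow the CMS schema, and the rarity check is a simple frequency-based stand-in, not the graph-based anomaly detection method used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim": "c1", "beneficiary": "b1", "provider": "p1", "diagnosis": "d1", "procedure": "x1"},
    {"claim": "c2", "beneficiary": "b2", "provider": "p1", "diagnosis": "d1", "procedure": "x1"},
    {"claim": "c3", "beneficiary": "b3", "provider": "p2", "diagnosis": "d1", "procedure": "x1"},
    {"claim": "c4", "beneficiary": "b1", "provider": "p2", "diagnosis": "d1", "procedure": "x9"},
]

def build_graph(records):
    """Build undirected adjacency sets over typed nodes such as ('provider', 'p1')."""
    graph = defaultdict(set)
    for rec in records:
        claim_node = ("claim", rec["claim"])
        for kind in ("beneficiary", "provider", "diagnosis", "procedure"):
            node = (kind, rec[kind])
            graph[claim_node].add(node)
            graph[node].add(claim_node)
    return graph

def rare_pairs(records, max_count=1):
    """Return claims whose (diagnosis, procedure) pair occurs at most max_count times."""
    counts = Counter((r["diagnosis"], r["procedure"]) for r in records)
    return [r["claim"] for r in records
            if counts[(r["diagnosis"], r["procedure"])] <= max_count]

graph = build_graph(claims)
flagged = rare_pairs(claims)
print(len(graph), flagged)
```

A flagged claim is only a candidate for review: a rare diagnosis and procedure combination may be legitimate, so this kind of check indicates where to look and does not establish fraud.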