IBM is teaming up with eight North American universities to further tune its cognitive system to tackle cybersecurity problems. Watson for Cyber Security, a platform already in pre-beta, will be further trained in "learning the nuances of security research findings and discovering patterns and evidence of hidden cyber attacks and threats that could otherwise be missed". IBM will work with the eight universities for a year, starting in the autumn, to push the project forward. The universities selected are California State Polytechnic University, Pomona; Pennsylvania State University; Massachusetts Institute of Technology; New York University; the University of Maryland, Baltimore County (UMBC); the University of New Brunswick; the University of Ottawa; and the University of Waterloo. The project is ultimately designed to bridge the cybersecurity skills gap, a perennial issue in the industry.
IBM has launched the Watson for Cyber Security beta program to encourage companies to include Watson in their current security environments. Starting off with such organizations as California Polytechnic State University, Sumitomo Mitsui Banking Corporation, and University of Rochester Medical Center, the program will grow over the next few weeks to encompass 40 companies spanning industries like banking, travel, energy, automotive, health care, insurance, and education. For the past few months, IBM Security has been working with eight universities -- California State Polytechnic University at Pomona, Penn State, MIT, New York University, University of Maryland at Baltimore County, and Canada's universities of New Brunswick, Ottawa, and Waterloo -- to help teach Watson the "language of cybersecurity." The research project involved feeding Watson's AI brain thousands of documents annotated to help the system understand what a threat is, what it does, and what indicators are related. Watson for Cyber Security combines machine learning and natural language processing to make associations in unstructured data like blogs, research reports, and documentation that security analysts can then use to make better, faster decisions.
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information are used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes via dictionaries based on properties from an appropriate background knowledge base to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase and Arnetminer, using DBpedia as the background knowledge base.
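One way to picture how predicted entity types can cut the cost of coreference resolution is type-based blocking: instances are compared only when they share a predicted type. The sketch below is illustrative and is not the authors' implementation. The `TYPE_HINTS` dictionary, the `predict_type` voting rule, and the sample instances are all assumptions standing in for a trained classifier and real graph data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical attribute-to-type dictionary; in the described approach this
# role would be played by a supervised model trained on a background
# knowledge base, not by a hand-written table.
TYPE_HINTS = {
    "birthDate": "Person",
    "affiliation": "Person",
    "population": "Place",
    "country": "Place",
}

def predict_type(attrs):
    """Guess an entity type by counting votes from known attribute names."""
    votes = defaultdict(int)
    for name in attrs:
        if name in TYPE_HINTS:
            votes[TYPE_HINTS[name]] += 1
    return max(votes, key=votes.get) if votes else "Unknown"

def candidate_pairs(instances, type_predictor):
    """Group instances by predicted type and pair only within each group."""
    by_type = defaultdict(list)
    for inst_id, attrs in instances.items():
        by_type[type_predictor(attrs)].append(inst_id)
    pairs = []
    for ids in by_type.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs

# Made-up instances for illustration only.
instances = {
    "a1": {"birthDate": "1970", "affiliation": "UMBC"},
    "a2": {"affiliation": "UMBC"},
    "b1": {"population": "600000", "country": "US"},
    "b2": {"country": "US"},
}

pairs = candidate_pairs(instances, predict_type)
all_pairs = list(combinations(sorted(instances), 2))
```

With these four instances, exhaustive comparison would consider every pair, while blocking by predicted type keeps only the pairs whose members share a type. The saving grows with the number of instances and distinct types.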
Lee, Sungjin (Pohang University of Science and Technology (POSTECH)) | Noh, Hyungjong (Pohang University of Science and Technology (POSTECH)) | Lee, Kyusong (Pohang University of Science and Technology (POSTECH)) | Lee, Gary Geunbae (Pohang University of Science and Technology (POSTECH))
The demand for computer-assisted language learning systems that can provide corrective feedback on language learners’ speaking has increased. However, detecting grammatical errors in oral conversations is not a trivial task because of the unavoidable errors of automatic speech recognition systems. To provide corrective feedback, a novel method to detect grammatical errors in speaking performance is proposed. The proposed method consists of two sub-models: the grammaticality-checking model and the error-type classification model. We automatically generate grammatical errors that learners are likely to commit and construct error patterns based on the articulated errors. When a particular speech pattern is recognized, the grammaticality-checking model performs a binary classification based on the similarity between the error patterns and the recognition result using the confidence score. The error-type classification model chooses the error type based on the most similar error pattern and the error frequency extracted from a learner corpus. The grammaticality-checking model outperformed the two comparative models by 56.36% and 42.61% in F-score while keeping the false positive rate very low. The error-type classification model exhibited very high performance with a 99.6% accuracy rate. Because high precision and a low false positive rate are important criteria for the language-tutoring setting, the proposed method will be helpful for intelligent computer-assisted language learning systems.
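The two-stage idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' models: token-level similarity from Python's `difflib.SequenceMatcher` stands in for their similarity measure, the way the confidence score is combined with similarity (a simple product compared against a threshold) is a guess, and the sentences, error patterns, and frequencies are invented.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Token-level similarity in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def check_utterance(recognized, confidence, correct, error_patterns, threshold=0.8):
    """Return (is_error, error_type) for a recognized utterance.

    error_patterns: list of (error_type, pattern, frequency) tuples, where
    frequency is a hypothetical count from a learner corpus.
    """
    # Pick the most similar error pattern; break ties by corpus frequency.
    error_type, pattern, _freq = max(
        error_patterns,
        key=lambda p: (similarity(recognized, p[1]), p[2]),
    )
    error_score = similarity(recognized, pattern)
    correct_score = similarity(recognized, correct)
    # Flag an error only when an error pattern fits better than the correct
    # sentence and the recognizer was confident enough (assumed rule).
    is_error = error_score > correct_score and error_score * confidence >= threshold
    return (True, error_type) if is_error else (False, None)

# Invented example data.
correct = "i am going to the library"
error_patterns = [
    ("subject-verb agreement", "i is going to the library", 120),
    ("missing preposition", "i am going the library", 45),
]

result_error = check_utterance("i is going to the library", 0.9, correct, error_patterns)
result_ok = check_utterance("i am going to the library", 0.95, correct, error_patterns)
```

Requiring both a better fit to an error pattern and a sufficiently high confidence-weighted score favors precision: a low-confidence recognition result is not reported as a learner error, which matches the abstract's emphasis on a low false positive rate.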
Children are facile at both discovering word boundaries and using those words to build higher-level structures in tandem. Current research treats lexical acquisition and grammar induction as two distinct tasks. Doing so has led to unreasonable assumptions. Existing work in grammar induction presupposes a perfectly segmented, noise-free lexicon, while lexical learning approaches largely ignore how the lexicon is used. This paper combines both tasks in a novel framework for bootstrapping lexical acquisition and grammar induction.