IBM is teaming up with eight North American universities to train its cognitive system to tackle cybersecurity problems. Watson for Cyber Security, a platform already in pre-beta, will be further trained in "learning the nuances of security research findings and discovering patterns and evidence of hidden cyber attacks and threats that could otherwise be missed". IBM will work with the eight universities for a year, beginning in the autumn, to push the project forward. The universities selected are California State Polytechnic University, Pomona; Pennsylvania State University; Massachusetts Institute of Technology; New York University; the University of Maryland, Baltimore County (UMBC); the University of New Brunswick; the University of Ottawa; and the University of Waterloo. The project is ultimately designed to bridge the cybersecurity skills gap, a perennial issue in the industry.
IBM has launched the Watson for Cyber Security beta program to encourage companies to include Watson in their current security environments. Starting off with such organizations as California Polytechnic State University, Sumitomo Mitsui Banking Corporation, and University of Rochester Medical Center, the program will grow over the next few weeks to encompass 40 companies spanning industries like banking, travel, energy, automotive, health care, insurance, and education. For the past few months, IBM Security has been working with eight universities -- California State Polytechnic University at Pomona, Penn State, MIT, New York University, University of Maryland at Baltimore County, and Canada's universities of New Brunswick, Ottawa, and Waterloo -- to help teach Watson the "language of cybersecurity." The research project involved feeding Watson's AI brain thousands of documents annotated to help the system understand what a threat is, what it does, and what indicators are related. Watson for Cyber Security combines machine learning and natural language processing to make associations in unstructured data like blogs, research reports, and documentation that security analysts can then use to make better, faster decisions.
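The kind of association described above, pulling structured threat indicators out of unstructured prose, can be illustrated in miniature. The sketch below is purely illustrative and is not IBM's implementation: it uses simple regular expressions (where Watson applies machine learning and natural language processing at scale), and the patterns, function name, and sample report text are all invented for the example.

```python
import re

# Toy indicator-of-compromise extractor: maps unstructured security text
# to structured indicators. Patterns are deliberately simplistic.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b[a-z0-9-]+\.(?:com|net|org|io)\b"),
}

def extract_indicators(text):
    """Return a dict of indicator type -> matches found in the text."""
    return {name: pat.findall(text)
            for name, pat in IOC_PATTERNS.items() if pat.findall(text)}

report = ("The dropper beacons to 203.0.113.7 and evil-updates.com; "
          "its payload hash is d41d8cd98f00b204e9800998ecf8427e.")
print(extract_indicators(report))
```

A real system would go far beyond pattern matching, linking the extracted indicators to threat descriptions learned from annotated documents; the point here is only the unstructured-to-structured step that makes those associations possible.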
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information are used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible, or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes, via dictionaries based on properties from an appropriate background knowledge base, to predicted instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase, and Arnetminer, using DBpedia as the background knowledge base.
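The core idea of the abstract above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the attribute-to-type dictionary here is hand-invented (the paper derives such mappings from a background knowledge base like DBpedia), and the type prediction is a simple majority vote rather than a trained classifier.

```python
# Invented attribute -> type dictionary, standing in for one derived
# from a background knowledge base such as DBpedia.
ATTRIBUTE_TYPE_DICT = {
    "birthDate": "Person", "almaMater": "Person", "author": "Person",
    "foundingYear": "Organization", "headquarters": "Organization",
    "population": "Place", "elevation": "Place",
}

def predict_type(entity_attributes):
    """Majority vote over the types suggested by each known attribute."""
    votes = {}
    for attr in entity_attributes:
        t = ATTRIBUTE_TYPE_DICT.get(attr)
        if t:
            votes[t] = votes.get(t, 0) + 1
    return max(votes, key=votes.get) if votes else "Unknown"

def candidate_pairs(entities):
    """Keep only type-compatible pairs as coreference candidates,
    cutting the quadratic candidate space before expensive matching."""
    typed = {name: predict_type(attrs) for name, attrs in entities.items()}
    names = sorted(entities)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if typed[a] == typed[b] and typed[a] != "Unknown"]

entities = {
    "e1": ["birthDate", "almaMater"],   # looks like a Person
    "e2": ["author", "birthDate"],      # also a Person -> candidate pair
    "e3": ["population", "elevation"],  # a Place -> filtered out
}
print(candidate_pairs(entities))  # [('e1', 'e2')]
```

Filtering by predicted type this way is what reduces the computational cost: only type-compatible instances proceed to the more expensive coreference comparison.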
Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars, and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e.
In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize, and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply and evaluate this method in the climate science domain. The resulting topics support faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
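The grounding idea can be illustrated with a toy sketch. This is not the paper's method: a real ontology-grounded topic model biases the topic inference itself, whereas the fragment below merely scores documents against glossary-seeded anchor sets, and the glossary entries and sample document are invented for the example.

```python
from collections import Counter

# Each glossary entry seeds a "topic" with its associated domain phrases;
# a document is assigned to the topic whose seed phrases it mentions most.
GLOSSARY = {
    "sea level": ["sea level rise", "ocean", "coastal"],
    "carbon cycle": ["carbon", "emissions", "co2"],
}

def assign_topic(doc):
    """Score each seeded topic by counting its seed phrases in the document."""
    text = doc.lower()
    scores = Counter({topic: sum(text.count(p) for p in phrases)
                      for topic, phrases in GLOSSARY.items()})
    topic, score = scores.most_common(1)[0]
    return topic if score > 0 else None

doc = "Coastal flooding worsens as ocean warming drives sea level rise."
print(assign_topic(doc))  # "sea level"
```

Because the seed phrases come from a curated domain glossary rather than from unsupervised co-occurrence alone, the resulting topics carry human-readable labels, which is the readability benefit the abstract describes.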