IBM is teaming up with eight North American universities to further tune its cognitive system to tackle cybersecurity problems. Watson for Cyber Security, a platform already in pre-beta, will be further trained in "learning the nuances of security research findings and discovering patterns and evidence of hidden cyber attacks and threats that could otherwise be missed". IBM will work with the eight universities for a year, starting in autumn, to push the project forward. The universities selected are California State Polytechnic University, Pomona; Pennsylvania State University; Massachusetts Institute of Technology; New York University; the University of Maryland, Baltimore County (UMBC); the University of New Brunswick; the University of Ottawa; and the University of Waterloo. The project is ultimately designed to help bridge the cybersecurity skills gap, a perennial issue in the industry.
IBM has launched the Watson for Cyber Security beta program to encourage companies to include Watson in their current security environments. Starting off with such organizations as California Polytechnic State University, Sumitomo Mitsui Banking Corporation, and University of Rochester Medical Center, the program will grow over the next few weeks to encompass 40 companies spanning industries like banking, travel, energy, automotive, health care, insurance, and education. For the past few months, IBM Security has been working with eight universities -- California State Polytechnic University at Pomona, Penn State, MIT, New York University, University of Maryland at Baltimore County, and Canada's universities of New Brunswick, Ottawa, and Waterloo -- to help teach Watson the "language of cybersecurity." The research project involved feeding Watson's AI brain thousands of documents annotated to help the system understand what a threat is, what it does, and what indicators are related. Watson for Cyber Security combines machine learning and natural language processing to make associations in unstructured data like blogs, research reports, and documentation that security analysts can then use to make better, faster decisions.
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information are used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible, or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes via dictionaries based on properties from an appropriate background knowledge base to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase, and Arnetminer, using DBpedia as the background knowledge base.
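The idea of mapping entity attributes to predicted types, and then using those types to prune coreference candidates, can be illustrated with a minimal sketch. This is not the paper's implementation: the attribute-to-type dictionary, entity names, and the majority-vote scoring below are all hypothetical stand-ins for what would be derived from a background knowledge base such as DBpedia.

```python
# Hypothetical attribute -> candidate-type dictionary, of the kind that
# might be built from a background knowledge base's property definitions.
ATTR_TYPES = {
    "birthDate":  {"Person"},
    "author":     {"Person", "Work"},
    "isbn":       {"Work"},
    "population": {"Country", "City"},
}

def predict_types(attributes):
    """Vote over the dictionary: score each candidate type by how many
    of the entity's attributes are associated with it, keep the best."""
    scores = {}
    for attr in attributes:
        for t in ATTR_TYPES.get(attr, ()):
            scores[t] = scores.get(t, 0) + 1
    if not scores:
        return set()
    best = max(scores.values())
    return {t for t, s in scores.items() if s == best}

def candidate_pairs(entities):
    """Only entities whose predicted types overlap are compared further,
    pruning the quadratic space of pairwise coreference checks."""
    typed = {name: predict_types(attrs) for name, attrs in entities.items()}
    names = sorted(entities)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if typed[a] & typed[b]]

# Toy entities described only by their attribute names.
entities = {
    "e1": {"birthDate", "author"},  # predicted Person
    "e2": {"isbn", "author"},       # predicted Work
    "e3": {"birthDate"},            # predicted Person
}
print(candidate_pairs(entities))   # only the Person-Person pair survives
```

The pruning step is the point: instead of comparing every pair of instances, only type-compatible pairs reach the (more expensive) coreference classifier.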
Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e.
Sleeman, Jennifer (University of Maryland, Baltimore County) | Halem, Milton (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County) | Cane, Mark (Columbia University)
Climate change is an important social issue and the subject of much research, both to understand the history of the Earth's changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990, the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. This paper presents results of applying dynamic topic modeling to trace the evolution of these climate change reports and their supporting research citations over a 30-year period. The technique shows how the cited research influences the assessment reports and how trends based on these influences may shape future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure, such as disaster recovery.
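A cross-domain divergence of the kind described above can be computed between topic distributions from the two domains. The sketch below uses Jensen-Shannon divergence as one plausible choice of divergence measure; the topic proportions are invented for illustration, whereas the paper's distributions would come from its dynamic topic model.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence (base 2); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic proportions for one report cycle.
citations  = [0.50, 0.30, 0.20]   # topic mix in the cited research
assessment = [0.40, 0.35, 0.25]   # topic mix in the assessment report

d = js_divergence(citations, assessment)
print(f"cross-domain divergence: {d:.4f}")
```

Tracking this value per report cycle gives a time series of how closely the assessment reports follow the topical mix of the research they cite; a shrinking divergence would suggest the reports track the literature more tightly over time.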