IBM has launched the Watson for Cyber Security beta program to encourage companies to include Watson in their current security environments. Starting off with such organizations as California Polytechnic State University, Sumitomo Mitsui Banking Corporation, and University of Rochester Medical Center, the program will grow over the next few weeks to encompass 40 companies spanning industries like banking, travel, energy, automotive, health care, insurance, and education. For the past few months, IBM Security has been working with eight universities -- California State Polytechnic University at Pomona, Penn State, MIT, New York University, University of Maryland at Baltimore County, and Canada's universities of New Brunswick, Ottawa, and Waterloo -- to help teach Watson the "language of cybersecurity." The research project involved feeding Watson's AI brain thousands of documents annotated to help the system understand what a threat is, what it does, and what indicators are related. Watson for Cyber Security combines machine learning and natural language processing to make associations in unstructured data like blogs, research reports, and documentation that security analysts can then use to make better, faster decisions.
Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e.
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information is used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes via dictionaries based on properties from an appropriate background knowledge base to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase and Arnetminer, using DBpedia as the background knowledge base.
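The idea of predicting entity types from attributes in order to shrink the set of coreference candidates can be illustrated with a small sketch. This is not the authors' classifier: it swaps in simple count-based voting over a property dictionary, and every name, property, and example record below is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_property_dictionary(labeled_examples):
    """Count how often each property name co-occurs with each entity type."""
    prop_type_counts = defaultdict(Counter)
    for properties, entity_type in labeled_examples:
        for prop in properties:
            prop_type_counts[prop][entity_type] += 1
    return prop_type_counts


def predict_type(properties, prop_type_counts):
    """Let each known property vote with its relative type frequencies."""
    votes = Counter()
    for prop in properties:
        counts = prop_type_counts.get(prop)
        if not counts:
            continue  # property unseen in the background data
        total = sum(counts.values())
        for entity_type, n in counts.items():
            votes[entity_type] += n / total
    return votes.most_common(1)[0][0] if votes else None


def candidate_pairs(instances, prop_type_counts):
    """Yield only instance pairs whose predicted types agree."""
    by_type = defaultdict(list)
    for name, properties in instances.items():
        by_type[predict_type(properties, prop_type_counts)].append(name)
    for predicted_type, names in by_type.items():
        if predicted_type is None:
            continue  # simplification: untyped instances are not paired here
        yield from combinations(sorted(names), 2)


# Hypothetical background knowledge: (property names, entity type).
background = [
    ({"name", "birthDate", "almaMater"}, "Person"),
    ({"name", "birthDate", "spouse"}, "Person"),
    ({"name", "foundingDate", "headquarters"}, "Organization"),
    ({"name", "headquarters", "numberOfEmployees"}, "Organization"),
]

# Hypothetical instances from a graph whose ontology is unknown.
instances = {
    "a1": {"name", "birthDate"},
    "a2": {"name", "spouse"},
    "b1": {"name", "headquarters"},
    "b2": {"foundingDate"},
}

dictionary = build_property_dictionary(background)
pairs = sorted(candidate_pairs(instances, dictionary))
print(pairs)
```

With four instances there are six possible pairs; grouping by predicted type leaves two to compare, which is where the cost reduction would come from under these assumptions.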
For each word to be learned, our system a) creates a corpus of sentences, derived from the web, containing this word; b) automatically semantically annotates the corpus using the OntoSem semantic analyzer; c) creates a candidate new concept by collating semantic information from annotated sentences; and d) finds in the existing ontology concept(s) "closest" to the candidate. In the long term, our approach is intended to support the continual mutual bootstrapping of the learner and the semantic analyzer as a solution to the knowledge acquisition bottleneck problem in AI.
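Step d), finding the existing concept(s) "closest" to a candidate, could be sketched as a nearest-neighbor lookup. The abstract does not say how closeness is measured, so the Jaccard overlap of (property, filler) pairs used here is purely an assumption, and the concept names and properties are hypothetical rather than taken from the OntoSem ontology.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_concepts(candidate, ontology, top_n=1):
    """Return the names of the top_n concepts most similar to the candidate."""
    scored = sorted(
        ((jaccard(candidate, props), name) for name, props in ontology.items()),
        reverse=True,
    )
    return [name for _score, name in scored[:top_n]]


# Hypothetical candidate concept collated from annotated sentences.
candidate = {("is-a", "ANIMAL"), ("agent-of", "BARK"), ("has-part", "TAIL")}

# Hypothetical existing ontology: concept name -> set of (property, filler) pairs.
ontology = {
    "DOG": {("is-a", "ANIMAL"), ("agent-of", "BARK"),
            ("has-part", "TAIL"), ("has-part", "PAW")},
    "CAT": {("is-a", "ANIMAL"), ("agent-of", "MEOW"), ("has-part", "TAIL")},
    "CAR": {("is-a", "VEHICLE"), ("has-part", "WHEEL")},
}

print(closest_concepts(candidate, ontology, top_n=2))
```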
Sleeman, Jennifer (University of Maryland, Baltimore County) | Halem, Milton (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County) | Cane, Mark (Columbia University)
Climate change is an important social issue and the subject of much research, both to understand the history of the Earth's changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990, the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. This paper presents results of dynamic topic modeling of the evolution of these climate change reports and their supporting research citations over a 30-year period. The technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure such as disaster recovery.
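A cross-domain divergence between the two domains might be computed per time slice from topic proportions, as in the sketch below. The abstract does not name the divergence measure, so Jensen-Shannon divergence is an assumption here, and the topic proportions are invented for illustration rather than taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic proportions per time slice for each domain.
report_topics = {1990: [0.5, 0.3, 0.2], 1995: [0.4, 0.4, 0.2]}
citation_topics = {1990: [0.5, 0.3, 0.2], 1995: [0.1, 0.2, 0.7]}

divergence_by_year = {
    year: js_divergence(report_topics[year], citation_topics[year])
    for year in report_topics
}

for year, value in sorted(divergence_by_year.items()):
    print(year, round(value, 3))
```

A value near 0 means the report and citation domains emphasize the same topics in that slice, and larger values mean they diverge. Jensen-Shannon is used in this sketch because it is symmetric and stays finite when a topic has zero weight in one domain.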