If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply this method to the climate science domain and evaluate it there. The result improves the topics generated and supports faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
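One way to picture the grounding step is as a preprocessing pass that counts glossary phrases, rather than single words, before any topic model is fit. The sketch below is only an illustration under that assumption and is not the authors' implementation; the function `phrase_counts`, the three-phrase glossary and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def phrase_counts(text, glossary):
    """Count glossary phrases in text, preferring the longest match at each position."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = {tuple(p.lower().split()) for p in glossary}
    if not phrases:
        return Counter()
    max_len = max(len(p) for p in phrases)
    counts = Counter()
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            cand = tuple(words[i:i + n])
            if cand in phrases:
                counts[" ".join(cand)] += 1
                i += n
                break
        else:
            # No glossary phrase starts here; skip this word.
            i += 1
    return counts


# Hypothetical glossary and document, for illustration only.
glossary = ["sea level rise", "sea ice", "carbon dioxide"]
doc = "Carbon dioxide emissions accelerate sea level rise and reduce sea ice."
counts = phrase_counts(doc, glossary)
print(counts)
```

The resulting phrase counts could serve as the bag-of-words input to a topic model, so that each topic is expressed in terms of domain phrases rather than isolated words.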
Joshi, Karuna P. (University of Maryland, Baltimore County) | Gupta, Aditi (University of Maryland, Baltimore County) | Mittal, Sudip (University of Maryland, Baltimore County) | Pearce, Claudia (University of Maryland, Baltimore County) | Joshi, Anupam (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County)
In recent times, there has been an exponential growth in digitization of legal documents such as case records, contracts, terms of services, regulations, privacy documents and compliance guidelines. Courts have been digitizing their archived cases and also making them available for e-discovery. On the other hand, businesses are now maintaining large data sets of legal contracts that they have signed with their employees, customers and contractors. Large public sector organizations are often bound by complex legal legislation and statutes. Hence, there is a need for a cognitive assistant to analyze and reason over these legal rules and help people make decisions. Today the process of monitoring an ever increasing dataset of legal contracts and ensuring regulations and compliance is still very manual and labour intensive. This can prove to be a bottleneck in the smooth functioning of an enterprise. Automating these digital workflows is quite hard because the information is available as text documents but it is not represented in a machine understandable way. With the advancements in cognitive assistance technologies, it is now possible to analyze these digitized legal documents efficiently. In this paper, we discuss ALDA, a legal cognitive assistant to analyze digital legal documents. We also present some of the preliminary results we have obtained by analyzing legal documents using techniques such as semantic web, text mining and graph analysis.
Syed, Zareen (University of Maryland, Baltimore County) | Padia, Ankur (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County) | Mathews, Lisa (University of Maryland, Baltimore County) | Joshi, Anupam (University of Maryland, Baltimore County)
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud (Berners-Lee, Bizer, and Heath 2009). Similar to DBpedia (Auer et al. 2007), which serves as the core for general knowledge in the Linked Open Data cloud, we envision UCO to serve as the core for the cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
Zavala, Laura (Medgar Evers College, City University of New York) | Murukannaiah, Pradeep K. (North Carolina State University) | Poosamani, Nithyananthan (North Carolina State University) | Finin, Tim (University of Maryland, Baltimore County) | Joshi, Anupam (University of Maryland, Baltimore County) | Rhee, Injong (North Carolina State University, Raleigh) | Singh, Munindar P. (North Carolina State University)
The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user’s actions and interactions in addition to the physical location where they occur. Our aim is to enable the construction of a large variety of applications that take advantage of place to render relevant content and functionality and thus improve user experience. We consider elements of context that are particularly related to mobile computing. The main problems we have addressed to realize our place-oriented mobile computing vision are representing places, recognizing places, and engineering place-aware applications. We describe the approaches we have developed for addressing these problems and related subproblems. A key element of our work is the use of collaborative information sharing where users’ devices share and integrate knowledge about places. Our place ontology facilitates such collaboration. Declarative privacy policies allow users to specify contextual features under which they prefer to share or not share their information.
We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
Wild Big Data (WBD) is data that is hard to extract, understand, and use due to its heterogeneous nature and volume. It typically comes without a schema, is obtained from multiple sources and provides a challenge for information extraction and integration. We describe a way of subduing WBD that uses techniques and resources that are popular for processing natural language text. The approach is applicable to data that is presented as a graph of objects and relations between them and to tabular data that can be transformed into such a graph. We start by applying topic models to contextualize the data and then use the results to identify the potential types of the graph's nodes by mapping them to known types found in large open ontologies such as Freebase and DBpedia. The results allow us to assemble coarse clusters of objects that can then be used to interpret the links and perform entity disambiguation and record linking.
Mayfield, James (Johns Hopkins Applied Physics Laboratory) | McNamee, Paul (Johns Hopkins Applied Physics Laboratory) | Harman, Craig (Johns Hopkins University) | Finin, Tim (University of Maryland, Baltimore County) | Lawrie, Dawn (Loyola University Maryland)
We describe the KELVIN system for extracting entities and relations from large text collections and its use in the TAC Knowledge Base Population Cold Start task run by the U.S. National Institute of Standards and Technology. The Cold Start task starts with an empty knowledge base defined by an ontology of entity types, properties and relations. Evaluations in 2012 and 2013 were done using a collection of text from local Web and news to de-emphasize linking entities to a background knowledge base such as Wikipedia. Interesting features of KELVIN include a cross-document entity coreference module based on entity mentions, removal of suspect intra-document coreference chains, a slot value consolidator for entities, the application of inference rules to expand the number of asserted facts, and a set of analysis and browsing tools supporting development.
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information is used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes via dictionaries based on properties from an appropriate background knowledge base to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase and Arnetminer, using DBpedia as the background knowledge base.
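The cost reduction from predicted entity types can be illustrated with a small sketch: if only instances that share a predicted type are compared, the number of candidate coreference pairs drops sharply. This is a toy illustration under assumed data, not the authors' system; the function `candidate_pairs`, the instance identifiers and the type labels are hypothetical, and the types are simply given rather than predicted by a classifier.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(instances, type_of=None):
    """Return candidate coreference pairs, optionally restricted to same-type instances."""
    if type_of is None:
        # Without type information every pair of instances must be considered.
        return list(combinations(sorted(instances), 2))
    blocks = defaultdict(list)
    for inst in sorted(instances):
        blocks[type_of[inst]].append(inst)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical instances with assumed (already predicted) fine-grained types.
predicted_type = {
    "i1": "Person", "i2": "Person", "i3": "Person",
    "i4": "City", "i5": "City",
    "i6": "Company",
}
all_pairs = candidate_pairs(predicted_type)
typed_pairs = candidate_pairs(predicted_type, predicted_type)
print(len(all_pairs), len(typed_pairs))
```

With six instances there are 15 unrestricted pairs but only 4 same-type pairs (3 among the persons, 1 among the cities), and finer-grained types shrink the blocks further.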
One way to obtain large amounts of semantic data is to extract facts from the vast quantities of text that are now available on-line. The relatively low accuracy of current information extraction techniques introduces a need for evaluating the quality of the knowledge bases (KBs) they generate. We frame the problem as comparing KBs generated by different systems from the same documents and show that exploiting provenance leads to more efficient techniques for aligning them and identifying their differences. We describe two types of tools: entity-match focuses on differences in entities found and linked; kbdiff focuses on differences in relations among those entities. Together, these tools support assessment of relative KB accuracy by sampling the parts of two KBs that disagree. We explore the usefulness of the tools by constructing tens of different KBs from the same 26,000 Washington Post articles and identifying the differences among them.
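The idea of aligning two KBs through provenance can be sketched as follows: entities from different KBs are matched when they were extracted from the same mention (document and offset), and relations are then compared under that alignment. This is a minimal sketch under assumed data structures, not the actual entity-match or kbdiff tools; the functions `align_entities` and `diff_relations`, the entity identifiers and the mention tuples are hypothetical.

```python
def align_entities(mentions_a, mentions_b):
    """Map each KB-A entity to the KB-B entities that share at least one mention."""
    owner_b = {m: ent for ent, ms in mentions_b.items() for m in ms}
    return {
        ent_a: {owner_b[m] for m in ms if m in owner_b}
        for ent_a, ms in mentions_a.items()
    }


def diff_relations(rels_a, rels_b, alignment):
    """Return KB-A relations that have no counterpart in KB-B under the alignment."""
    missing = []
    for subj, rel, obj in sorted(rels_a):
        found = any(
            (s, rel, o) in rels_b
            for s in alignment.get(subj, ())
            for o in alignment.get(obj, ())
        )
        if not found:
            missing.append((subj, rel, obj))
    return missing


# Hypothetical provenance: entity id -> set of (document id, character offset) mentions.
mentions_a = {"a1": {("doc1", 0), ("doc2", 15)}, "a2": {("doc1", 40)}, "a3": {("doc2", 60)}}
mentions_b = {"b1": {("doc1", 0)}, "b2": {("doc1", 40)}, "b3": {("doc2", 60)}, "b4": {("doc2", 15)}}
rels_a = {("a1", "employee_of", "a2"), ("a1", "lives_in", "a3")}
rels_b = {("b1", "employee_of", "b2")}

alignment = align_entities(mentions_a, mentions_b)
missing = diff_relations(rels_a, rels_b, alignment)
print(alignment["a1"], missing)
```

Here entity `a1` aligns to both `b1` and `b4`, which shows a difference in how the two KBs linked mentions, and the `lives_in` fact appears only in the first KB. Disagreements of both kinds are the parts a reviewer would sample when assessing relative accuracy.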