In scientific disciplines where research findings have a strong impact on society, reducing the time it takes to understand, synthesize, and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply this method to the climate science domain and evaluate it there. The resulting topics support faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
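The grounding idea can be illustrated in miniature: score a document against topics whose labels and phrase lists come from a domain glossary. This is a minimal sketch, not the paper's actual model; the `glossary` topics and the example document are invented for illustration.

```python
from collections import Counter

def ground_topics(doc_tokens, glossary):
    """Score a document against ontology-derived topic labels by
    counting occurrences of each topic's glossary phrases; return
    topics sorted by descending score."""
    counts = Counter(doc_tokens)
    scores = {topic: sum(counts[p] for p in phrases)
              for topic, phrases in glossary.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# hypothetical glossary-derived topics for the climate domain
glossary = {
    "cryosphere": ["ice", "glacier", "permafrost"],
    "carbon-cycle": ["carbon", "emission", "sink"],
}
doc = "melting ice and glacier retreat near thawing permafrost".split()
print(ground_topics(doc, glossary))  # "cryosphere" ranks first
```

Because each topic is anchored to human-curated phrases, its label is immediately interpretable, which is the readability benefit the abstract claims for ontology grounding.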
Thousands of ancient treasures that have been unearthed by climate change could soon be lost to humankind forever, as they are eroded by weathering and eaten by pests. The crisis is so extreme that some archaeologists are urging colleagues to abandon their current field sites and focus instead on these newly exposed relics before they vanish. Rising seas, raging storms, melting ice and forest fires are revealing artefacts that have much to tell us about our history on Earth – from sunken shipwrecks in Svalbard to the ancient waste dumps filled with bones, shoes and carvings emerging all over the Arctic and further south, including in Scotland. "This material is like the library of Alexandria. It is incredibly valuable and it's on fire now," George Hambrecht, an anthropologist at the University of Maryland, College Park, told New Scientist at the Anthropology, Weather and Climate Change conference held in London last month.
Wireless sensor networks are composed of a number of sensors probing their surroundings and disseminating the collected data to a gateway node for processing. Numerous military and civil applications have emerged over the last few years for wireless sensor networks. In many of these applications, sensors and gateways are placed in harsh environments. Therefore, protecting sensors and the gateway is critical for ensuring the robustness of the network operation. Gateway relocation has been pursued as a means of improving network performance metrics such as throughput and energy consumption. However, we argue that relocating without taking safety concerns into consideration may cause the gateway to move dangerously close to one or more serious events in the environment.
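A safety-aware relocation decision can be sketched as constrained optimization: among candidate positions, keep only those at a safe distance from every known hazardous event, then pick the one minimizing total distance to the sensors (a simple proxy for energy cost). This is a hypothetical sketch, not the paper's algorithm; the candidate positions, hazard locations, and the 2.0-unit safety radius are invented.

```python
import math

def safe_relocation(candidates, sensors, hazards, min_safe_dist):
    """Pick the candidate gateway position minimizing total distance
    to sensors, subject to staying at least min_safe_dist away from
    every hazardous event; return None if no candidate is safe."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    safe = [c for c in candidates
            if all(dist(c, h) >= min_safe_dist for h in hazards)]
    if not safe:
        return None  # the safety constraint rules out every candidate
    return min(safe, key=lambda c: sum(dist(c, s) for s in sensors))

sensors = [(0, 0), (4, 0), (2, 3)]
hazards = [(2, 1)]
candidates = [(2, 1), (2, 4), (0, 4)]
print(safe_relocation(candidates, sensors, hazards, min_safe_dist=2.0))
```

The candidate (2, 1) would minimize distance to the sensors but sits on a hazard, so the constraint forces the gateway to a slightly costlier yet safe position, which is exactly the trade-off the abstract argues for.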
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information are used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible, or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes via dictionaries based on properties from an appropriate background knowledge base to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase, and Arnetminer, using DBpedia as the background knowledge base.
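The attribute-mapping step can be sketched as follows: raw attribute names from a graph instance are mapped through a dictionary onto background-KB properties, and the resulting feature set is compared against labeled examples to predict an entity type. This is a minimal stand-in for the trained model described in the abstract; the dictionary entries, DBpedia-style property names, and labeled examples are illustrative assumptions.

```python
def attribute_features(instance_attrs, attr_dict):
    """Map an instance's raw attribute names to background-KB
    properties via a dictionary, yielding a feature set."""
    return {attr_dict[a] for a in instance_attrs if a in attr_dict}

def predict_type(instance_attrs, attr_dict, labeled):
    """Nearest-neighbour type prediction using Jaccard overlap of
    KB-property feature sets (a stand-in for a trained classifier)."""
    feats = attribute_features(instance_attrs, attr_dict)
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return max(labeled, key=lambda ex: jaccard(feats, ex[0]))[1]

# hypothetical dictionary from raw graph attributes to KB properties
attr_dict = {"bday": "dbo:birthDate", "pob": "dbo:birthPlace",
             "hq": "dbo:headquarter", "founded": "dbo:foundingDate"}
labeled = [({"dbo:birthDate", "dbo:birthPlace"}, "Person"),
           ({"dbo:headquarter", "dbo:foundingDate"}, "Organisation")]
print(predict_type(["bday", "pob"], attr_dict, labeled))  # prints Person
```

Predicting a type up front lets a coreference system compare only instances of compatible types, which is where the computational savings come from.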
For each word to be learned, our system a) creates a corpus of sentences, derived from the web, containing this word; b) automatically semantically annotates the corpus using the OntoSem semantic analyzer; c) creates a candidate new concept by collating semantic information from annotated sentences; and d) finds in the existing ontology concept(s) "closest" to the candidate. In the long term, our approach is intended to support the continual mutual bootstrapping of the learner and the semantic analyzer as a solution to the knowledge acquisition bottleneck problem in AI.
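Step d) above, finding the existing concept "closest" to the candidate, can be sketched as set overlap between the candidate's collated semantic properties and each ontology concept's properties. This is an illustrative sketch, not OntoSem's actual matching procedure; the concept names, property sets, and Dice similarity are assumptions made for the example.

```python
def closest_concept(candidate, ontology):
    """Return the name of the ontology concept whose property set
    best overlaps the candidate concept (Dice coefficient)."""
    def dice(a, b):
        return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0
    return max(ontology, key=lambda name: dice(candidate, ontology[name]))

# hypothetical ontology fragment and candidate collated from sentences
ontology = {
    "VEHICLE": {"artifact", "moves", "carries-cargo"},
    "ANIMAL": {"living", "moves", "eats"},
}
candidate = {"artifact", "moves", "carries-cargo", "has-wheels"}
print(closest_concept(candidate, ontology))  # prints VEHICLE
```

Attaching the candidate near its closest existing concept is what lets the learned word enrich both the ontology and, in turn, the semantic analyzer on the next bootstrapping pass.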