Ontologies
Persistence, Change, and the Integration of Objects and Processes in the Framework of the General Formal Ontology
In this paper we discuss various problems associated with temporal phenomena. These problems include persistence and change, the integration of objects and processes, and truth-makers for temporal propositions. We propose an approach that interprets persistence as a phenomenon emanating from the activity of the mind, and that additionally postulates that persistence ultimately rests on personal identity. The General Formal Ontology (GFO) is a top-level ontology being developed at the University of Leipzig. Top-level ontologies can be roughly divided into 3D-ontologies and 4D-ontologies. GFO is the only top-level ontology used in applications that is a 4D-ontology and additionally admits 3D objects; objects and processes are thus integrated in a natural way.
Mapping cognitive ontologies to and from the brain
Schwartz, Yannick | Thirion, Bertrand | Varoquaux, Gaël
Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Because individual experiments elicit neural responses from only a small number of stimuli, this link is incomplete and, from a causal point of view, unidirectional. To draw conclusions about the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions with some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenge is to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies.
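As an informal illustration of this kind of cross-study decoding (not the authors' actual pipeline), the sketch below trains a linear classifier to predict cognitive-ontology labels from flattened activation maps, holding out whole studies so prediction is tested on paradigms the model has not seen. All data shapes, labels, and study assignments are simulated assumptions.

```python
# Hypothetical sketch of cross-study decoding: predicting cognitive
# labels from brain activation maps. The data below are simulated
# stand-ins, not the authors' imaging corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_maps, n_voxels = 200, 500
X = rng.normal(size=(n_maps, n_voxels))      # flattened activation maps
y = rng.integers(0, 3, size=n_maps)          # ontology term ids, e.g. 0="visual"
study = rng.integers(0, 10, size=n_maps)     # which study each map came from

# Hold out whole studies so the evaluation measures generalization
# to entirely new experiments, not just new maps.
train, test = next(GroupShuffleSplit(test_size=0.3, random_state=0)
                   .split(X, y, groups=study))
clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
print("held-out study accuracy:", clf.score(X[test], y[test]))
```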
Semantics for Big Data Integration and Analysis
Knoblock, Craig A. (University of Southern California) | Szekely, Pedro (University of Southern California)
Much of the focus on big data has been on the problem of processing very large sources. There is an equally hard problem of how to normalize, integrate, and transform the data from many sources into the format required to run large-scale analysis and visualization tools. We have previously developed an approach to semi-automatically mapping diverse sources into a shared domain ontology so that they can be quickly combined. In this paper we describe our approach to building and executing integration and restructuring plans to support analysis and visualization tools on very large and diverse datasets.
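A toy sketch of the source-to-ontology mapping step might look as follows. The source schema, ontology property names, and converters are invented for illustration; the actual system learns such mappings semi-automatically, whereas here the mapping is written by hand.

```python
# Toy sketch of normalizing a raw source into shared-ontology terms.
# Column names and ontology properties are invented examples.
source_rows = [
    {"nm": "Union Station", "lat": "34.056", "lng": "-118.237"},
]

# A hand-written stand-in for a semi-automatically learned mapping
# from source columns to ontology properties, with a converter each.
mapping = {
    "nm":  ("schema:name",   str),
    "lat": ("geo:latitude",  float),
    "lng": ("geo:longitude", float),
}

def to_ontology(row):
    """Normalize one source record into shared-ontology vocabulary."""
    return {prop: conv(row[col]) for col, (prop, conv) in mapping.items()}

print([to_ontology(r) for r in source_rows])
```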
Unsupervised Rating Prediction based on Local and Global Semantic Models
Boteanu, Adrian (Worcester Polytechnic Institute) | Chernova, Sonia (Worcester Polytechnic Institute)
Current recommendation engines all attempt to answer the same question: given a user with some activity in the system, what is the next entity, be it a restaurant, a book, or a movie, that the user should visit or buy? The presumption is that the user would review the recommended item favorably. The goal of our project is to predict how a user would rate an item they have never rated, which is a generalization of the task recommendation engines perform. Previous work successfully employs machine learning techniques, particularly statistical methods. However, some outlier situations, such as new users, are more difficult to predict. In this paper we present a rating prediction approach targeted at entities for which little prior information exists in the database. We put forward and test a number of hypotheses, exploring recommendations based on nearest-neighbor-like methods. We adapt existing common-sense topic modeling methods to compute similarity measures between users, and then use a relatively small set of key users to predict how the target user will rate a given business. We implemented and tested our system for recommending businesses using the Yelp Academic Dataset. We report initial results for topic-based rating predictions, which perform consistently across a broad range of parameters.
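As a hedged illustration of the nearest-neighbor idea (the user topic vectors, ratings, and similarity choice below are invented stand-ins, not the paper's models), a target user's rating for a business can be predicted as a similarity-weighted average over the most similar key users:

```python
# Hypothetical sketch of rating prediction from topic-based user
# similarity. All users, topic vectors, and ratings are made up.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

topics = {                       # per-user topic distributions
    "alice": np.array([0.7, 0.2, 0.1]),
    "bob":   np.array([0.6, 0.3, 0.1]),
    "carol": np.array([0.1, 0.1, 0.8]),
}
ratings = {"alice": 4.0, "bob": 5.0, "carol": 2.0}   # for one business

def predict(target_vec, k=2):
    """Similarity-weighted average over the k most similar key users."""
    sims = sorted(((cosine(target_vec, v), u) for u, v in topics.items()),
                  reverse=True)[:k]
    total = sum(s for s, _ in sims)
    return sum(s * ratings[u] for s, u in sims) / total

print(predict(np.array([0.65, 0.25, 0.1])))  # a new user close to alice/bob
```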
Developing Semantic Classifiers for Big Data
Scherl, Richard (Monmouth University)
When the amount of RDF data is very large, it becomes more likely that the triples describing entities will contain errors and may not specify a class from a known ontology. The work presented here explores the use of machine learning methods to develop classifiers that identify the semantic categorization of an entity based on the property names used to describe it. The goal is to develop classifiers that are accurate but robust to errors and noise. The training data comes from DBpedia, where entities are categorized by type and densely described with RDF properties. The initial experiments reported here indicate that the approach is promising.
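A minimal sketch of such a classifier, assuming invented property lists and types rather than the actual DBpedia training data, treats the property names describing an entity as a bag of words:

```python
# Illustrative sketch: classifying DBpedia-style entities by the set
# of property names describing them. Properties and types are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

entities = [
    "birthPlace birthDate occupation",      # -> Person
    "populationTotal areaTotal country",    # -> City
    "director starring runtime",            # -> Film
    "birthPlace deathDate spouse",          # -> Person
]
types = ["Person", "City", "Film", "Person"]

vec = CountVectorizer()                 # bag of property names
X = vec.fit_transform(entities)
clf = MultinomialNB().fit(X, types)

# A noisy, partial description should still recover the type.
print(clf.predict(vec.transform(["occupation birthDate"])))
```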
Entity Type Recognition for Heterogeneous Semantic Graphs
Sleeman, Jennifer (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County)
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information is used to infer entity types and predict coreference. Semantic graphs, whether in RDF or another formalism, are semi-structured data with very different contextual clues and need different approaches to identify potentially coreferent entities. When their ontologies are unknown, inaccessible, or semantically trivial, coreference resolution is difficult. For such cases, we can use supervised machine learning to map entity attributes, via dictionaries based on properties from an appropriate background knowledge base, to predict instance entity types, aiding coreference resolution. We evaluated the approach in experiments on data from Wikipedia, Freebase, and Arnetminer, with DBpedia as the background knowledge base.
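To illustrate why predicted types reduce computational cost (the instances and types below are invented, not the evaluation data), candidate coreferent pairs can be restricted to instances that share a predicted type:

```python
# Toy sketch of type-based blocking for coreference resolution:
# only instances with the same predicted type are compared.
from itertools import combinations
from collections import defaultdict

instances = {
    "e1": "Person", "e2": "Person", "e3": "Place",
    "e4": "Place",  "e5": "Person",
}

by_type = defaultdict(list)
for inst, typ in instances.items():
    by_type[typ].append(inst)

# Without typing: C(5,2) = 10 pairwise comparisons; with typing: 3 + 1 = 4.
candidate_pairs = [p for insts in by_type.values()
                   for p in combinations(insts, 2)]
print(len(candidate_pairs), candidate_pairs)
```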
Unsupervised Learning of Human Activities from Overexpressed Recognized Non-Speech Sounds
Smidtas, Serge | Peyrot, Magalie
Human activity and the environment produce sounds: at home, for example, the noise of running water, coughing, or a television. These sounds can be used to determine the activity taking place in the environment. The objective is to monitor a person's activity, or determine their environment, through sound analysis with a single low-cost microphone, in order to adapt programs to the activity or environment, or to detect abnormal situations. Patterns that are repeatedly overexpressed in the sequences of recognized sounds, both within and across environments, characterize activities such as a person entering the house or a TV program being watched. We first manually annotated 1500 recognized sounds from the daily life of elderly persons living at home. We then inferred an ontology and enriched the annotation database with crowdsourced manual annotations of 7500 sounds, focusing on the most frequent sounds. Using sound-learning algorithms, we defined 50 types covering the most frequent sounds and used this set as a basis for tagging recognized sounds. By detecting overexpressed motifs in these tag sequences, we were able to categorize, using only a single low-cost microphone, complex activities of a person's daily life at home, such as watching TV, entering the apartment, or holding a phone conversation, including detecting unknown activities such as repeated tasks performed by users.
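A minimal sketch of the motif idea, with an invented tag sequence, counts recurring n-grams of recognized sound tags; motifs that recur far more often than chance hint at an activity:

```python
# Minimal sketch of spotting overexpressed motifs in a sequence of
# recognized sound tags. The tags and sequence are invented examples.
from collections import Counter

tags = ["door", "steps", "voice", "tv", "tv", "tv",
        "door", "steps", "voice", "water", "tv", "tv"]

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

counts = Counter(ngrams(tags, 2))
# A recurring ("door", "steps") motif, for instance, could suggest
# someone entering the apartment.
for motif, c in counts.most_common(3):
    print(motif, c)
```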
Ontology Quality Assurance with the Crowd
Mortensen, Jonathan M. (Stanford University) | Musen, Mark A. (Stanford University) | Noy, Natalya F. (Stanford University)
The Semantic Web has the potential to change the Web as we know it. However, the community faces a significant challenge in managing, aggregating, and curating its massive amount of data and knowledge. Human computation is only beginning to serve an essential role in the curation of these Web-based data. Ontologies, which facilitate data integration and search, serve as a central component of the Semantic Web, but they are large, complex, and typically require extensive expert curation. Furthermore, ontology-engineering tasks require more knowledge than a typical crowdsourcing task does. We have developed ontology-engineering methods that leverage the crowd. In this work, we describe our general crowdsourcing workflow. We then highlight our work on applying this workflow to ontology verification and quality assurance. In a pilot study, this method approaches expert ability, finding the same errors that experts identified, with 86% accuracy, in a faster and more scalable fashion. The work provides a general framework with which to develop crowdsourcing methods for the Semantic Web. In addition, it highlights opportunities for future research in human computation and crowdsourcing.
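As a simple illustration of one aggregation step such a workflow might include (the verification questions and votes below are invented, and majority voting is only one possible aggregation rule), crowd answers to yes/no verification questions can be combined as follows:

```python
# Sketch of aggregating crowd answers to ontology-verification
# questions by majority vote; questions and votes are invented.
from collections import Counter

votes = {
    "Is every Heart a MuscularOrgan?": ["yes", "yes", "yes", "no", "yes"],
    "Is every Finger a Hand?":         ["no", "no", "yes", "no", "no"],
}
for question, answers in votes.items():
    label, n = Counter(answers).most_common(1)[0]
    print(f"{question} -> {label} ({n}/{len(answers)} agreement)")
```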
Spontaneous Analogy by Piggybacking on a Perceptual System
Most computational models of analogy assume they are given a delineated source domain and often a specified target domain. These systems do not address how analogs can be isolated from large domains and spontaneously retrieved from long-term memory, a process we call spontaneous analogy. We present a system that represents relational structures as feature bags. Using this representation, our system leverages perceptual algorithms to automatically create an ontology of relational structures and to efficiently retrieve analogs for new relational structures from long-term memory. We provide a demonstration of our approach that takes a set of unsegmented stories, constructs an ontology of analogical schemas (corresponding to plot devices), and uses this ontology to efficiently find analogs within new stories, yielding significant time-savings over linear analog retrieval at a small accuracy cost.
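A hedged sketch of feature-bag retrieval, with invented stories and features: each relational structure is reduced to a set of features (assumed here to be already normalized to canonical variable names), and analogs are retrieved by set overlap, avoiding full structure-mapping against every item in memory.

```python
# Toy sketch of analog retrieval over "feature bag" representations
# of relational structures. Stories and features are invented, and
# features are assumed pre-normalized to shared variable names.
def jaccard(a, b):
    return len(a & b) / len(a | b)

memory = {
    "revenge_plot": {"harm(x,y)", "desire(y,harm(y,x))", "act(y,x)"},
    "rescue_plot":  {"danger(x)", "desire(y,safe(x))", "act(y,x)"},
}
new_story = {"harm(x,y)", "desire(y,harm(y,x))", "act(y,x)", "flee(x)"}

# Retrieve the stored schema with the greatest feature overlap.
best = max(memory, key=lambda k: jaccard(memory[k], new_story))
print(best, jaccard(memory[best], new_story))
```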
A modeling approach to design a software sensor and analyze agronomical features - Application to the sap flow and grape quality relationship
Thébaut, Aurélie | Scholash, Thibault | Charnomordic, Brigitte | Hilgert, Nadine
This work proposes a framework that uses temporal data and domain knowledge to analyze complex agronomical features. The expertise is first formalized in an ontology, in the form of concepts and relationships between them, and then used in conjunction with raw data and mathematical models to design a software sensor. The software sensor outputs are then related to product quality, assessed by quantitative measurements. This requires advanced data analysis methods, such as functional regression. The methodology is applied to a case study involving an experimental design in French vineyards. The temporal data consist of sap flow measurements, and the goal is to explain fruit quality (sugar concentration and weight) using the vine's water courses through the various phenological stages. The results are discussed, as well as the genericity and robustness of the method.
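A minimal sketch of scalar-on-function regression under simulated data (the curves, basis choice, and quality values are assumptions, not the vineyard measurements): each sap-flow curve is summarized by a few basis coefficients, and quality is regressed on those coefficients.

```python
# Hedged sketch of scalar-on-function regression: summarize each
# sap-flow curve by Legendre basis coefficients, then regress a
# quality measure (e.g. sugar) on them. All data are simulated.
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 100)                  # normalized season time
curves = (rng.normal(size=(30, 1)) * np.sin(3 * t)
          + rng.normal(scale=0.1, size=(30, 100)))
quality = curves.mean(axis=1) * 2 + rng.normal(scale=0.05, size=30)

# Project each curve onto a small Legendre basis (functional features).
coefs = np.array([legendre.legfit(t, c, deg=3) for c in curves])
model = LinearRegression().fit(coefs, quality)
print("R^2:", model.score(coefs, quality))
```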