Goto

Collaborating Authors

 Petrovic, Milenko


Discovering and Characterizing Emerging Events in Big Data

AAAI Conferences

We describe a novel system for discovering and characterizing emerging events. We define event emergence to be a developing situation comprised of a series of sub-events. To detect sub-events from a very large, continuous textual input stream, we use two techniques: (1) frequency-based detection of sub-events that are potentially entailed by an emerging event; and (2) anomaly-based detection of other sub-events that are potentially indicative of an emerging event. Identifying emerging events from detected sub-events involves connecting sub-events to each other and to the relevant emerging events within the event models and estimating the likelihood of possible emerging events. Each sub-event can be part of a number of emerging events and supports various event models to varying degrees. We adopt a coherent and compact model that probabilistically identifies emerging events. The innovative aspect of our work is a well-defined framework where statistical Big Data techniques are informed by event semantics and inference techniques (and vice versa). Our work is strongly grounded in semantics and knowledge representation, which enables us to produce more reliable results than would otherwise be possible with a purely statistical approach.


SemMemDB: In-Database Knowledge Activation

AAAI Conferences

Semantic networks are a popular way of simulating human memory in ACT-R-like cognitive architectures. However, existing implementations fall short in their ability to efficiently work with very large networks required for full-scale simulations of human memories. In this paper, we present SemMemDB, an in-database realization of semantic networks and spreading activation. We describe a relational representation for semantic networks and an efficient SQL-based spreading activation algorithm. We provide a simple interface for users to invoke retrieval queries. The key benefits of our approach are: (1) Databases have mature query engines and optimizers that generate efficient query plans for memory activation and retrieval; (2) Databases can provide massive storage capacity to potentially support human-scale memories; (3) Spreading activation is implemented in SQL, a widely-used query language for big data analytics. We evaluate SemMemDB in a comprehensive experimental study using DBPedia, a web-scale ontology constructed from the Wikipedia corpus. The results show that our system runs over 500 times faster than previous works.