University of California, Riverside


Hawkes Process Inference With Missing Data

AAAI Conferences

A multivariate Hawkes process is a class of marked point processes: A sample consists of a finite set of events of unbounded random size; each event has a real-valued time and a discrete-valued label (mark). It is self-excitatory: Each event causes an increase in the rate of other events (of either the same or a different label) in the (near) future. Prior work has developed methods for parameter estimation from complete samples. However, just as unobserved variables can increase the modeling power of other probabilistic models, allowing unobserved events can increase the modeling power of point processes. In this paper we develop a method to sample over the posterior distribution of unobserved events in a multivariate Hawkes process. We demonstrate the efficacy of our approach, and its utility in improving predictive power and identifying latent structure in real-world data.
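As a rough illustration of the self-excitation described above, the sketch below computes the rate of each label at a given time from the past events. The exponential decay kernel, the function name `hawkes_intensity`, and the parameters `mu`, `alpha`, and `beta` are assumptions made for this example; the abstract does not specify a kernel, and this is not the paper's inference method.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Rate of each label at time t, given past events.

    events      : list of (time, label) pairs
    mu[k]       : baseline rate of label k
    alpha[j][k] : jump in the rate of label k caused by an event of label j
    beta        : decay rate of the (assumed) exponential kernel
    """
    rates = list(mu)  # start from the baseline rates
    for s, j in events:
        if s < t:  # only strictly earlier events contribute
            decay = math.exp(-beta * (t - s))
            for k in range(len(mu)):
                rates[k] += alpha[j][k] * decay
    return rates

# One event of label 0 at time 0 raises the rates of both labels afterwards.
example = hawkes_intensity(
    1.0, [(0.0, 0)], mu=[0.1, 0.2], alpha=[[0.5, 0.3], [0.0, 0.4]], beta=1.0
)
```

Under this sketch, an unobserved event would simply be a missing entry in `events`, which is why the rates computed from an incomplete sample can be too low.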


Querying Documents Annotated by Interconnected Entities

AAAI Conferences

In a large number of applications, from biomedical literature to social networks, there are collections of text documents that are annotated by interconnected entities, which are related to each other through association graphs. For example, social posts are related through the friendship graph of their authors, and PubMed articles are annotated by MeSH terms, which are related through ontological relationships. To effectively query such collections, in addition to the text content relevance of a document, the semantic distance between the entities of a document and the query must be taken into account. In this paper, we propose a novel query framework, which we refer to as keyword querying on graph-annotated documents, and query techniques to answer such queries. Our methods automatically balance the impact of the graph entities and the text content in the ranking. Our qualitative evaluation on a real dataset shows that our methods improve the ranking quality compared to baseline ranking systems.


Answering Complex Queries in an Online Community Network

AAAI Conferences

An online community network such as Twitter or amazon.com links entities (e.g., users, products) with various relationships (e.g., friendship, co-purchase) and makes such information available for access through a web interface. The web interfaces of these networks often support features such as keyword search and "get-neighbors", so a visitor can quickly find entities (e.g., users/products) of interest. Nonetheless, the interface is usually too restrictive to answer complex queries such as (1) find 100 Twitter users from California with at least 100 followers who talked about ICWSM last year or (2) find 100 books with at least 200 5-star reviews at amazon.com. In this paper, we introduce the novel problem of answering complex queries that involve non-searchable attributes through the web interface of an online community network. We model such a network as a heterogeneous graph with two access channels, Content Search and Local Search. We propose a unified approach that transforms the complex query into a small number of supported ones based on a strategic query-selection process. We conduct comprehensive experiments on Twitter and amazon.com which demonstrate the efficacy of our proposed algorithms.


Data Mining a Trillion Time Series Subsequences Under Dynamic Time Warping

AAAI Conferences

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. We show that our ideas allow us to solve higher-level time series data mining problems at scales that would otherwise be untenable.
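For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook dynamic-programming distance that a similarity search would evaluate for each candidate. It is only the baseline measure, not any of the four speed-up ideas the abstract refers to; the function name `dtw_distance` and the optional `window` band parameter are assumptions made for this example.

```python
import math

def dtw_distance(a, b, window=None):
    """Textbook O(n*m) DTW distance between two numeric sequences.

    window limits how far the alignment may stray from the diagonal
    (a Sakoe-Chiba style band); None means no constraint.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least as wide as the length difference.
    window = max(window, abs(n - m))

    inf = math.inf
    # cost[i][j] = best cumulative squared cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])
```

With `window=0` and equal-length inputs the alignment is forced onto the diagonal, so the result equals the Euclidean distance; widening the band lets time-shifted but similarly shaped sequences match more closely.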


Emerging Applications for Intelligent Diabetes Management

AI Magazine

Diabetes management is a difficult task for patients, who must monitor and control their blood glucose levels in order to avoid serious diabetic complications. It is a difficult task for physicians, who must manually interpret large volumes of blood glucose data to tailor therapy to the needs of each patient. This paper describes three emerging applications that employ AI to ease this task: (1) case-based decision support for diabetes management; (2) machine learning classification of blood glucose plots; and (3) support vector regression for blood glucose prediction. The first application provides decision support by detecting blood glucose control problems and recommending therapeutic adjustments to correct them. The second provides an automated screen for excessive glycemic variability. The third aims to build a hypoglycemia predictor that could alert patients to dangerously low blood glucose levels in time to take preventive action. All are products of the 4 Diabetes Support System™ project, which uses AI to promote the health and wellbeing of people with type 1 diabetes. These emerging applications could potentially benefit 20 million patients who are at risk for devastating complications, thereby improving quality of life and reducing health care cost expenditures.


Grouping Strokes into Shapes in Hand-Drawn Diagrams

AAAI Conferences

Objects in freely-drawn sketches often have no spatial or temporal separation, making object recognition difficult. We present a two-step stroke-grouping algorithm that first classifies individual strokes according to the type of object to which they belong, then groups strokes with like classifications into clusters representing individual objects. The first step facilitates clustering by naturally separating the strokes, and both steps fluidly integrate spatial and temporal information. Our approach to grouping is unique in its formulation as an efficient classification task rather than, for example, an expensive search task. Our single-stroke classifier performs at least as well as existing single-stroke classifiers on text vs. nontext classification, and we present the first three-way single-stroke classification results. Our stroke grouping results are the first reported of their kind; our grouping algorithm correctly groups between 86% and 91% of the ink in diagrams from two domains, with between 69% and 79% of shapes being perfectly clustered.


Combining Speech and Sketch to Interpret Unconstrained Descriptions of Mechanical Devices

AAAI Conferences

Mechanical design tools would be considerably more useful if we could interact with them in the way that human designers communicate design ideas to one another, i.e., using crude sketches and informal speech. Those crude sketches frequently contain pen strokes of two different sorts, one type portraying device structure, the other denoting gestures, such as arrows used to indicate motion. We report here on techniques we developed that use information from both sketch and speech to distinguish gesture strokes from non-gestures -- a critical first step in understanding a sketch of a device. We collected and analyzed unconstrained device descriptions, which revealed six common types of gestures. Guided by this knowledge, we developed a classifier that uses both sketch and speech features to distinguish gesture strokes from non-gestures. Experiments with our techniques indicate that the sketch and speech modalities alone produce equivalent classification accuracy, but combining them produces higher accuracy.