
Collaborating Authors

 Knoblock, Craig A.


Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

arXiv.org Artificial Intelligence

Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up map interpretation and help generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of these algorithms are trained on out-of-domain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-intensive and time-consuming. On the other hand, existing geographic data sources, such as OpenStreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles of OSM map tiles and historical maps differ significantly. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into a historical style and place text labels on them. We show that state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement in historical map text detection.
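A minimal sketch of how such a synthetic-annotation pipeline could be wired together, assuming a pre-trained style transfer model (the style_model object and its transfer method are hypothetical stand-ins) and a rendered OSM tile with label placements taken from the separated text layer; the pasted labels' bounding boxes become the training annotations:

    # Sketch only: produce one synthetic training sample for text detection.
    # The style-transfer model and file paths are hypothetical stand-ins,
    # not the authors' released code.
    from PIL import Image, ImageDraw, ImageFont

    def make_synthetic_sample(osm_tile_path, labels, style_model, font_path):
        """Convert an OSM tile to a historical style and paste text labels on it.

        labels: list of (text, x, y) placements taken from the OSM text layer.
        Returns the styled image and the bounding-box annotations.
        """
        base = Image.open(osm_tile_path).convert("RGB")
        styled = style_model.transfer(base)  # hypothetical style-transfer call
        draw = ImageDraw.Draw(styled)
        font = ImageFont.truetype(font_path, size=24)
        annotations = []
        for text, x, y in labels:
            draw.text((x, y), text, fill="black", font=font)
            left, top, right, bottom = draw.textbbox((x, y), text, font=font)
            annotations.append({"text": text, "bbox": (left, top, right, bottom)})
        return styled, annotations

Images and boxes produced this way can then be fed to an off-the-shelf detector such as PSENet in its usual training format.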


Guided Generative Models using Weak Supervision for Detecting Object Spatial Arrangement in Overhead Images

arXiv.org Artificial Intelligence

The increasing availability and accessibility of overhead images allows us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas that contain the desired objects in overhead images. Traditional supervised object detection approaches can estimate accurate spatial arrangements but require large amounts of bounding box annotations. Recent semi-supervised clustering approaches can reduce manual labeling but still require annotations for all object categories in the image. This paper presents the target-guided generative model (TGGM), built on the Variational Auto-encoder (VAE) framework, which uses Gaussian Mixture Models (GMM) to estimate the distributions of both the hidden and decoder variables in the VAE. Modeling both sets of variables with GMMs significantly reduces the manual annotations required for spatial arrangement estimation. Unlike existing approaches, in which the training process can only update the GMM as a whole within an optimization iteration (e.g., a "minibatch"), TGGM allows individual GMM components to be updated separately in the same optimization iteration. Optimizing GMM components separately lets TGGM exploit the semantic relationships in spatial data and requires only a few labels to initiate and guide the generative process. Our experiments show that TGGM achieves results comparable to state-of-the-art semi-supervised methods and outperforms unsupervised methods by 10% in $F_{1}$ score, while requiring significantly fewer labels.
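As a rough illustration of the per-component idea (not the authors' TGGM code), the sketch below keeps each Gaussian mixture component of a VAE prior as a separate parameter tensor, so a minibatch assigned to one component only produces gradients for that component; the module names and dimensions are assumptions:

    # Minimal sketch: a VAE with a Gaussian-mixture latent prior whose
    # components are separate parameters, so each can be updated individually
    # within the same optimization step. Illustrative only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GMMPriorVAE(nn.Module):
        def __init__(self, x_dim=64, z_dim=8, n_components=3):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU())
            self.mu_head = nn.Linear(32, z_dim)
            self.logvar_head = nn.Linear(32, z_dim)
            self.decoder = nn.Sequential(
                nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))
            # One mean/log-variance pair per mixture component, stored separately.
            self.comp_mu = nn.ParameterList(
                [nn.Parameter(torch.randn(z_dim)) for _ in range(n_components)])
            self.comp_logvar = nn.ParameterList(
                [nn.Parameter(torch.zeros(z_dim)) for _ in range(n_components)])

        def loss(self, x, component):
            h = self.encoder(x)
            mu, logvar = self.mu_head(h), self.logvar_head(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            recon = self.decoder(z)
            # KL between q(z|x) and the single component this batch is assigned
            # to; only that component's parameters receive gradients here.
            p_mu, p_logvar = self.comp_mu[component], self.comp_logvar[component]
            kl = 0.5 * torch.sum(
                p_logvar - logvar
                + (torch.exp(logvar) + (mu - p_mu) ** 2) / torch.exp(p_logvar)
                - 1.0)
            return F.mse_loss(recon, x, reduction="sum") + kl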


An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

arXiv.org Artificial Intelligence

Historical maps contain detailed geographic information that is difficult to find elsewhere and that covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to the real-world problem of finding and indexing historical map images. The approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata, in the RDF (Resource Description Framework) format, supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator and evaluated it using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
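The kind of query the linked metadata enables can be sketched with rdflib; the graph file, namespace, and property names below are illustrative assumptions, not the vocabulary mapKurator actually emits:

    # Sketch: query linked geo-metadata for maps covering mountain peaks
    # higher than 1,000 meters in California. Namespaces and properties are
    # assumed for illustration only.
    from rdflib import Graph

    g = Graph()
    g.parse("map_metadata.ttl", format="turtle")  # hypothetical RDF dump

    query = """
    PREFIX ex: <http://example.org/maps#>
    SELECT ?map ?peak ?elevation WHERE {
      ?map   ex:containsFeature ?peak .
      ?peak  ex:featureType     "mountain peak" ;
             ex:locatedIn       "California" ;
             ex:elevationMeters ?elevation .
      FILTER(?elevation > 1000)
    }
    """
    for row in g.query(query):
        print(row.map, row.peak, row.elevation)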


Automatic Adaptation to Sensor Replacements

AAAI Conferences

Many software systems run on long-lifespan platforms that operate in diverse and dynamic environments. If these software systems could automatically adapt to hardware changes, it would significantly reduce maintenance costs and enable rapid upgrades. In this paper, we study the problem of how to automatically adapt to sensor changes, as an important step towards building such long-lived, survivable software systems. We address adaptation scenarios in which a set of sensors is replaced by new sensors. Our approach reconstructs the values of the replaced sensors by preserving the distributions of sensor values before and after the sensor change, thereby requiring no change in the higher-layer software. Compared to existing work, our approach has the following advantages: a) exploiting new sensors without requiring an overlapping period of time between the new sensors and the old ones; b) providing an estimate of adaptation quality; c) scaling to a large number of sensors. Experiments on weather data and Unmanned Undersea Vehicle (UUV) data demonstrate that our approach can automatically adapt to sensor changes with higher accuracy than baseline methods.
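One simple way to realize the distribution-preserving idea is quantile (CDF) matching between the old and new sensors' value distributions; the sketch below is a simplified illustration under that assumption, not the paper's full method:

    # Sketch: map a replacement sensor's readings onto the old sensor's value
    # distribution by quantile matching, so higher-layer software keeps seeing
    # values on the familiar scale. Simplified illustration only.
    import numpy as np

    def quantile_map(new_values, old_reference, new_reference):
        """Translate readings of the new sensor into the old sensor's scale.

        old_reference: historical readings of the replaced sensor.
        new_reference: readings of the new sensor (no overlap with the old one needed).
        """
        new_sorted = np.sort(new_reference)
        # Empirical CDF position of each new reading among the new sensor's values...
        ranks = np.searchsorted(new_sorted, new_values, side="right") / len(new_sorted)
        # ...mapped to the same quantile of the old sensor's distribution.
        return np.quantile(np.sort(old_reference), np.clip(ranks, 0.0, 1.0))

    # Toy example: a new thermometer that reads about 2 degrees high gets
    # pulled back onto the old sensor's scale.
    old = np.random.normal(20.0, 5.0, size=1000)
    new = np.random.normal(22.0, 5.0, size=1000)
    print(quantile_map(np.array([22.0, 27.0]), old, new))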


Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing

AAAI Conferences

Efficient algorithms for temporal reasoning are critical for a large number of real-world applications, including autonomous space exploration (Knight et al. 2001), domestic activity management, and job scheduling on servers (Ji, He, and Cheng 2007). Many formalisms have been proposed and are currently used for reasoning with metric time and resources (Smith and Cheng 1993; Kumar 2003; Muscettola 2004). Simple Temporal Networks (STNs) (Dechter, Meiri, and Pearl 1991) are popularly used for efficiently reasoning about difference constraints in scheduling problems. In this paper, we use the STN framework to study important classes of load scheduling problems that involve metric temporal constraints as well as costs of resources. Problems that can be studied in this framework include those that arise in the smart home (Qayyum et al. 2015) and smart grid domains (Sianaki, Hussain, and Tabesh 2010), as well as in high performance computing (HPC) (Yang et al. 2013) and job shop scheduling (Xiong, Sadeh, and Sycara 1992). Although the STN framework can be extended to reason about the resource requirements of events (Kumar 2003), in this paper, for simplicity of exposition, we reason about the resource costs of events.
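STNs encode difference constraints of the form t_j - t_i <= w, and the standard result from Dechter, Meiri, and Pearl (1991) is that an STN is consistent exactly when its distance graph has no negative cycle. A short sketch of that basic check (not the load scheduling algorithm studied in the paper):

    # Sketch: consistency check for a Simple Temporal Network. Each constraint
    # t_j - t_i <= w becomes an edge i -> j with weight w; the STN is
    # consistent iff Bellman-Ford finds no negative cycle in this graph.
    def stn_consistent(num_events, constraints):
        """constraints: list of (i, j, w) meaning t_j - t_i <= w."""
        dist = [0.0] * num_events  # as if relaxed from a virtual zero-weight source
        for _ in range(num_events - 1):
            changed = False
            for i, j, w in constraints:
                if dist[i] + w < dist[j]:
                    dist[j] = dist[i] + w
                    changed = True
            if not changed:
                break
        # Any further improvement implies a negative cycle, i.e. inconsistency.
        return all(dist[i] + w >= dist[j] for i, j, w in constraints)

    # Example: event 1 must occur between 5 and 10 time units after event 0.
    print(stn_consistent(2, [(0, 1, 10), (1, 0, -5)]))  # True (consistent)
    print(stn_consistent(2, [(0, 1, 3), (1, 0, -5)]))   # False (3 < required 5)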


An Iterative Approach to Synthesize Data Transformation Programs

AAAI Conferences

Programming-by-Example approaches allow users to transform data by simply entering the target data. However, current methods do not scale well to complicated examples, where there are many examples or the examples are long. In this paper, we present an approach that exploits the fact that users iteratively provide examples. It reuses the previous subprograms to improve the efficiency in generating new programs. We evaluated the approach with a variety of transformation scenarios. The results show that the approach significantly reduces the time used to generate the transformation programs, especially in complicated scenarios.
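A toy illustration of the reuse idea (the real system's program space is far richer and its representation different): candidate programs are compositions of small string operations, and subprograms found in earlier iterations are tried first when new examples arrive:

    # Toy sketch of iterative programming-by-example with subprogram reuse.
    # Illustrative only; the operation set and search are deliberately tiny.
    from itertools import permutations

    OPERATIONS = {
        "upper": str.upper,
        "strip": str.strip,
        "dash_to_space": lambda s: s.replace("-", " "),
    }

    def run(ops, s):
        for name in ops:
            s = OPERATIONS[name](s)
        return s

    def synthesize(examples, reusable=()):
        """Find a composition of operations consistent with (input, output)
        examples, trying previously learned subprograms first."""
        candidates = [list(p) for p in reusable]
        candidates += [list(p) for r in (1, 2, 3) for p in permutations(OPERATIONS, r)]
        for ops in candidates:
            if all(run(ops, i) == o for i, o in examples):
                return ops
        return None

    # Iteration 1: one example. Iteration 2: a second example arrives and the
    # previously found program is reused as the first candidate.
    first = synthesize([("  acme-corp ", "ACME CORP")])
    second = synthesize([("  acme-corp ", "ACME CORP"), (" beta-inc", "BETA INC")],
                        reusable=[first])
    print(first, second)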


Exploiting Semantics for Big Data Integration

AI Magazine

There is a great deal of interest in big data, focusing mostly on data set size. The use of semantics in this integration process is key to building an approach that scales to large numbers of heterogeneous sources. For example, in our museum use case, we received data in spreadsheets (figure 1), comma-separated values (CSV), JSON (figure 3), XML, and relational databases. Karma has been used on a variety of types of data, including biological data, mobile phone data, geospatial data, and cultural heritage data.


Semantics for Big Data Integration and Analysis

AAAI Conferences

Much of the focus on big data has been on the problem of processing very large sources. There is an equally hard problem of how to normalize, integrate, and transform the data from many sources into the format required to run large-scale analysis and visualization tools. We have previously developed an approach to semi-automatically mapping diverse sources into a shared domain ontology so that they can be quickly combined. In this paper we describe our approach to building and executing integration and restructuring plans to support analysis and visualization tools on very large and diverse datasets.


Learning Transformation Rules by Examples

AAAI Conferences

However, this approach usually requires expert users to manually write individual transformations for each data source. A variety of work (Kandel et al. 2011; Raman and Hellerstein 2001; Liang, Jordan, and Klein 2010) tries to take advantage of user input to solve the transformation problem, but these methods either cannot learn rules from training data or need the training data to contain all the intermediate steps. We have developed an approach that automatically learns transformation rules through examples, where the user only needs to provide the target value as an example. As shown in Figure 1, a user might want to reverse the order of a date and use hyphens to replace slashes; the user would just provide the system with the example "30/07/2010" and the target "2010-07-30".
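A tiny stand-in for the rule implied by that single example (not the grammar the system actually learns): reverse the date fields and replace the slashes with hyphens:

    # Illustration of the transformation implied by the abstract's example
    # ("30/07/2010" -> "2010-07-30"); a hand-written stand-in for a learned rule.
    import re

    def reorder_date(value):
        day, month, year = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", value).groups()
        return f"{year}-{month}-{day}"

    print(reorder_date("30/07/2010"))  # 2010-07-30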


Using Conditional Random Fields to Exploit Token Structure and Labels for Accurate Semantic Annotation

AAAI Conferences

Automatic semantic annotation of structured data enables unsupervised integration of data from heterogeneous sources, but it is difficult to perform accurately because many numeric and proper-noun fields rule out reference-based approaches, and the absence of natural language text prevents the use of language-based approaches. In addition, several of these semantic types have multiple heterogeneous representations while sharing syntactic structure with other types. In this work, we propose a new approach that uses conditional random fields (CRFs) to perform semantic annotation of structured data, taking advantage of the structure and labels of the tokens for more accurate field labeling while still allowing the use of exact inference techniques. We compare our approach with a linear-CRF-based model that only labels fields and with a regular-expression-based approach.
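A small sketch of field labeling with a linear-chain CRF over token-structure features; sklearn-crfsuite is used here as one readily available CRF toolkit, and the features, semantic types, and example record are illustrative assumptions rather than the paper's setup:

    # Sketch: label the fields of a structured record with a linear-chain CRF
    # using simple token-structure features. Illustrative only.
    import sklearn_crfsuite

    def token_features(tokens, i):
        tok = tokens[i]
        return {
            "lower": tok.lower(),
            "is_numeric": tok.lstrip("-").replace(".", "", 1).isdigit(),
            "has_hyphen": "-" in tok,
            "length": len(tok),
            "position": i,
            "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        }

    # One toy training record: "34.05 -118.24 Los Angeles"
    tokens = ["34.05", "-118.24", "Los", "Angeles"]
    X_train = [[token_features(tokens, i) for i in range(len(tokens))]]
    y_train = [["latitude", "longitude", "city", "city"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X_train, y_train)
    print(crf.predict(X_train))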