Giles, C. Lee


Reports of the 2016 AAAI Workshop Program

AI Magazine

The Workshop Program of the Association for the Advancement of Artificial Intelligence's Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) was held at the beginning of the conference, February 12-13, 2016. Workshop participants met and discussed issues with a selected focus -- providing an informal setting for active exchange among researchers, developers and users on topics of current interest. To foster interaction and exchange of ideas, the workshops were kept small, with 25-65 participants. Attendance was sometimes limited to active participants only, but most workshops also allowed general registration by other interested individuals.


CiteSeerX: AI in a Digital Library Search Engine

AI Magazine

CiteSeerX is a digital library search engine providing access to more than five million scholarly documents with nearly a million users and millions of hits per day. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We also present AI technologies implemented in table and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.


Reports on the 2015 AAAI Workshop Program

AI Magazine

AAAI's 2015 Workshop Program was held Sunday and Monday, January 25–26, 2015 at the Hyatt Regency Austin Hotel in Austion, Texas, USA. The AAAI-15 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. Most workshops were held on a single day. The titles of the workshops included AI and Ethics, AI for Cities, AI for Transportation: Advice, Interactivity and Actor Modeling, Algorithm Configuration, Artificial Intelligence Applied to Assistive Technologies and Smart Environments, Beyond the Turing Test, Computational Sustainability, Computer Poker and Imperfect Information, Incentive and Trust in E-Communities, Multiagent Interaction without Prior Coordination, Planning, Search, and Optimization, Scholarly Big Data: AI Perspectives, Challenges, and Ideas, Trajectory-Based Behaviour Analytics, World Wide Web and Public Health Intelligence, Knowledge, Skill, and Behavior Transfer in Autonomous Robots, and Learning for General Competency in Video Games.


Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching

AAAI Conferences

Personal names are important and common information in many data sources, ranging from social networks and news articles to patient records and scientific documents.They are often used as queries for retrieving records and also as key information for linking documents from multiple sources. Matching personal names can be challenging due to variations in spelling and various formatting of names. While many approximated name matching techniques have been proposed, most are generic string-matching algorithms. Unlike other types of proper names, personal names are highly cultural. Many ethnicities have their own unique naming systems and identifiable characteristics. In this paper we explore such relationships between ethnicities and personal names to improve the name matching performance. First, we propose a name-ethnicity classifier based on the multinomial logistic regression. Our model can effectively identify name-ethnicity from personal names in Wikipedia, which we use to define name-ethnicity, to within 85\% accuracy.Next, we propose a novel alignment-based name matching algorithm, based on Smith–Waterman algorithm and logistic regression.Different name matching models are then trained for different name-ethnicity groups.Our preliminary experimental result on DBLP's disambiguated author dataset yields a performance of 99\% precision and 89\% recall.Surprisingly, textual features carry more weight than phonetic ones in name-ethnicity classification.


Table Header Detection and Classification

AAAI Conferences

In digital libraries, a table, as a specific document component as well as a condensed way to present structured and relational data, contains rich information and often the only source of .that information. In order to explore, retrieve, and reuse that data, tables should be identified and the data extracted. Table recognition is an old field of research. However, due to the diversity of table styles, the results are still far from satisfactory, and not a single algorithm performs well on all different types of tables. In this paper, we randomly take samples from the CiteSeerX to investigate diverse table styles for automatic table extraction. We find that table headers are one of the main characteristics of complex table styles. We identify a set of features that can be used to segregate headers from tabular data and build a classifier to detect table headers. Our empirical evaluation on PDF documents shows that using a Random Forest classifier achieves an accuracy of 92%.


Context Sensitive Topic Models for Author Influence in Document Networks

AAAI Conferences

In a document network such as a citation network of scientific documents, web-logs etc., the content produced by authors exhibit their interest in certain topics. In addition some authors influence other authors' interests. In this work, we propose to model the influence of cited authors along with the interests of citing authors. Morover , we hypothesize that citations present in documents, the context surrounding the citation mention provides extra topical information about the cited authors. However, associating terms in the context to the cited authors remains an open problem. We propose novel document generation schemes that incorporate the context while simultaneously modeling the interests of citing authors and influence of the cited authors. Our experiments show significant improvements over baseline models for various evaluation criteria such as link prediction between document and cited author, and quantitatively explaining unseen text.