Goto

Collaborating Authors

 Giles, C. Lee


Reports of the 2016 AAAI Workshop Program

AI Magazine

The Workshop Program of the Association for the Advancement of Artificial Intelligence’s Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) was held at the beginning of the conference, February 12-13, 2016. Workshop participants met and discussed issues with a selected focus — providing an informal setting for active exchange among researchers, developers and users on topics of current interest. To foster interaction and exchange of ideas, the workshops were kept small, with 25-65 participants. Attendance was sometimes limited to active participants only, but most workshops also allowed general registration by other interested individuals. The AAAI-16 Workshops were an excellent forum for exploring emerging approaches and task areas, for bridging the gaps between AI and other fields or between subfields of AI, for elucidating the results of exploratory research, or for critiquing existing approaches. The fifteen workshops held at AAAI-16 were Artificial Intelligence Applied to Assistive Technologies and Smart Environments (WS-16-01), AI, Ethics, and Society (WS-16-02), Artificial Intelligence for Cyber Security (WS-16-03), Artificial Intelligence for Smart Grids and Smart Buildings (WS-16-04), Beyond NP (WS-16-05), Computer Poker and Imperfect Information Games (WS-16-06), Declarative Learning Based Programming (WS-16-07), Expanding the Boundaries of Health Informatics Using AI (WS-16-08), Incentives and Trust in Electronic Communities (WS-16-09), Knowledge Extraction from Text (WS-16-10), Multiagent Interaction without Prior Coordination (WS-16-11), Planning for Hybrid Systems (WS-16-12), Scholarly Big Data: AI Perspectives, Challenges, and Ideas (WS-16-13), Symbiotic Cognitive Systems (WS-16-14), and World Wide Web and Population Health Intelligence (WS-16-15).


Exploring Multiple Feature Spaces for Novel Entity Discovery

AAAI Conferences

Continuously discovering novel entities in news and Web data is important for Knowledge Base (KB) maintenance. One of the key challenges is to decide whether an entity mention refers to an in-KB or out-of-KB entity. We propose a principled approach that learns a novel entity classifier by modeling mention and entity representation into multiple feature spaces, including contextual, topical, lexical, neural embedding and query spaces. Different from most previous studies that address novel entity discovery as a submodule of entity linking systems, our model is more a generalized approach and can be applied as a pre-filtering step of novel entities for any entity linking systems. Experiments on three real-world datasets show that our method significantly outperforms existing methods on identifying novel entities.



Automatic Summary Generation for Scientific Data Charts

AAAI Conferences

Scientific charts in the web, whether as images or embedded in digital documents, contain valuable information that is not fully available to information retrieval tools. The information used to describe these charts is typically extracted from the image metadata rather than the information the graphic was initially designed to express. The problem of understanding digital charts found in scholarly documents, and inferring useful textual information from their graphical components is the focus of this study. We present an approach to automatically read the chart data, specifically bar charts, and provide the user with a textual summary of the chart. The proposed method follows a knowledge discovery approach that relies on a versatile graph representation of the chart. This representation is derived from analyzing a chart's original data values, from which useful features are extracted. The data features are in turn used to construct a semantic-graph. To generate a summary, the semantic-graph of the chart is mapped to appropriately crafted protoforms, which are constructs based on fuzzy logic. We verify the effectiveness of our framework by conducting experiments on bar charts extracted from over 1,000 PDF documents. Our preliminary results show that, under certain assumptions, 83% of the produced summaries provide plausible descriptions of the bar charts.


Document Type Classification in Online Digital Libraries

AAAI Conferences

Online digital libraries make it easier for researchers to search for scientific information. They have been proven as powerful resources in many data mining, machine learning and information retrieval applications that require high-quality data. The quality of the data highly depends on the accuracy of classifiers that identify the types of documents that are crawled from the Web, e.g., as research papers, slides, books, etc., for appropriate indexing. These classifiers in turn depend on the choice of the feature representation. We propose novel features that result in high-accuracy classifiers for document type classification. Experimental results on several datasets show that our classifiers outperform models that are employed in current systems.


CiteSeerX: AI in a Digital Library Search Engine

AI Magazine

CiteSeerX is a digital library search engine providing access to more than five million scholarly documents with nearly a million users and millions of hits per day. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We also present AI technologies implemented in table and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.


CiteSeerX: AI in a Digital Library Search Engine

AI Magazine

Since then, the project has been directed by C. Lee Giles. While it is challenging to rebuild a system like Cite-SeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines. This is different from arXiv, Harvard ADS, and machine cluster to a private cloud using virtualization PubMed, where papers are submitted by authors or techniques (Wu et al. 2014). CiteSeerX extensively pushed by publishers. Unlike Google Scholar and leverages open source software, which significantly Microsoft Academic Search, where a significant portion reduces development effort. Red Hat of documents have only metadata (such as titles, Enterprise Linux (RHEL) 5 and 6 are the operating authors, and abstracts) available, users have full-text systems for all servers. Tomcat 7 is CiteSeerX keeps its own repository, which used for web service deployment on web and indexing serves cached versions of papers even if their previous servers. MySQL is used as the database management links are not alive any more. In additional to system to store metadata. Apache Solr is used paper downloads, CiteSeerX provides automatically for the index, and the Spring framework is used in extracted metadata and citation context, which the web application. In this section, we highlight four AI solutions that are Document metadata download service is not available leveraged by CiteSeerX and that tackle different challenges from Google Scholar and only recently available in metadata extraction and ingestion modules from Microsoft Academic Search. Finally, CiteSeerX (tagged by C, E, D, and A in figure 1).


Reports on the 2015 AAAI Workshop Program

AI Magazine

AAAI's 2015 Workshop Program was held Sunday and Monday, January 25–26, 2015 at the Hyatt Regency Austin Hotel in Austion, Texas, USA. The AAAI-15 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. Most workshops were held on a single day. The titles of the workshops included AI and Ethics, AI for Cities, AI for Transportation: Advice, Interactivity and Actor Modeling, Algorithm Configuration, Artificial Intelligence Applied to Assistive Technologies and Smart Environments, Beyond the Turing Test, Computational Sustainability, Computer Poker and Imperfect Information, Incentive and Trust in E-Communities, Multiagent Interaction without Prior Coordination, Planning, Search, and Optimization, Scholarly Big Data: AI Perspectives, Challenges, and Ideas, Trajectory-Based Behaviour Analytics, World Wide Web and Public Health Intelligence, Knowledge, Skill, and Behavior Transfer in Autonomous Robots, and Learning for General Competency in Video Games.


Reports on the 2015 AAAI Workshop Program

AI Magazine

AAAI's 2015 Workshop Program was held Sunday and Monday, January 25–26, 2015 at the Hyatt Regency Austin Hotel in Austion, Texas, USA. The AAAI-15 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. Most workshops were held on a single day. The titles of the workshops included AI and Ethics, AI for Cities, AI for Transportation: Advice, Interactivity and Actor Modeling, Algorithm Configuration, Artificial Intelligence Applied to Assistive Technologies and Smart Environments, Beyond the Turing Test, Computational Sustainability, Computer Poker and Imperfect Information, Incentive and Trust in E-Communities, Multiagent Interaction without Prior Coordination, Planning, Search, and Optimization, Scholarly Big Data: AI Perspectives, Challenges, and Ideas, Trajectory-Based Behaviour Analytics, World Wide Web and Public Health Intelligence, Knowledge, Skill, and Behavior Transfer in Autonomous Robots, and Learning for General Competency in Video Games.


Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia

AAAI Conferences

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy. The "sense-specific'' prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness for both isolated words and words in sentential contexts and word sense induction demonstrate its effectiveness.