Industry
Learning a Kernel for Multi-Task Clustering
Gu, Quanquan (University of Illinois at Urbana-Champaign) | Li, Zhenhui (University of Illinois at Urbana-Champaign) | Han, Jiawei (University of Illinois at Urbana-Champaign)
Multi-task learning has received increasing attention in the past decade. Many supervised multi-task learning methods have been proposed, while unsupervised multi-task learning is still a rarely studied problem. In this paper, we propose to learn a kernel for multi-task clustering. Our goal is to learn a Reproducing Kernel Hilbert Space, in which the geometric structure of the data in each task is preserved, while the data distributions of any two tasks are as close as possible. This is formulated as a unified kernel learning framework, under which we study two types of kernel learning: nonparametric kernel learning and spectral kernel design. Both types of kernel learning can be solved by linear programming. Experiments on several cross-domain text data sets demonstrate that kernel k-means on the learned kernel can achieve better clustering results than traditional single-task clustering methods. It also outperforms the newly proposed multi-task clustering method.
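As a rough illustration of the final clustering step only, the following sketch runs kernel k-means on a kernel (Gram) matrix that has already been computed. It does not implement the kernel learning described in the abstract. The function name `kernel_kmeans` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def kernel_kmeans(K, n_clusters, init_labels=None, n_iter=50, seed=0):
    """Cluster n points given only their n x n kernel matrix K.

    Hypothetical helper: K is assumed to be learned or chosen elsewhere.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    if init_labels is None:
        # Random initial partition (an assumption; other inits are possible).
        labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    else:
        labels = np.asarray(init_labels)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster stays at infinite distance
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

With a linear kernel this reduces to ordinary k-means. The point of the kernel form is that any positive semidefinite matrix, such as a learned one, can be passed in without changing the clustering code.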
Accelerating the Discovery of Data Quality Rules: A Case Study
Yeh, Peter Z. (Accenture) | Puri, Colin A. (Accenture) | Wagman, Mark (Accenture) | Easo, Ajay K (Accenture)
Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an application -- Data Quality Rules Accelerator (DQRA) -- that accelerates Data Quality (DQ) efforts (e.g., data profiling and cleansing) by automatically discovering DQ rules for detecting inconsistencies in data. We then present two evaluations. The first evaluation compares DQRA to existing solutions and shows that DQRA either outperformed or achieved performance comparable with these solutions on metrics such as precision, recall, and runtime. The second evaluation is a case study where DQRA was piloted at a large utilities company to improve data quality as part of a legacy migration effort. DQRA was able to discover rules that detected data inconsistencies directly impacting revenue and operational efficiency. Moreover, DQRA was able to significantly reduce the amount of effort required to develop these rules compared to the state of the practice. Finally, we describe ongoing efforts to deploy DQRA.
Modeling Player Retention in Madden NFL 11
Weber, Ben George (University of California, Santa Cruz) | John, Michael (Electronic Arts, Inc.) | Mateas, Michael (University of California, Santa Cruz) | Jhala, Arnav (University of California, Santa Cruz)
Video games increasingly produce huge datasets, available for analysis, that result from players engaging in interactive environments. These datasets enable investigation of individual player behavior at a massive scale, which can lead to reduced production costs and improved player retention. We present an approach for modeling player retention in Madden NFL 11, a commercial football game. Our approach encodes gameplay patterns of specific players as feature vectors and models player retention as a regression problem. By building an accurate model of player retention, we are able to identify which gameplay elements are most influential in maintaining active players. Our tool outputs recommendations that will be used to influence the design of future titles in the Madden NFL series.
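The idea of treating retention as regression over per-player feature vectors can be sketched as follows. This is a minimal illustration using ordinary least squares, which is an assumption: the abstract does not say which regression method the authors used. The function names and the meaning of the feature columns are hypothetical.

```python
import numpy as np

def fit_retention_model(features, games_played):
    """Fit a linear model: games played ~ gameplay features + intercept.

    features: (n_players, n_features) array of gameplay statistics.
    games_played: (n_players,) retention target (a hypothetical choice).
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(games_played, dtype=float)
    X1 = np.column_stack([X, np.ones(len(X))])  # append intercept column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w[:-1], w[-1]  # (coefficients, intercept)

def rank_features(coef):
    """Return feature indices ordered by absolute coefficient, largest first."""
    return np.argsort(-np.abs(coef)).tolist()
```

Ranking by coefficient magnitude is only meaningful when the features are on comparable scales, so in practice the columns would be standardized first.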
Abductive Inference for Combat: Using SCARE-S2 to Find High-Value Targets in Afghanistan
Shakarian, Paulo (U.S. Army) | Nagel, Mago (University of Maryland) | Schuetzle, Brittany (University of Maryland) | Subrahmanian, V.S. (University of Maryland)
Recently, geospatial abduction was introduced by the authors in [Shakarian et al. 2010] as a way to infer unobserved geographic phenomena from a set of known observations and constraints between the two. In this paper, we introduce the SCARE-S2 software tool, which applies geospatial abduction to the environment of Afghanistan. Unlike previous work, where we looked for small weapon caches supporting local attacks, here we look for insurgent high-value targets (HVTs) supporting insurgent operations in two provinces. These HVTs include the locations of insurgent leaders and major supply depots. Applying this method of inference to Afghanistan introduces several practical issues not addressed in previous work. Namely, we are conducting inference in a much larger area (24,940 sq km as compared to 675 sq km in previous work), on more varied terrain, and must consider the influence of many local tribes. We address all of these problems and evaluate our software on 6 months of real-world counter-insurgency data. We show that we are able to abduce regions of a relatively small area (on average, under 100 sq km and each containing, on average, 4.8 villages) that are more dense with HVTs (35 times more than the overall area considered).
Monitoring Entities in an Uncertain World: Entity Resolution and Referential Integrity
Minton, Steven N. (InferLink Corporation) | Macskassy, Sofus A. (Fetch Technologies) | LaMonica, Peter (Air Force Research Laboratory) | See, Kane (Fetch Technologies) | Knoblock, Craig A. (University of Southern California) | Barish, Greg (Fetch Technologies) | Michelson, Matthew (Fetch Technologies) | Liuzzi, Raymond (Raymond Technologies)
This paper describes a system to help intelligence analysts track and analyze information being published in multiple sources, particularly open sources on the Web. The system integrates technology for Web harvesting, natural language extraction, and network analytics, and allows analysts to view and explore the results via a Web application. One of the difficult problems we address is the entity resolution problem, which occurs when there are multiple, differing ways to refer to the same entity. The problem is particularly complex when noisy data is being aggregated over time, there is no clean master list of entities, and the entities under investigation are intentionally being deceptive. Our system must not only perform entity resolution with noisy data, but must also gracefully recover when entity resolution mistakes are subsequently corrected. We present a case study in arms trafficking that illustrates the issues, and describe how they are addressed.
Designing Resilient Long-Reach Passive Optical Networks
Mehta, Deepak (University College Cork) | O’Sullivan, Barry (University College Cork) | Quesada, Luis (University College Cork) | Ruffini, Marco (University of Dublin) | Payne, David (University of Dublin) | Doyle, Linda (University of Dublin)
We report on an emerging application focused on the design of resilient long-reach passive optical networks using combinatorial optimisation techniques. The objective of the application is to determine the optimal position and capacity of a set of metro nodes. We specifically consider dual-parented networks whereby each customer must be associated with two metro nodes. An important property of such a placement is resilience to single node failure. Therefore, excess capacity should be provided at each metro node in order to ensure that customers can be redistributed amongst the metro sites. Our application, as well as finding optimal node placements, can compute the minimum level of excess capacity on all metro nodes. In this paper we present three alternative approaches to optimal metro node placement. We present a detailed analysis of the impact of different placement approaches on the distribution of excess capacity throughout the network. We show that preferential distributions occur in practice, based on a case study in Ireland. Finally we show that load and excess capacity provision are independent of each other.
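To make the notion of excess capacity under single node failure concrete, here is a small sketch under simplifying assumptions that are not from the paper: each customer has unit load, is normally served by its primary metro node, and moves to its secondary node when the primary fails. The function name `excess_capacity` is hypothetical.

```python
from collections import Counter

def excess_capacity(assignments):
    """Spare capacity each metro node needs to survive any single failure.

    assignments: list of (primary, secondary) metro node pairs, one per
    customer. Unit load per customer is an illustrative assumption.
    """
    # moved[(f, m)] = customers that shift from f to m if node f fails
    moved = Counter(assignments)
    nodes = {node for pair in assignments for node in pair}
    # A node must absorb the worst case over all single failures elsewhere.
    return {
        m: max((moved[(f, m)] for f in nodes if f != m), default=0)
        for m in sorted(nodes)
    }
```

For example, with customers `[("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("C", "B")]`, node B needs room for two extra customers because a failure of A sends two customers to it.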
Emerging Applications for Intelligent Diabetes Management
Marling, Cindy (Ohio University) | Wiley, Matthew (Ohio University) | Bunescu, Razvan (Ohio University) | Shubrook, Jay (Ohio University) | Schwartz, Frank (Ohio University)
Diabetes management is a difficult task for patients, who must monitor and control their blood glucose levels in order to avoid serious diabetic complications. It is a difficult task for physicians, who must manually interpret large volumes of blood glucose data to tailor therapy to the needs of each patient. This paper describes three emerging applications that employ AI to ease this task and shares difficulties encountered in transitioning AI technology from university researchers to patients and physicians.
Detecting Falls with Location Sensors and Accelerometers
Luštrek, Mitja (Jožef Stefan Institute) | Gjoreski, Hristijan (Jožef Stefan Institute) | Kozina, Simon (Jožef Stefan Institute) | Cvetković, Božidara (Jožef Stefan Institute) | Mirchevska, Violeta (Result d. o. o.) | Gams, Matjaž (Jožef Stefan Institute)
Due to the rapid aging of the population, many technical solutions for the care of the elderly are being developed, often involving fall detection with accelerometers. We present a novel approach to fall detection with location sensors. In our application, a user wears up to four tags on the body whose locations are detected with radio sensors. This makes it possible to recognize the user’s activity, including falling and lying afterwards, and the context in terms of the location in the apartment. We compared fall detection using location sensors, accelerometers and accelerometers combined with the context. A scenario consisting of events difficult to recognize as falls or non-falls was used for the comparison. The accuracy of the methods that utilized the context was almost 40 percentage points higher compared to the methods without the context. The accuracy of pure location-based methods was around 10 percentage points higher than the accuracy of accelerometers combined with the context.
Hybrid Qualitative Simulation of Military Operations
Hinrichs, Thomas (Northwestern University) | Forbus, Kenneth (Northwestern University) | Kleer, Johan de (PARC) | Yoon, Sungwook (PARC) | Jones, Eric (BAE Systems AIT) | Hyland, Robert (BAE Systems AIT) | Wilson, Jason (BAE Systems AIT)
Our goal is to enable military planners to rapidly critique alternative battle plans by simulating multiple outcomes of adversarial plans. We describe a novel simulator, SimPath, that combines qualitative reasoning, a geographic information system (GIS), and targeted probabilistic calculations to envision how adversarial battle plans can play out. We outline the problem and describe the overall operation of the simulator. We then explain how qualitative process theory is extended with actions to model military tasks, how envisioning is factored to reduce combinatorial explosion, and how probabilities are computed for transitions and used to filter possibilities. Empirical results, including an experiment conducted by an independent evaluator, are summarized. The results show that it is possible to identify dozens of possible outcomes on each of 9 combinations of adversarial plans (COAs) in under two minutes. We close with a discussion of future work.