 Government


DimmWitted: A Study of Main-Memory Statistical Analytics

arXiv.org Machine Learning

We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence they can tolerate. Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space and discover tradeoffs between hardware efficiency and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100x faster. We conduct our study across five architectures using popular models including SVMs, logistic regression, Gibbs sampling, and neural networks.
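The row-order versus column-order distinction can be made concrete with a small sketch. This is an illustrative assumption, not the DimmWitted engine: it fits a linear least-squares model (chosen for brevity; the study itself uses SVMs, logistic regression, and other models) in two ways. The hypothetical `row_access_sgd` reads one example (a row) per step and updates the whole model, as stochastic gradient descent does; the hypothetical `column_access_cd` reads one feature (a column) per step and updates a single coordinate, as coordinate descent does. Step size and epoch counts are arbitrary, and NUMA placement and model replication are not modeled.

```python
from typing import List

Matrix = List[List[float]]


def row_access_sgd(X: Matrix, y: List[float], epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Row-wise access: each step reads one example and updates every model coordinate."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(a * b for a, b in zip(xi, w)) - yi  # prediction error on this row
            for j, xij in enumerate(xi):
                w[j] -= lr * err * xij  # gradient step on the squared loss of this row
    return w


def column_access_cd(X: Matrix, y: List[float], epochs: int = 200) -> List[float]:
    """Column-wise access: each step reads one feature column and updates one coordinate."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # r = y - Xw, and w starts at zero
    for _ in range(epochs):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue  # an all-zero column cannot change the fit
            delta = sum(c * r for c, r in zip(col, residual)) / norm  # exact 1-D minimizer
            w[j] += delta
            for i in range(n):
                residual[i] -= delta * col[i]  # keep the residual in sync with w
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [1.0, 2.0, 3.0]
    print(row_access_sgd(X, y))
    print(column_access_cd(X, y))
```

On this small consistent system both variants should approach w = [1, 2]. What differs is the access pattern: row access touches every model coordinate per step, while column access touches every example per step but only one coordinate, one source of the hardware-versus-statistical-efficiency tradeoff described above.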


A History of AI Research and Development in Thailand: Three Periods, Three Directions

AI Magazine

Thailand, a country of 65 million people, has had an active AI community for almost three decades. Research on Thai language processing and expert systems was then concentrated at the laboratory. King Mongkut's University of Technology Thonburi also set up its own AI center as a loosely affiliated group. It has now expanded into the Center of Excellence, supported by the National Electronics and Computer Technology Center (NECTEC), and focuses on merging together two types of technology: knowledge …

Yuen Poovarawan was the pioneer in computer processing of the Thai language. NECTEC put together research and development plans in AI-related fields, for example, natural language processing, expert systems, and intelligent image processing.


Reports on the 2013 AAAI Fall Symposium Series

AI Magazine

Rinke Hoekstra (VU University Amsterdam) presented linked open data tools to discover connections within established scientific data sets. Louiqa Rashid (University of Maryland) presented work on similarity metrics linking together drugs, genes, and diseases. Kyle Ambert (Intel) presented Finna, a text-mining system to identify passages of interest containing descriptions of neuronal …

… from transferring and adapting semantic web technologies to the big data quest. Finally, in the Social Networks and Social Contagion symposium, a community of researchers explored topics such as social contagion, game theory, network modeling, network-based inference, human data elicitation, and web analytics. Highlights of the symposia are contained in this report.


Sequential Decision Making in Computational Sustainability via Adaptive Submodularity

AI Magazine

Many problems in computational sustainability require making a sequence of decisions in complex, uncertain environments. Such problems are notoriously difficult. In this article, we review the recently discovered notion of adaptive submodularity, an intuitive diminishing returns condition that generalizes the classical notion of submodular set functions to sequential decision problems. Problems exhibiting the adaptive submodularity property can be solved efficiently and provably near-optimally using simple myopic policies. We illustrate this concept in several case studies of interest in computational sustainability. First, we demonstrate how it can be used to efficiently plan for resolving uncertainty in adaptive management scenarios. Second, we show how it applies to dynamic conservation planning for protecting endangered species, a case study carried out in collaboration with the US Geological Survey and the US Fish and Wildlife Service.
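A simple myopic policy of this kind can be sketched as an adaptive greedy loop: at each step, pick the action with the largest expected marginal gain given everything observed so far, execute it, observe its outcome, and repeat. The sketch below is a minimal illustration under stated assumptions, not the article's implementation. It assumes item states are independent with known discrete distributions; the toy utility `coverage_utility` (cells covered by sensors that turn out to work), the sensor names, and the probabilities are all hypothetical. Near-optimality guarantees apply only when the utility is adaptive monotone and adaptive submodular, which the code does not check.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Observations = Dict[Hashable, Hashable]  # item -> observed state


def adaptive_greedy(
    items: Sequence[Hashable],
    state_dist: Dict[Hashable, List[Tuple[Hashable, float]]],
    utility: Callable[[Observations], float],
    true_state: Dict[Hashable, Hashable],
    k: int,
) -> Observations:
    """Select up to k items, each time maximizing expected marginal gain given observations.

    Assumes independent item states, so state_dist[e] does not depend on past observations."""
    observed: Observations = {}
    for _ in range(k):
        base = utility(observed)
        best_item, best_gain = None, float("-inf")
        for e in items:
            if e in observed:
                continue
            # Expected gain of e, averaging over its possible (not yet observed) states.
            gain = sum(p * (utility({**observed, e: s}) - base) for s, p in state_dist[e])
            if gain > best_gain:
                best_item, best_gain = e, gain
        if best_item is None:
            break  # every item has already been selected
        observed[best_item] = true_state[best_item]  # act, then observe the real outcome
    return observed


# Hypothetical toy instance: sensors cover grid cells but may fail.
COVERAGE = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage_utility(obs: Observations) -> float:
    """Number of cells covered by the sensors observed to be working."""
    covered = set()
    for sensor, works in obs.items():
        if works:
            covered |= COVERAGE[sensor]
    return float(len(covered))


if __name__ == "__main__":
    dist = {"a": [(True, 0.5), (False, 0.5)], "b": [(True, 0.9), (False, 0.1)], "c": [(True, 1.0), (False, 0.0)]}
    truth = {"a": False, "b": True, "c": True}
    obs = adaptive_greedy(["a", "b", "c"], dist, coverage_utility, truth, k=3)
    print(list(obs), coverage_utility(obs))  # ['b', 'a', 'c'] 3.0
```

The policy first picks the reliable sensor "b" (expected gain 1.8), then re-plans using what it observed, which is what distinguishes an adaptive policy from committing to a fixed set in advance.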


The Diagnostic Competitions

AI Magazine

Therefore, diagnostic algorithms must reason backward from symptoms to causes: for example, determining that a dead battery (and not the wiring or the ignition switch) is the reason your car will not start in the morning. The domains of diagnostic algorithms include analog and digital circuits, software systems, thermal systems, biological systems, and physical mechanisms, and the same classes of diagnostic algorithms can apply in all of them. Diagnostic algorithms make observations, often in real time, of the system being diagnosed.
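Reasoning backward from symptoms to causes can be illustrated with Bayes' rule under a single-fault assumption. The sketch below is a hypothetical example, not an algorithm from the competitions: the cause names, priors, and symptom likelihoods are made-up numbers, and it assumes exactly one fault is present and that symptoms are conditionally independent given the fault.

```python
from typing import Dict

# Hypothetical numbers for the car example; not taken from any real diagnostic model.
PRIORS: Dict[str, float] = {"dead_battery": 0.4, "faulty_wiring": 0.3, "ignition_switch": 0.3}
LIKELIHOODS: Dict[str, Dict[str, float]] = {  # P(symptom present | cause)
    "dead_battery": {"no_start": 0.95, "dim_lights": 0.9},
    "faulty_wiring": {"no_start": 0.8, "dim_lights": 0.3},
    "ignition_switch": {"no_start": 0.9, "dim_lights": 0.05},
}


def diagnose(
    priors: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    observed: Dict[str, bool],
) -> Dict[str, float]:
    """Return P(cause | observed symptoms), assuming a single fault and
    conditionally independent symptoms (a naive-Bayes simplification)."""
    scores = {}
    for cause, prior in priors.items():
        score = prior
        for symptom, present in observed.items():
            p = likelihoods[cause][symptom]
            score *= p if present else (1.0 - p)
        scores[cause] = score
    total = sum(scores.values())
    return {cause: s / total for cause, s in scores.items()}


if __name__ == "__main__":
    post = diagnose(PRIORS, LIKELIHOODS, {"no_start": True, "dim_lights": True})
    print(max(post, key=post.get))  # dead_battery
```

With both symptoms present the dead battery dominates; if the lights are bright instead, the ignition switch becomes the most probable cause under these made-up numbers.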


Computational Sustainability

AI Magazine

Computational sustainability problems, which exist in dynamic environments with high amounts of uncertainty, provide a variety of unique challenges to artificial intelligence research and the opportunity for significant impact upon our collective future. This editorial provides an overview of artificial intelligence for computational sustainability, and introduces this special issue of AI Magazine.


A Multivariate Complexity Analysis of Lobbying in Multiple Referenda

Journal of Artificial Intelligence Research

Assume that each of n voters may or may not approve each of m issues. If an agent (the lobby) may influence up to k voters, then the central question of the NP-hard Lobbying problem is whether the lobby can choose the voters to be influenced so that, as a result, each issue gets a majority of approvals. This problem can be modeled as a simple matrix modification problem: can one replace k rows of a binary n × m matrix by k all-1 rows such that each column in the resulting matrix has a majority of 1s? Significantly extending previous work that showed parameterized intractability (W[2]-completeness) with respect to the number k of modified rows, we study how natural parameters such as n, m, k, or the "maximum number of 1s missing for any column to have a majority of 1s" (referred to as the "gap value g") govern the computational complexity of Lobbying. Among other results, we prove that Lobbying is fixed-parameter tractable for parameter m and provide a greedy logarithmic-factor approximation algorithm that even solves Lobbying optimally if m < 5. We also show empirically that this greedy algorithm performs well on general instances. As a further key result, we prove that Lobbying is LOGSNP-complete for constant values g > 0, thus providing the first natural complete problem from voting for this complexity class of limited nondeterminism.
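The matrix formulation can be made concrete with a short sketch. It assumes a strict majority (a column needs more than n/2 ones; the exact threshold is our assumption) and provides a checker for a proposed set of replaced rows, the gap value g as the largest per-column shortfall, an exhaustive search that is only practical for tiny instances, and a simple greedy heuristic that repeatedly replaces the row whose 0s fall in the most still-deficient columns. The greedy rule is our own illustration, not claimed to be the authors' algorithm or to carry their approximation guarantee; all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[int]]


def needed(n: int) -> int:
    # Assumption: strict majority, i.e. more than half of the n voters approve.
    return n // 2 + 1


def column_deficits(matrix: Matrix) -> List[int]:
    """Number of 1s each column is missing to reach a majority."""
    n, m = len(matrix), len(matrix[0])
    return [max(0, needed(n) - sum(row[j] for row in matrix)) for j in range(m)]


def gap_value(matrix: Matrix) -> int:
    return max(column_deficits(matrix))


def is_solution(matrix: Matrix, rows: Sequence[int]) -> bool:
    """Does replacing the given rows by all-1 rows give every column a majority of 1s?"""
    n, m = len(matrix), len(matrix[0])
    chosen = set(rows)
    return all(
        sum(1 if i in chosen else matrix[i][j] for i in range(n)) >= needed(n)
        for j in range(m)
    )


def brute_force(matrix: Matrix, k: int) -> Optional[Tuple[int, ...]]:
    """Smallest set of at most k rows that works, or None (exponential time)."""
    for size in range(k + 1):
        for rows in combinations(range(len(matrix)), size):
            if is_solution(matrix, rows):
                return rows
    return None


def greedy(matrix: Matrix) -> List[int]:
    """Repeatedly replace the row that covers the most still-deficient columns."""
    deficits = column_deficits(matrix)
    remaining = set(range(len(matrix)))
    chosen: List[int] = []
    while any(deficits):
        # A deficient column always has an unchosen 0-row, so each pick makes progress.
        best = max(
            sorted(remaining),
            key=lambda i: sum(1 for j, d in enumerate(deficits) if d > 0 and matrix[i][j] == 0),
        )
        chosen.append(best)
        remaining.discard(best)
        for j, d in enumerate(deficits):
            if d > 0 and matrix[best][j] == 0:
                deficits[j] -= 1
    return chosen


if __name__ == "__main__":
    M = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
    print(gap_value(M), brute_force(M, 2), greedy(M))  # 2 (0, 1) [4, 0]
```

On this 5 × 3 example the gap value is 2, two replaced rows suffice, and the greedy heuristic also finds a two-row solution.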


Generalized Canonical Correlation Analysis for Classification

arXiv.org Machine Learning

It is common to find collections of measurements of related objects, such as the same article in different languages, similar talks given by different presenters, or similar weather patterns in different years. It remains to determine how much the available big data helps us in statistical analysis; simply throwing every collected dataset into the mix may not yield an optimal output. Thus it is natural and important to understand theoretically when and how additional datasets improve the performance of statistical analysis tasks such as regression, clustering, and classification. This is our motivation to explore the following classification problem.


Causality Networks

arXiv.org Machine Learning

While correlation measures are used to discern statistical relationships between observed variables in almost all branches of data-driven scientific inquiry, what we are really interested in is the existence of causal dependence. Statistical tests for causality, it turns out, are significantly harder to construct; the difficulty stems both from the philosophical hurdles in making the notion of causality precise and from the practical issue of obtaining an operational procedure from a philosophically sound definition. In particular, designing an efficient causality test that may be carried out without restrictive presuppositions on the underlying dynamical structure of the data at hand is nontrivial. Nevertheless, the ability to computationally infer statistical prima facie evidence of causal dependence may yield a far more discriminative tool for data analysis than the calculation of simple correlations. In the present work, we present a new nonparametric test of Granger causality for quantized or symbolic data streams generated by ergodic stationary sources. In contrast to state-of-the-art binary tests, our approach makes precise and computes the degree of causal dependence between data streams, without making any restrictive assumptions, linearity or otherwise. Additionally, without any a priori imposition of a specific dynamical structure, we infer explicit generative models of causal cross-dependence, which may then be used for prediction. These explicit models are represented as generalized probabilistic automata, referred to as crossed automata, and are shown to be sufficient to capture a fairly general class of causal dependence. The theoretical results are applied to weekly search-frequency data from the Google Trends API for a chosen set of socially "charged" keywords. The causality network inferred from this dataset reveals, quite expectedly, the causal importance of certain keywords. It is also illustrated that correlation analysis fails to gather such insight.
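The crossed-automata construction is beyond a short example, but the underlying question (does the past of one symbol stream reduce uncertainty about the next symbol of another?) can be illustrated with a simpler, well-known stand-in: a plug-in estimate of transfer entropy with history length 1. This is a swapped-in technique, not the test proposed in this work; the function names, the history length, and the synthetic streams are assumptions for illustration.

```python
import random
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple


def conditional_entropy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of H(B | A) in bits from (a, b) samples."""
    joint = Counter(pairs)
    marginal = Counter(a for a, _ in pairs)
    n = len(pairs)
    return -sum(c / n * log2(c / marginal[a]) for (a, _), c in joint.items())


def transfer_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Estimate T_{X->Y} = H(Y_t | Y_{t-1}) - H(Y_t | Y_{t-1}, X_{t-1}) in bits."""
    own = [(y[t - 1], y[t]) for t in range(1, len(y))]
    both = [((y[t - 1], x[t - 1]), y[t]) for t in range(1, len(y))]
    return conditional_entropy(own) - conditional_entropy(both)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(10_000)]
    y = [0] + x[:-1]  # y copies x with a one-step lag, so x drives y
    print(transfer_entropy(x, y), transfer_entropy(y, x))
```

Because y is a lagged copy of x, the estimate from x to y should be close to 1 bit and the estimate from y to x close to 0, while the contemporaneous correlation between the two streams is near zero, since the bits of x are independent.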