
 University of Michigan


Fine-Grained Car Detection for Visual Census Estimation

AAAI Conferences

Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation, and unemployment rates. Compared with the traditional method of collecting surveys across many years, which is costly and labor intensive, data-driven machine learning approaches are cheaper and faster, with the potential to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emissions, crime rates, and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes from the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date, consisting of over 2,600 classes of cars comprising images from Google Street View and other web sources, classified by car experts to account for even the most subtle visual differences. We use this data to construct the largest-scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r = 0.82), Massachusetts vehicle registration data, and sources investigating crime rates, income segregation, per capita carbon emissions, and other market research. Finally, we learn interesting relationships between cars and neighborhoods, allowing us to perform the first large-scale sociological analysis of cities using computer vision techniques.
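
As a rough, self-contained illustration of the kind of validation reported above, the Python sketch below computes the Pearson correlation between car-derived income predictions and census ground truth; the city values are invented toy numbers, not the paper's data.

import numpy as np

# Toy city-level values; the real pipeline aggregates millions of car
# detections per city or zip code before this step.
census_income = np.array([42_000, 55_000, 61_000, 48_000, 75_000], dtype=float)
predicted_income = np.array([40_500, 57_200, 59_800, 50_100, 73_900], dtype=float)

# Pearson correlation coefficient between prediction and ground truth.
r = np.corrcoef(census_income, predicted_income)[0, 1]
print(f"r = {r:.2f}")  # the paper reports r = 0.82 on real data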


Spoofing the Limit Order Book: An Agent-Based Model

AAAI Conferences

We present an agent-based model of manipulating prices in financial markets through spoofing: submitting spurious orders to mislead other traders. Built around the standard limit-order mechanism, our model captures a complex market environment with combined private and common values, the latter represented by noisy observations of a fundamental time series. We start with zero intelligence traders, who ignore the order book, and introduce a version of the heuristic belief learning (HBL) strategy that exploits the order book to predict price outcomes. Employing empirical game-theoretic analysis to derive approximate strategic equilibria, we demonstrate the effectiveness of HBL and the usefulness of order book information in a range of non-spoofing environments. We further show that a market with HBL traders is spoofable, in that a spoofer can qualitatively manipulate prices in its desired direction. After re-equilibrating games with spoofing, we find that spoofing generally hurts market surplus and decreases the proportion of HBL traders. However, HBL's persistence in most environments with spoofing indicates a consistently spoofable market. Our model provides a way to quantify the effect of spoofing on trading behavior and efficiency, and thus to measure the profitability and cost of an important form of market manipulation.
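
For readers unfamiliar with the setup, here is a minimal Python sketch of a zero intelligence trader interacting with a limit order book; the price distributions and matching rule are illustrative assumptions, not the paper's calibrated environment.

import heapq
import random

bids, asks = [], []  # best bid = -bids[0] (max-heap via negation), best ask = asks[0]

def zi_order(fundamental, shade=5.0):
    """A zero intelligence trader quotes around its noisy private value,
    ignoring the order book entirely."""
    value = fundamental + random.gauss(0, 2.0)       # noisy observation
    if random.random() < 0.5:                        # buyer: bid below value
        return "buy", value - random.uniform(0, shade)
    return "sell", value + random.uniform(0, shade)  # seller: ask above value

def submit(side, price):
    """Add a limit order, executing immediately if the book crosses."""
    if side == "buy":
        if asks and price >= asks[0]:
            return heapq.heappop(asks)               # trade at best ask
        heapq.heappush(bids, -price)
    else:
        if bids and price <= -bids[0]:
            return -heapq.heappop(bids)              # trade at best bid
        heapq.heappush(asks, price)
    return None

random.seed(0)
for _ in range(100):
    traded = submit(*zi_order(fundamental=100.0))
    if traded is not None:
        print(f"trade at {traded:.2f}")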


A Multiagent System Approach to Scheduling Devices in Smart Homes

AAAI Conferences

Demand-side management (DSM) in the smart grid allows customers to make autonomous decisions about their energy consumption, helping energy providers reduce peaks in load demand. The automated scheduling of smart devices in residential and commercial buildings plays a key role in DSM. Because of data privacy and user autonomy concerns, such an approach is best implemented through distributed multi-agent systems. This paper makes the following contributions: (i) It introduces the Smart Home Device Scheduling (SHDS) problem, which formalizes the device scheduling and coordination problem across multiple smart homes as a multi-agent system; (ii) It describes a mapping of this problem to a distributed constraint optimization problem; (iii) It proposes a distributed algorithm for the SHDS problem; and (iv) It presents empirical results from a physically distributed system of Raspberry Pis, each capable of controlling smart devices through hardware interfaces.
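
A minimal, centralized Python sketch of the flavor of the SHDS objective appears below: it brute-forces device start times to minimize peak aggregate load. The device data are invented, and the paper's actual formulation is a distributed constraint optimization problem solved across agents.

from itertools import product

HORIZON = 6                      # six one-hour time slots
devices = {                      # name: (duration in slots, load in kW)
    "washer": (2, 1.5),
    "dryer": (1, 3.0),
    "ev_charger": (3, 2.0),
}

def peak_load(starts):
    """Peak aggregate load given one start slot per device."""
    load = [0.0] * HORIZON
    for (duration, kw), start in zip(devices.values(), starts):
        for t in range(start, start + duration):
            load[t] += kw
    return max(load)

# Enumerate feasible start times (each device must finish within the horizon).
ranges = [range(HORIZON - duration + 1) for duration, _ in devices.values()]
best = min(product(*ranges), key=peak_load)
print(dict(zip(devices, best)), "peak:", peak_load(best), "kW")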


Analyzing Vocabulary Intersections of Expert Annotations and Topic Models for Data Practices in Privacy Policies

AAAI Conferences

Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language processing and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest quality annotations require law experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by law experts and topics learned via Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between the vocabulary sets identified as most significant for each category, using a logistic regression model, and the vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
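
The vocabulary-intersection idea can be sketched in a few lines of Python with scikit-learn: take the top words of each NMF topic and intersect them with the words a logistic regression weights most heavily for a category. The corpus and labels below are tiny stand-ins, not the OPP-115 data.

import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "we collect your email address and browsing history",
    "we share personal data with third party advertisers",
    "you may opt out of marketing emails at any time",
    "cookies track your activity across our websites",
]
labels = [1, 1, 0, 1]  # 1 = toy "data collection/sharing" category

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
vocab = np.array(vec.get_feature_names_out())

# Top words per NMF topic.
nmf = NMF(n_components=2, init="nndsvda", random_state=0).fit(X)
topic_words = {t: set(vocab[np.argsort(nmf.components_[t])[-5:]])
               for t in range(2)}

# Top positively weighted words for the category classifier.
clf = LogisticRegression().fit(X, labels)
category_words = set(vocab[np.argsort(clf.coef_[0])[-5:]])

for t, words in topic_words.items():
    print(f"topic {t} overlap with category: {words & category_words}")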


Automatic Extraction of Opt-Out Choices from Privacy Policies

AAAI Conferences

Online “notice and choice” is an essential concept in the US FTC’s Fair Information Practice Principles. Privacy laws based on these principles include requirements for providing notice about data practices and allowing individuals to exercise control over those practices. Internet users need control over their privacy, but their options are hidden in long privacy policies that are cumbersome to read and understand. In this paper, we describe several approaches to automatically extracting choice instances from privacy policy documents using natural language processing and machine learning techniques. We define a choice instance as a statement in a privacy policy indicating that the user has discretion over the collection, use, sharing, or retention of their data. We describe supervised machine learning approaches for automatically extracting instances containing opt-out hyperlinks and evaluate the proposed methods on the OPP-115 Corpus, a dataset of annotated privacy policies. Extracting information about privacy choices and controls enables the development of concise and usable interfaces that help Internet users better understand the choices offered by online services. The focus of this paper, however, is on methods for automatically extracting useful opt-out hyperlinks from privacy policies.
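
One of the supervised approaches can be sketched as a bag-of-words classifier over hyperlink-bearing policy sentences, as in the hypothetical Python example below; the sentences, labels, and URLs are invented stand-ins for annotated OPP-115 examples.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = [  # toy hyperlink-bearing policy sentences with toy labels
    "to opt out of interest-based ads, visit http://example.com/choices",
    "you can unsubscribe from our newsletter at http://example.com/email",
    "read our full terms of service at http://example.com/terms",
    "our advertising partners are listed at http://example.com/partners",
]
is_opt_out = [1, 1, 0, 0]

vec = CountVectorizer(ngram_range=(1, 2))
clf = LogisticRegression().fit(vec.fit_transform(sentences), is_opt_out)

test = ["opt out of data sharing at http://example.com/privacy-choices"]
print(clf.predict(vec.transform(test)))  # expected: [1]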


Remembering Marvin Minsky

AI Magazine

Marvin Minsky, one of the pioneers of artificial intelligence and a renowned mathematician and computer scientist, died on Sunday, 24 January 2016, of a cerebral hemorrhage. He was 88. In this article, AI scientists Kenneth D. Forbus (Northwestern University), Benjamin Kuipers (University of Michigan), and Henry Lieberman (Massachusetts Institute of Technology) recall their interactions with Minsky and briefly recount the impact he had on their lives and their research. A remembrance of Marvin Minsky was held at the AAAI Spring Symposium at Stanford University on March 22. Video remembrances of Minsky by Danny Bobrow, Benjamin Kuipers, Ray Kurzweil, Richard Waldinger, and others can be found on the sentient webpage or on youtube.com.


"Is There Anything Else I Can Help You With?" Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent

AAAI Conferences

Intelligent conversational assistants, such as Apple's Siri, Microsoft's Cortana, and Amazon's Echo, have quickly become a part of our digital life. However, these assistants have major limitations that prevent users from conversing with them as they would with human dialog partners. This limits our ability to observe how users really want to interact with the underlying system. To address this problem, we developed a crowd-powered conversational assistant, Chorus, and deployed it to see how users and workers would interact together when mediated by the system. Chorus holds sophisticated conversations with end users over time by recruiting workers on demand, who in turn decide what the best response might be for each user sentence. During the first month of our deployment, 59 users held conversations with Chorus across 320 conversational sessions. In this paper, we present an account of Chorus' deployment, with a focus on four challenges: (i) identifying when conversations are over, (ii) dealing with malicious users and workers, (iii) recruiting workers on demand, and (iv) handling settings in which consensus is not enough. Our observations could assist the deployment of crowd-powered conversation systems, and of crowd-powered systems in general.
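
As one hypothetical reading of how workers decide on the best response, the Python sketch below selects a reply by majority vote with an agreement threshold, returning nothing when consensus is not reached; the threshold and data are illustrative, and the deployed system's mechanism is more involved.

from collections import Counter

def select_response(votes, min_agreement=0.5):
    """Return the top-voted candidate reply if enough workers agree,
    else None (the 'consensus is not enough' failure case above)."""
    candidate, count = Counter(votes).most_common(1)[0]
    return candidate if count / len(votes) >= min_agreement else None

votes = ["Sure, what city are you in?",
         "Sure, what city are you in?",
         "Could you repeat that?"]
print(select_response(votes))  # 2 of 3 workers agree, so the reply is sent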


Measuring the Efficiency of Charitable Giving with Content Analysis and Crowdsourcing

AAAI Conferences

In the U.S., individuals give more than 200 billion dollars to over 50 thousand charities each year, yet how people make these choices is not well understood. In this study, we use data from CharityNavigator.org and web browsing data from the Bing toolbar to understand charitable giving choices. Our main goal is to use data on charities' overhead expenses to better understand efficiency in the charity marketplace. A preliminary analysis indicates that the average donor is "wasting" more than 15% of their contribution by opting for poorly run organizations over higher rated charities in the same Charity Navigator categorical group. However, charities within these groups may not represent good substitutes for each other. We use text analysis to identify substitutes for charities based on their stated missions and validate these substitutes with crowdsourced labels. Using these similarity scores, we simulate market outcomes using web browsing and revenue data. With more realistic similarity requirements, the estimated loss drops by 75%: much of what looked like inefficient giving can be explained by crowd-validated similarity requirements that are not fulfilled by most charities within the same category. A choice experiment helps us further investigate the extent to which a recommendation system could impact the market. The results indicate that money could be redirected away from the long tail of inefficient organizations. If widely adopted, the savings would be in the billions of dollars, highlighting the role the web could play in shaping this important market.
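
The similarity step can be approximated with standard tools, as in the scikit-learn sketch below, which scores how substitutable two charities are by the cosine similarity of their mission statements; the missions are invented one-liners, not CharityNavigator.org data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

missions = [
    "providing clean drinking water to rural communities",
    "building wells and safe water infrastructure in villages",
    "funding scholarships for first-generation college students",
]
X = TfidfVectorizer(stop_words="english").fit_transform(missions)
sim = cosine_similarity(X)
print(f"water vs. water:  {sim[0, 1]:.2f}")  # plausible substitutes
print(f"water vs. school: {sim[0, 2]:.2f}")  # poor substitutes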


Improving Predictive State Representations via Gradient Descent

AAAI Conferences

Predictive state representations (PSRs) model dynamical systems using appropriately chosen predictions about future observations as a representation of the current state. In contrast to the hidden states posited by HMMs or RNNs, PSR states are directly observable in the training data; this gives rise to a moment-matching spectral algorithm for learning PSRs that is computationally efficient and statistically consistent when the model complexity matches that of the true system generating the data. In practice, however, model mismatch is inevitable, and while spectral learning remains appealingly fast and simple, it may fail to find optimal models. To address this problem, we investigate the use of gradient methods for improving spectrally learned PSRs. We show that only a small amount of additional gradient optimization can lead to significant performance gains, and moreover that initializing gradient methods with the spectral learning solution yields better models in significantly less time than starting from scratch.
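
To make the initialize-then-refine idea concrete, the Python sketch below fits a one-parameter AR(1) predictor: a fast moment-matching estimate from a small prefix of the data is refined by gradient descent on one-step squared prediction error. The AR(1) model stands in for a PSR, and all numbers are illustrative.

import numpy as np

rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, 2000):  # true system: x_t = 0.8 * x_{t-1} + noise
    x[t] = 0.8 * x[t - 1] + rng.normal(0, 0.1)

# Fast moment-matching initialization from a small prefix of the data:
# consistent and cheap, but not optimal for the full training set.
a = np.dot(x[:49], x[1:50]) / np.dot(x[:49], x[:49])

def mse(a):
    return np.mean((x[1:] - a * x[:-1]) ** 2)

print(f"initial  a = {a:.3f}, mse = {mse(a):.5f}")
lr = 5.0
for _ in range(25):  # gradient descent on one-step squared prediction error
    err = x[1:] - a * x[:-1]
    a += lr * 2 * np.mean(err * x[:-1])  # negative gradient of mse w.r.t. a
print(f"refined  a = {a:.3f}, mse = {mse(a):.5f}  (true a = 0.8)")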