Massachusetts Institute of Technology
A Large-Scale Study on Predicting and Contextualizing Building Energy Usage
Kolter, J. Zico (Massachusetts Institute of Technology) | Ferreira, Joseph (Massachusetts Institute of Technology)
In this paper we present a data-driven approach to modeling end user energy consumption in residential and commercial buildings. Our model is based upon a data set of monthly electricity and gas bills, collected by a utility over the course of several years, for approximately 6,500 buildings in Cambridge, MA. In addition, we use publicly available tax assessor records and geographical survey information to determine corresponding features for the buildings. Using both parametric and non-parametric learning methods, we learn models that predict distributions over energy usage based upon these features, and use these models to develop two end-user systems. For utilities or authorized institutions (those who may obtain access to the full data) we provide a system that visualizes energy consumption for each building in the city; this allows companies to quickly identify outliers (buildings which use much more energy than expected even after conditioning on the relevant predictors), for instance allowing them to target homes for potential retrofits or tiered pricing schemes. For other end users, we provide an interface for entering their own electricity and gas usage, along with basic information about their home, to determine how their consumption compares to that of similar buildings as predicted by our model. Merely allowing users to contextualize their consumption in this way, relating it to the consumption in similar buildings, can itself produce behavior changes to significantly reduce consumption.
Value Function Approximation in Reinforcement Learning Using the Fourier Basis
Konidaris, George (Massachusetts Institute of Technology) | Osentoski, Sarah (Brown University) | Thomas, Philip (University of Massachusetts Amherst)
We describe the Fourier basis, a linear value function approximation scheme based on the Fourier series. We empirically demonstrate that it performs well compared to radial basis functions and the polynomial basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned proto-value functions.
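As a minimal sketch of the kind of scheme this abstract describes, the following assumes the commonly used cosine-only form of the Fourier basis, with features cos(pi * c . x) over all integer coefficient vectors c up to a chosen order and states rescaled to the unit hypercube. The function names, the order, and the example state are illustrative assumptions, not taken from the paper.

```python
import itertools
import math

def fourier_coefficients(dims, order):
    """All integer coefficient vectors c in {0, ..., order}^dims (assumed full basis)."""
    return list(itertools.product(range(order + 1), repeat=dims))

def fourier_features(state, coefficients):
    """Features cos(pi * c . x) for a state x already scaled to [0, 1]^dims."""
    return [
        math.cos(math.pi * sum(c_i * x_i for c_i, x_i in zip(c, state)))
        for c in coefficients
    ]

def linear_value(state, weights, coefficients):
    """Linear value estimate: dot product of the weights and the Fourier features."""
    features = fourier_features(state, coefficients)
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical usage: a 2-dimensional state with an order-3 basis.
coeffs = fourier_coefficients(dims=2, order=3)
weights = [0.0] * len(coeffs)  # would normally be learned, e.g. by a TD method
estimate = linear_value((0.25, 0.75), weights, coeffs)
```

The number of features grows as (order + 1) ** dims, so in practice the order is kept small or the coefficient set is restricted.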
Hybrid Planning with Temporally Extended Goals for Sustainable Ocean Observing
Li, Hui (The Boeing Company) | Williams, Brian (Massachusetts Institute of Technology)
A challenge to modeling and monitoring the health of the ocean environment is that it is largely under-sensed and difficult to sense remotely. Autonomous underwater vehicles (AUVs) can improve observability, for example of algal bloom regions, ocean acidification, and ocean circulation. This AUV paradigm, however, requires robust operation that is cost-effective and responsive to the environment. To achieve low cost, we generate operational sequences automatically from science goals, and we achieve robustness by reasoning about the discrete and continuous effects of actions. We introduce Kongming2, a generative planner for hybrid systems with temporally extended goals (TEGs) and temporally flexible actions. It takes high-level goals as input and outputs trajectories and actions of the hybrid system, for example an AUV. Kongming2 makes two major extensions to Kongming1: planning for TEGs, and planning with temporally flexible actions. We demonstrated a proof of concept of the planner in the Atlantic Ocean on Odyssey IV, an AUV designed and built by the MIT AUV Lab at Sea Grant.
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
Tellex, Stefanie (Massachusetts Institute of Technology) | Kollar, Thomas (Massachusetts Institute of Technology) | Dickerson, Steven (Massachusetts Institute of Technology) | Walter, Matthew R. (Massachusetts Institute of Technology) | Banerjee, Ashis Gopal (Massachusetts Institute of Technology) | Teller, Seth (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs, dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as "Put the tire pallet on the truck." The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot's performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system's performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
GlobalInferencer: Unexpected Personal Social Content with Data on the Web
Paradesi, Sharon (Massachusetts Institute of Technology) | Shih, Fuming (Massachusetts Institute of Technology)
The past year has seen a growing public awareness of the privacy risks of social networking through personal information that people voluntarily disclose. A spotlight has accordingly been turned on the disclosure policies of social networking sites and on mechanisms for restricting access to personal information on Facebook and other sites. But this is not sufficient to address privacy concerns in a world where Web-based data mining tools can let anyone infer information about others by combining data from multiple sources. To illustrate this, we are building a demonstration data miner, GlobalInferencer, that makes inferences about an individual's lifestyle and other behavior. GlobalInferencer uses linked data technology to perform unified searches across Facebook, Flickr, and public data sites. It demonstrates that controlling access to personal information on individual social networking sites is not an adequate framework for protecting privacy, or even for supporting valid inferencing. In addition to access restrictions, there must be mechanisms for maintaining the provenance of information combined from multiple sources, for revealing the context within which information is presented, and for respecting the accountability that determines how information should be used.
Modeling the Detection of Textual Cyberbullying
Dinakar, Karthik (Massachusetts Institute of Technology) | Reichart, Roi (Hebrew University of Jerusalem) | Lieberman, Henry (Massachusetts Institute of Technology)
The scourge of cyberbullying has assumed alarming proportions, with an ever-increasing number of adolescents admitting to having dealt with it either as a victim or as a bystander. Anonymity and the lack of meaningful supervision in the electronic medium are two factors that have exacerbated this social menace. Comments or posts involving sensitive topics that are personal to an individual are more likely to be internalized by a victim, often resulting in tragic outcomes. We decompose the overall detection problem into the detection of sensitive topics, which lends itself to text classification sub-problems. We experiment with a corpus of 4500 YouTube comments, applying a range of binary and multiclass classifiers. We find that binary classifiers for individual labels outperform multiclass classifiers. Our findings show that the detection of textual cyberbullying can be tackled by building individual topic-sensitive classifiers.
Sensing Urban Social Geography Using Online Social Networking Data
Phithakkitnukoon, Santi (Massachusetts Institute of Technology)
The growing pool of publicly generated data, such as online social networking data, makes it possible to sense social dynamics in urban space. In this position paper, we use location-based online social networking data to sense geo-social activity and analyze the underlying social activity distribution of three cities: London, Paris, and New York. We find a non-linear distribution of social activity that follows a power-law decay function. We perform inter-urban analysis based on social activity distribution and clustering. We believe that our study sheds new light on context-aware urban computing and social sensing.
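To make the power-law claim concrete, here is a minimal sketch of one common way to estimate such a decay: ordinary least squares on log-transformed values, assuming the form y = a * x^(-k). The function name and the sample values are hypothetical, and the abstract does not say which fitting procedure the author used.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-k) by linear regression of log(y) on log(x).

    Assumes all xs and ys are strictly positive. Returns (a, k).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Hypothetical data: activity counts by rank, generated from a known power law.
ranks = [1, 2, 4, 8, 16]
counts = [100.0 * r ** -1.5 for r in ranks]
a, k = fit_power_law(ranks, counts)
```

Log-log regression is simple but can be biased on noisy empirical counts; maximum-likelihood estimators are often preferred for real data.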
Comparing Matrix Decomposition Methods for Meta-Analysis and Reconstruction of Cognitive Neuroscience Results
Gold, Kevin (Rochester Institute of Technology) | Havasi, Catherine (Massachusetts Institute of Technology) | Anderson, Michael (Franklin and Marshall College) | Arnold, Kenneth (Massachusetts Institute of Technology)
The results of 2,256 neuroimaging experiments were analyzed using singular value decomposition (SVD) and non-negative matrix factorization (NMF) to extract patterns in the data. To evaluate the techniques' efficacy at capturing regularities in the data, one positive and one negative result from each of 100 random experiments were treated as missing, and the values were iteratively reconstructed using each technique for dimensionality reduction. Under the best conditions, precision and recall of roughly 78% was achieved for each method. Weighting the domain matrix and area matrix to have equal first eigenvalues before combining them, a technique known as blending, significantly improved results for both methods. While using unnormalized data appeared to produce a peak in results for 10-15 dimensions, normalizing to take into account variation in the popularity of experiment types removed the effect. The basis vectors produced by each method do not support the idea that current cognitive ontologies map well to individual brain areas.
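The iterative reconstruction of held-out values can be illustrated with a deliberately simplified sketch: a rank-1 approximation computed by power iteration stands in for a truncated SVD, and missing entries are repeatedly overwritten with the low-rank estimate. The function names, the rank, the zero initialization, and the toy matrix are all assumptions for illustration; the study's data, blending step, and NMF variant are not reproduced here.

```python
import math

def rank1_approx(m, iters=100):
    """Best rank-1 approximation of m (list of rows) via power iteration.

    Assumes the all-ones start vector is not orthogonal to the top singular vector.
    """
    rows, cols = len(m), len(m[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]

def reconstruct_missing(m, missing, rounds=30):
    """Iteratively replace the entries listed in `missing` with low-rank estimates."""
    filled = [list(row) for row in m]
    for i, j in missing:
        filled[i][j] = 0.0  # assumed starting guess
    for _ in range(rounds):
        approx = rank1_approx(filled)
        for i, j in missing:
            filled[i][j] = approx[i][j]
    return filled

# Toy example: an exactly rank-1 matrix whose (0, 0) entry (true value 1) is hidden.
toy = [[0.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0]]
recovered = reconstruct_missing(toy, missing=[(0, 0)])
```

On real experiment-by-feature matrices a higher rank would be needed, which is where the choice of dimensionality discussed in the abstract comes in.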
Geotagging Tweets Using Their Content
Paradesi, Sharon Myrtle (Massachusetts Institute of Technology)
Harnessing rich but unstructured information on social networks in real time and showing it to a relevant audience based on geographic location is a major challenge. The system we developed, TwitterTagger, geotags tweets and shows them to users based on their current physical location. Experimental validation shows a performance improvement of three orders of magnitude by TwitterTagger over the baseline model.
Invited Talk Abstracts
Landauer, Thomas K. (Pearson Knowledge Technologies) | Picard, Rosalind W. (Massachusetts Institute of Technology) | Touretzky, David S. (Carnegie Mellon University) | Baker, Ryan (Worcester Polytechnic Institute) | Holte, Robert C. (University of Alberta) | Stent, Amanda J. (AT&T Labs - Research) | Vanderveken, Daniel (University of Quebec)
Thomas K. Landauer (Pearson Knowledge Technologies) The recently created word maturity (WM) metric uses the computational language model LSA to mimic the average evolutionary growth of individual word and paragraph knowledge as a function of the total amount and order of simulated reading. The simulator traces the separate growth trajectories of an unlimited number of different words from the beginning of reading to adult level.