Lopes, Cristina
Managing Autonomous Mobility on Demand Systems for Better Passenger Experience
Shen, Wen, Lopes, Cristina
Autonomous mobility on demand systems, though still in their infancy, have very promising prospects in providing urban populations with sustainable and safe personal mobility in the near future. While much research has been conducted on both autonomous vehicles and mobility on demand systems, to the best of our knowledge, this is the first work that shows how to manage autonomous mobility on demand systems for better passenger experience. We introduce the Expand and Target algorithm, which can be easily integrated with three different scheduling strategies for dispatching autonomous vehicles. We implement an agent-based simulation platform and empirically evaluate the proposed approaches with the New York City taxi data. Experimental results demonstrate that the algorithm significantly improves passengers' experience by reducing the average passenger waiting time by up to 29.82% and increasing the trip success rate by up to 7.65%.
Mining Internet-Scale Software Repositories
Linstead, Erik, Rigor, Paul, Bajracharya, Sushil, Lopes, Cristina, Baldi, Pierre F.
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop an infrastructure for the automated crawling, parsing, and database storage of open source software. The infrastructure allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 Java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and method call distributions. We then develop and apply unsupervised, probabilistic author-topic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.86, roughly 10-30% better than previous approaches based on text alone.