Sybrandt, Justin
SmartChoices: Augmenting Software with Learned Implementations
Golovin, Daniel, Bartok, Gabor, Chen, Eric, Donahue, Emily, Huang, Tzu-Kuo, Kokiopoulou, Efi, Qin, Ruoyan, Sarda, Nikhil, Sybrandt, Justin, Tjeng, Vincent
We are living in a golden age of machine learning. Powerful models perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying these models in existing software systems remains challenging. In this paper, we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We highlight key design decisions and present case studies applying SmartChoices within a range of large-scale industrial systems.
Literature-based Discovery for Landscape Planning
Marasco, David, Tyagin, Ilya, Sybrandt, Justin, Spencer, James H., Safro, Ilya
This project demonstrates how hypothesis generation over medical corpora, a knowledge discovery field of AI, can be used to derive new research angles for landscape and urban planners. The hypothesis generation approach used here combines deep learning with topic modeling, a probabilistic approach to natural language analysis that scans aggregated research databases for words that can be grouped by shared subject matter. The resulting word groups form topics that can provide implicit connections between two general research terms. The hypothesis generation system AGATHA was used to identify likely conceptual relationships between emerging infectious diseases (EIDs) and deforestation. The objective was to give landscape planners guidelines for productive research directions, helping them formulate research hypotheses centered on deforestation and EIDs that contribute to the broader health field, which asserts causal roles for landscape-level issues. This research also serves as a partial proof of concept for applying medical database hypothesis generation to medicine-adjacent hypothesis discovery.
Keywords: deforestation, emerging infectious disease, hypothesis generation, landscape planning, topic modeling
Funding: This research was funded by National Science Foundation grant numbers 1633608 and 2027864. The authors express their gratitude to the NSF for its generous support of their research.
Introduction: The recent COVID-19 crisis has put the issue of emerging infectious diseases (EIDs) back in the global spotlight. Addressing EIDs going forward will require widespread interdisciplinary cooperation, as curbing them is a multifaceted, ever-present challenge. Biologists and health experts have frequently asserted that landscape-level issues drive EIDs (e.g.
FOBE and HOBE: First- and High-Order Bipartite Embeddings
Sybrandt, Justin, Safro, Ilya
Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods used in these applications would be better served by specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between node neighborhoods. When sampling higher-order relationships, we reinforce similarities through algebraic distance on graphs. We also introduce ensemble embeddings that combine both into a "best of both worlds" embedding. The proposed methods are evaluated on link prediction and recommendation tasks and compared with other state-of-the-art embeddings. Our embeddings perform better on recommendation tasks and are competitive in link prediction. Although all of these embeddings are highly beneficial in applications, we demonstrate that neither the existing state-of-the-art embeddings nor ours are clearly superior (in contrast to what is claimed in many papers), and we discuss the trade-offs among them.
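
To make the idea of "indirect relationships between node neighborhoods" concrete, the sketch below lists pairs of same-type nodes in a bipartite graph that share at least one neighbor. It is an illustrative assumption about what such a relationship can look like, not the FOBE or HOBE procedure itself; the function name `same_type_pairs` and the toy user/item edge list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def same_type_pairs(edges):
    """Return (left_pairs, right_pairs): same-type node pairs sharing a neighbor.

    `edges` is an iterable of (left_node, right_node) tuples describing a
    bipartite graph. This is a hypothetical helper for illustration only.
    """
    left_nbrs = defaultdict(set)   # left node -> its right-side neighbors
    right_nbrs = defaultdict(set)  # right node -> its left-side neighbors
    for u, v in edges:
        left_nbrs[u].add(v)
        right_nbrs[v].add(u)

    # Two left nodes are indirectly related if some right node links to both.
    left_pairs = set()
    for members in right_nbrs.values():
        left_pairs.update(combinations(sorted(members), 2))

    # Symmetrically, two right nodes are related through a shared left node.
    right_pairs = set()
    for members in left_nbrs.values():
        right_pairs.update(combinations(sorted(members), 2))

    return left_pairs, right_pairs


# Toy example (assumed data): three users and two items.
edges = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"), ("u3", "i2")]
left_pairs, right_pairs = same_type_pairs(edges)
```

In this toy graph, users `u1` and `u2` are related through item `i1`, even though no edge connects two users directly. Such same-type pairs are the kind of signal a bipartite-specific embedding could exploit; how the paper samples and weights them is not specified in the abstract.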