Uber Open-Sources Fiber - A New Library For Distributed Machine Learning


Modern techniques such as machine learning and deep learning require very large amounts of data to improve the accuracy of their results. A single local machine often cannot process that much data, so practitioners turn to distributed computing to obtain enough computational power to deliver quick and accurate results. Managing distributed computation effectively is not straightforward, however, and this hinders the training and evaluation of AI models. To address these challenges, Uber has open-sourced its Fiber framework to help researchers and developers streamline large-scale parallel scientific computation.

New Research Shows How AI Can Act as Mediators


According to VentureBeat, AI researchers at Uber recently posted a paper to arXiv outlining a new platform intended to assist in the creation of distributed AI models. The platform, called Fiber, can be used to drive both reinforcement learning tasks and population-based learning. Fiber is designed to make large-scale parallel computation more accessible to non-experts, letting them take advantage of the power of distributed AI algorithms and models. Fiber was recently open-sourced on GitHub. It is compatible with Python 3.6 or above and with Kubernetes running on a Linux system in a cloud environment. According to the team of researchers, the platform can scale to hundreds or thousands of individual machines.
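To make the idea of population-based learning over a pool of workers concrete, here is a minimal sketch. It does not use Fiber itself: it relies on Python's standard `multiprocessing.Pool` as a stand-in, and the objective `fitness`, the helper `evolve`, and all parameter values are hypothetical names chosen for illustration.

```python
import random
from multiprocessing import Pool


def fitness(candidate):
    """Toy objective (an assumption for this sketch): best value is at 3.0."""
    return -(candidate - 3.0) ** 2


def evolve(pool_map, population, generations=20, sigma=0.5, seed=0):
    """Population-based search.

    Each generation scores every candidate through `pool_map` (any function
    with the signature of the built-in `map`), keeps the better half, and
    refills the population with mutated copies of the survivors.
    """
    rng = random.Random(seed)
    for _ in range(generations):
        scores = list(pool_map(fitness, population))
        ranked = [c for _, c in sorted(zip(scores, population), reverse=True)]
        survivors = ranked[: len(ranked) // 2]
        children = [c + rng.gauss(0.0, sigma) for c in survivors]
        population = survivors + children
    return max(population, key=fitness)


def run_with_processes(workers=4):
    """Same search, with fitness evaluations spread over worker processes.

    On platforms that start workers with "spawn", call this from under an
    `if __name__ == "__main__":` guard.
    """
    with Pool(processes=workers) as pool:
        return evolve(pool.map, [float(i) for i in range(-8, 0)])
```

Because `evolve` only needs a map-like callable, the same loop runs serially with the built-in `map` or in parallel with `pool.map`. A distributed backend would slot in at that one point, which is the kind of separation such platforms aim to provide.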

Exploiting Semantic Annotations for Clustering Geographic Areas and Users in Location-based Social Networks

AAAI Conferences

Location-Based Social Networks (LBSNs) are, so far, the most vivid realization of the convergence of the physical and virtual social planes. In this work we propose a novel approach to modeling human activity and geographical areas by means of place categories. We apply a spectral clustering algorithm to areas and users of two metropolitan cities, using a dataset sourced from the most vibrant LBSN, Foursquare. Our methodology allows the identification of user communities that visit similar categories of places and the comparison of urban neighborhoods within and across cities. We demonstrate how semantic information attached to places could plausibly be used as a modeling interface for applications such as recommender systems and digital tourist guides.
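As a rough illustration of spectral clustering over place-category profiles, the sketch below splits a handful of areas into two groups. It is not the authors' pipeline: the representation of each area as a vector of check-in counts per category, the cosine affinity, the two-way split, and the sample numbers are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two category-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spectral_bipartition(profiles, iters=200):
    """Two-way spectral split using the second eigenvector of the
    normalized affinity matrix, found by power iteration with deflation.
    Assumes every item has positive similarity to at least one other item.
    """
    n = len(profiles)
    # Affinity matrix: cosine similarity between profiles, zero diagonal.
    W = [[cosine(profiles[i], profiles[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in W]
    # Normalized affinity M = D^(-1/2) W D^(-1/2); its top eigenvector is sqrt(deg).
    M = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    top = [math.sqrt(d) for d in deg]
    scale = math.sqrt(sum(t * t for t in top))
    top = [t / scale for t in top]

    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        # Remove the trivial top component, then apply (M + I); the shift
        # keeps all eigenvalues non-negative so the iteration converges
        # to the second-largest eigenvector of M.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [v[i] + sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = math.sqrt(sum(a * a for a in v))
        v = [a / scale for a in v]
    # Items on the same side of zero as item 0 receive label 0.
    return [0 if a * v[0] > 0 else 1 for a in v]


# Hypothetical check-in counts per area for three made-up categories
# (for instance nightlife, food, office).
areas = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [0, 2, 4]]
labels = spectral_bipartition(areas)
print(labels)
```

The same function could be applied to users instead of areas by building one category profile per user. Clustering into more than two groups would typically use several eigenvectors followed by k-means, which is omitted here for brevity.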

Positive Unlabeled Learning for Time Series Classification

AAAI Conferences

In many real-world applications of the time series classification problem, not only may the negative training instances be missing, but the number of positive instances available for learning may also be rather limited. This has motivated the development of new classification algorithms that can learn from a small set P of labeled seed positive instances augmented with a set U of unlabeled instances (i.e., PU learning algorithms). However, existing PU learning algorithms for time series classification have less than satisfactory performance, as they are unable to identify the class boundary between positive and negative instances accurately. In this paper, we propose a novel PU learning algorithm, LCLC (Learning from Common Local Clusters), for time series classification. LCLC is designed to effectively identify the ground truths’ positive and negative boundaries, resulting in more accurate classifiers than those constructed using existing methods. We have applied LCLC to classify time series data from different application domains; the experimental results demonstrate that LCLC outperforms existing methods significantly.

Statistical Inference for Cluster Trees

Neural Information Processing Systems

A cluster tree provides an intuitive summary of a density function that reveals essential structure about the high-density clusters. The true cluster tree is estimated from a finite sample from an unknown true density. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of different features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyzing their properties and assessing their suitability for our inference task. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree.
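For readers unfamiliar with the term, the usual definition of a cluster tree can be written as follows. The symbols $p$, $\lambda$, $T_p$, and $\hat{p}_n$ are notation assumed here rather than taken from the abstract, and the use of a kernel density estimate for $\hat{p}_n$ is one common choice, stated as an assumption.

```latex
% Cluster tree of a density p: at each level \lambda, collect the
% connected components of the upper level set of p.
T_p(\lambda) \;=\; \text{the set of connected components of }
\{\, x : p(x) \ge \lambda \,\}, \qquad \lambda \ge 0.

% Empirical cluster tree: plug in a density estimate \hat{p}_n
% (for example, a kernel density estimate) built from n samples.
\widehat{T}_n \;=\; T_{\hat{p}_n}.
```

Under this notation, a confidence set for the unknown tree is a collection of trees that contains $T_p$ with a prescribed probability.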