Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
End-to-end learning has recently emerged as a promising technique to tackle the problem of autonomous driving. Existing works show that learning a navigation policy from raw sensor data may reduce the system's reliance on external sensing systems, (e.g. GPS), and/or outperform traditional methods based on state estimation and planning. However, existing end-to-end methods generally trade off performance for safety, hindering their diffusion to real-life applications. For example, when confronted with an input which is radically different from the training data, end-to-end autonomous driving systems are likely to fail, compromising the safety of the vehicle. To detect such failure cases, this work proposes a general framework for uncertainty estimation which enables a policy trained end-to-end to predict not only action commands, but also a confidence about its own predictions. In contrast to previous works, our framework can be applied to any existing neural network and task, without the need to change the network's architecture or loss, or to train the network. In order to do so, we generate confidence levels by forward propagation of input and model uncertainties using Bayesian inference. We test our framework on the task of steering angle regression for an autonomous car, and compare our approach to existing methods with both qualitative and quantitative results on a real dataset. Finally, we show an interesting by-product of our framework: robustness against adversarial attacks.
Variational inference for Bayesian deep neural networks (DNNs) requires specifying priors and approximate posterior distributions for neural network weights. Specifying meaningful weight priors is a challenging problem, particularly for scaling variational inference to deeper architectures involving high dimensional weight space. We propose Bayesian MOdel Priors Extracted from Deterministic DNN (MOPED) method for stochastic variational inference to choose meaningful prior distributions over weight space using deterministic weights derived from the pretrained DNNs of equivalent architecture. We evaluate the proposed approach on multiple datasets and real-world application domains with a range of varying complex model architectures to demonstrate MOPED enables scalable variational inference for Bayesian DNNs. The proposed method achieves faster training convergence and provides reliable uncertainty quantification, without compromising on the accuracy provided by the deterministic DNNs. We also propose hybrid architectures to Bayesian DNNs where deterministic and variational layers are combined to balance computation complexity during prediction phase and while providing benefits of Bayesian inference. We will release the source code for this work.
Learning the joint dependence of discrete variables is a fundamental problem in machine learning, with many applications including prediction, clustering and dimensionality reduction. More recently, the framework of copula modeling has gained popularity due to its modular parametrization of joint distributions. Among other properties, copulas provide a recipe for combining flexible models for univariate marginal distributions with parametric families suitable for potentially high dimensional dependence structures. More radically, the extended rank likelihood approach of Hoff (2007) bypasses learning marginal models completely when such information is ancillary to the learning task at hand as in, e.g., standard dimensionality reduction problems or copula parameter estimation. The main idea is to represent data by their observable rank statistics, ignoring any other information from the marginals. Inference is typically done in a Bayesian framework with Gaussian copulas, and it is complicated by the fact this implies sampling within a space where the number of constraints increase quadratically with the number of data points. The result is slow mixing when using off-the-shelf Gibbs sampling. We present an efficient algorithm based on recent advances on constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying for a quadratic cost in sample size.
It seems like machine learning and artificial intelligence are topics at the top of everyone's mind in tech. Be it autonomous cars, robots, or machine intelligence in general, everyone's talking about machines getting smarter and being able to do more. At the same time, for many developers, machine learning and artificial intelligence are nebulous terms representing complex mathematical and data problems they just don't have the time to explore and learn. As I've spoken with lots of developers and CTOs about Fuzzy.io and our mission to make it easy for developers to start bringing intelligent decision-making to their software without needing huge amounts of data or AI expertise, some were curious to learn more about the greater landscape of machine learning. Here are some of the links to articles, podcasts and courses discussing some of the basics of machine learning that I've shared with them.