 Bayesian Learning


This AI-augmented microscope uses deep learning to take on cancer

#artificialintelligence

According to the American Cancer Society, cancer kills more than 8 million people each year. Early detection can boost survival rates. Researchers and clinicians are feverishly exploring avenues to provide early and accurate diagnoses, as well as more targeted treatments. Blood screenings are used to detect many types of cancers, including liver, ovarian, colon and lung cancers. Current blood screening methods typically rely on affixing biochemical labels to cells or biomolecules.


Multi-Instance Dynamic Ordinal Random Fields for Weakly-supervised Facial Behavior Analysis

arXiv.org Artificial Intelligence

We propose a Multi-Instance-Learning (MIL) approach for weakly-supervised learning problems, where a training set is formed by bags (sets of feature vectors or instances) and only labels at bag-level are provided. Specifically, we consider the Multi-Instance Dynamic-Ordinal-Regression (MI-DOR) setting, where the instance labels are naturally represented as ordinal variables and bags are structured as temporal sequences. To this end, we propose Multi-Instance Dynamic Ordinal Random Fields (MI-DORF). In this framework, we treat instance labels as temporally-dependent latent variables in an Undirected Graphical Model. Different MIL assumptions are modelled via newly introduced high-order potentials relating bag and instance labels within the energy function of the model. We also extend our framework to address Partially-Observed MI-DOR problems, where a subset of instance labels is available during training. We show on the tasks of weakly-supervised facial behavior analysis, Facial Action Unit (DISFA dataset) and Pain (UNBC dataset) Intensity estimation, that the proposed framework outperforms alternative learning approaches. Furthermore, we show that MI-DORF can be employed to substantially reduce the data annotation effort in this context.


SQL-Rank: A Listwise Approach to Collaborative Ranking

arXiv.org Machine Learning

In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach with previous pointwise and pairwise approaches, which treat either each rating or each pairwise comparison, respectively, as an independent instance. By extending the work of Cao et al. (2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which assigns probability mass to permutations based on a low rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the numbers of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback such as Weighted-MF and BPR and achieves favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.
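
The idea of a permutation model driven by a low rank score matrix can be illustrated with a small sketch. It assumes a Plackett-Luce form for the permutation probability, which the abstract does not spell out, and the names `low_rank_scores` and `plackett_luce_log_prob` are hypothetical; this is not the SQL-Rank algorithm itself and it ignores ties and missing data.

```python
import math
from itertools import permutations


def low_rank_scores(user_vec, item_vecs):
    """Score each item for one user as an inner product of latent factors."""
    return [sum(u * v for u, v in zip(user_vec, item)) for item in item_vecs]


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (item indices, most preferred first).

    Assumption: a Plackett-Luce model, where the item at each position is
    chosen by a softmax over the items that have not been placed yet.
    """
    log_prob = 0.0
    for pos, item in enumerate(ranking):
        tail = [scores[j] for j in ranking[pos:]]
        peak = max(tail)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in tail))
        log_prob += scores[item] - log_norm
    return log_prob


# Hypothetical latent factors for one user and three items.
user = [1.0, 0.5]
items = [[0.2, 0.1], [1.0, 0.0], [0.0, 3.0]]
scores = low_rank_scores(user, items)

# The most likely full ranking under this model.
best = max(permutations(range(3)),
           key=lambda r: plackett_luce_log_prob(scores, r))
```

Under this assumed model the most likely ranking simply sorts items by score, and maximum likelihood fitting would adjust the latent factors so that observed rankings receive high probability.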


Approximate Inference for Constructing Astronomical Catalogs from Images

arXiv.org Machine Learning

We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a Poisson random variable with a rate parameter that depends on the latent properties of stars and galaxies. These latent properties are themselves random, with scientific prior distributions constructed from large ancillary datasets. We compare two procedures for posterior inference: Markov chain Monte Carlo (MCMC) and variational inference (VI). MCMC excels at quantifying uncertainty while VI is 1000x faster. Both procedures outperform the current state-of-the-art method for measuring celestial bodies' colors, shapes, and morphologies. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores (1.3 million hardware threads) to construct an astronomical catalog from 50 terabytes of images.
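
A minimal sketch of the pixel-level likelihood described above, assuming each pixel's Poisson rate is a sky background plus source fluxes spread by point-spread weights. The helper names, the background term, and the numbers are illustrative assumptions, not the paper's actual model or code.

```python
import math


def pixel_rate(background, fluxes, psf_weights):
    """Expected photon count at one pixel (assumed form: sky + sources)."""
    return background + sum(f * w for f, w in zip(fluxes, psf_weights))


def poisson_log_pmf(count, rate):
    """log P(count | rate) for a Poisson random variable."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def image_log_likelihood(counts, rates):
    """Pixels are treated as independent given the latent source properties."""
    return sum(poisson_log_pmf(c, r) for c, r in zip(counts, rates))


# One hypothetical star with flux 100 spread over three pixels.
psf = [0.1, 0.6, 0.1]
rates = [pixel_rate(2.0, [100.0], [w]) for w in psf]
observed = [10, 65, 14]
log_lik = image_log_likelihood(observed, rates)
```

Posterior inference, whether by MCMC or VI, would combine a likelihood of this kind with the priors on the latent star and galaxy properties.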


Statistical shape analysis in a Bayesian framework for shapes in two and three dimensions

arXiv.org Machine Learning

In this paper, we describe a novel shape classification method which is embedded in the Bayesian paradigm. We discuss the modelling and the resulting shape classification algorithm for two and three dimensional data shapes. We conclude by evaluating the efficiency and efficacy of the proposed algorithm on the Kimia shape database for the two dimensional case.


Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

arXiv.org Machine Learning

In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented means that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we coin oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented data sets. Finally, we introduce the oFMLR [R] package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.
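
To make the combination of online estimation and EM concrete, here is a rough sketch of one streaming update for a mixture of logistic regressions. It is an assumption-laden illustration, not the oFMLR package or its exact update rule: the function `online_step`, the learning rates, and the running-average update of the mixing proportions are all hypothetical choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def online_step(x, y, weights, mix, lr=0.1, mix_lr=0.01):
    """Process one observation (x, y) with y in {0, 1}.

    weights: one coefficient vector per mixture component.
    mix: mixing proportions, summing to 1.
    """
    # E-step: how responsible is each component for this observation?
    probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in weights]
    liks = [pi * (p if y == 1 else 1.0 - p) for pi, p in zip(mix, probs)]
    total = sum(liks)
    resp = [l / total for l in liks]

    # Stochastic M-step: responsibility-weighted gradient step per component.
    new_weights = [
        [wi + lr * r * (y - p) * xi for wi, xi in zip(w, x)]
        for w, r, p in zip(weights, resp, probs)
    ]
    # Mixing proportions follow a running average of the responsibilities.
    new_mix = [(1.0 - mix_lr) * pi + mix_lr * r for pi, r in zip(mix, resp)]
    return new_weights, new_mix, resp


# Two hypothetical customer segments with opposite response to feature 0.
weights = [[2.0, 0.0], [-2.0, 0.0]]
mix = [0.5, 0.5]
weights, mix, resp = online_step([1.0, 0.0], 1, weights, mix)
```

Each observation is seen once and then discarded, so memory use does not grow with the length of the stream.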


Application of Rényi and Tsallis Entropies to Topic Modeling Optimization

arXiv.org Machine Learning

Large arrays of textual data, which have been rapidly accumulating on the Internet in the last decade, require ever more complex methods for their automatic processing and modeling. For this, a wide range of mathematical tools, including topic models, is used [1], but their properties and behavior remain little studied so far, which makes it impossible to choose the optimal parameters of such models. If, however, we consider the results of topic modeling as nonequilibrium complex systems (since these, as will be shown below, have the characteristics of such systems), it becomes possible to apply to them a whole range of approaches from statistical physics. First of all, these are models for analyzing the processes of self-organization of large ensembles. The basis for such an analysis may be an approach in which the behavior of the topic model of a textual collection, viewed as a word ensemble, is determined by thermodynamic functions such as entropy or free energy. It is known that complex systems can be characterized by exponential and power law distributions, which is especially characteristic of social [2, 3], biological [4, 5] and economic systems [6, 7].
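
For reference, the two entropies named in the title have standard textbook definitions for a discrete distribution, sketched below. How the paper maps a topic model onto such a distribution is not stated in this excerpt, so the input `p` here is just an assumed probability vector.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats; the q -> 1 limit of both entropies below."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p, q):
    """Renyi entropy of order q: log(sum_i p_i^q) / (1 - q)."""
    if q == 1:
        return shannon_entropy(p)
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy of order q: (1 - sum_i p_i^q) / (q - 1)."""
    if q == 1:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
renyi_2 = renyi_entropy(uniform, 2)      # equals log(4) for a uniform p
tsallis_2 = tsallis_entropy(uniform, 2)  # equals 1 - 4 * (1/16) = 0.75
```

Both families are parameterized by the order q, and both reduce to the Shannon entropy as q approaches 1.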


Predictive Uncertainty Estimation via Prior Networks

arXiv.org Machine Learning

Estimating uncertainty is important to improving the safety of AI systems. Recently, baseline tasks and metrics have been defined and several practical methods for estimating uncertainty have been developed. However, these approaches attempt to model distributional uncertainty either implicitly through model uncertainty or as data uncertainty. This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs), which explicitly models distributional uncertainty. PNs do this by parameterizing a prior distribution over predictive distributions. This work focuses on uncertainty for classification and evaluates PNs on the tasks of identifying out-of-distribution (OOD) samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods. Experiments on synthetic and MNIST data show that, unlike previous methods, PNs are able to distinguish between data and distributional uncertainty.
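
One way to picture a prior distribution over predictive distributions is a Dirichlet over class probabilities. The sketch below assumes that choice, which the abstract does not state, and uses a Monte Carlo estimate to split total uncertainty into an expected data term and a remainder attributed to distributional uncertainty. The function names and concentration values are hypothetical.

```python
import math
import random


def entropy(p):
    """Entropy in nats of a categorical distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def sample_dirichlet(alpha, rng):
    """Draw one categorical distribution from Dirichlet(alpha)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def uncertainty_decomposition(alpha, n_samples=2000, seed=0):
    """Return (total, expected data, distributional) uncertainty.

    total: entropy of the mean prediction alpha / sum(alpha).
    expected data: Monte Carlo average entropy of sampled predictions.
    distributional: their difference, a mutual information estimate.
    """
    rng = random.Random(seed)
    alpha0 = sum(alpha)
    total = entropy([a / alpha0 for a in alpha])
    expected_data = sum(
        entropy(sample_dirichlet(alpha, rng)) for _ in range(n_samples)
    ) / n_samples
    return total, expected_data, total - expected_data


confident = uncertainty_decomposition([100.0, 1.0, 1.0])   # sharp, one class
ambiguous = uncertainty_decomposition([100.0, 100.0, 100.0])  # class overlap
unfamiliar = uncertainty_decomposition([1.0, 1.0, 1.0])    # flat Dirichlet
```

In this picture the ambiguous and unfamiliar inputs have the same mean prediction and therefore the same total uncertainty, but only the flat Dirichlet yields a large distributional term.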


Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data

arXiv.org Machine Learning

Complex network data may be analyzed by constructing statistical models that accurately reproduce structural properties that may be of theoretical relevance or empirical interest. In the context of the efficient fitting of models for large network data, we propose a very efficient algorithm for the maximum likelihood estimation (MLE) of the parameters of complex statistical models. The proposed algorithm is similar to the famous Metropolis algorithm but allows a Monte Carlo simulation to be performed while constraining the desired network properties. We demonstrate the algorithm in the context of exponential random graph models (ERGMs) - a family of statistical models for network data. Thus far, the lack of efficient computational methods has limited the empirical scope of ERGMs to relatively small networks with a few thousand nodes. The proposed approach allows a dramatic increase in the size of networks that may be analyzed using ERGMs. This is illustrated in an analysis of several biological networks and one social network with 104,103 nodes.


Does mitigating ML's impact disparity require treatment disparity?

arXiv.org Machine Learning

Following related work in law and policy, two notions of disparity have come to shape the study of fairness in algorithmic decision-making. Algorithms exhibit treatment disparity if they formally treat members of protected subgroups differently; algorithms exhibit impact disparity when outcomes differ across subgroups, even if the correlation arises unintentionally. Naturally, we can achieve impact parity through purposeful treatment disparity. In one thread of technical work, papers aim to reconcile the two forms of parity by proposing disparate learning processes (DLPs). Here, the learning algorithm can see group membership during training but produces a classifier that is group-blind at test time. In this paper, we show theoretically that: (i) when other features correlate with group membership, DLPs will (indirectly) implement treatment disparity, undermining the policy desiderata they are designed to address; (ii) when group membership is partly revealed by other features, DLPs induce within-class discrimination; and (iii) in general, DLPs provide a suboptimal trade-off between accuracy and impact parity. Based on our technical analysis, we argue that transparent treatment disparity is preferable to occluded methods for achieving impact parity. Experimental results on several real-world datasets highlight the practical consequences of applying DLPs vs. per-group thresholds.
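
The per-group thresholds mentioned at the end can be illustrated with a toy sketch. The scores, group names, and helper functions below are invented for illustration and are not taken from the paper's experiments; the sketch only shows how explicit per-group thresholds can equalize selection rates where a shared threshold does not.

```python
def selection_rate(scores, threshold):
    """Fraction of candidates whose score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate):
    """Threshold that accepts the top round(target_rate * n) scores.

    With tied scores at the cutoff, slightly more may be accepted.
    """
    k = round(target_rate * len(scores))
    if k == 0:
        return float("inf")
    return sorted(scores, reverse=True)[k - 1]


# Hypothetical classifier scores for two subgroups.
group_a = [0.9, 0.8, 0.7, 0.6]
group_b = [0.5, 0.4, 0.3, 0.2]

# A single shared threshold yields very different outcomes per group.
shared = 0.55
shared_rates = (selection_rate(group_a, shared), selection_rate(group_b, shared))

# Explicit per-group thresholds (transparent treatment disparity)
# chosen so that both groups are selected at the same rate.
thr_a = threshold_for_rate(group_a, 0.5)
thr_b = threshold_for_rate(group_b, 0.5)
parity_rates = (selection_rate(group_a, thr_a), selection_rate(group_b, thr_b))
```

Within each group the ranking by score is preserved, so higher-scoring candidates are never passed over for lower-scoring members of the same group.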