Pseudolikelihood Reranking with Masked Language Models

Machine Learning

We rerank with scores from pretrained masked language models like BERT to improve ASR and NMT performance. These log-pseudolikelihood scores (LPLs) can outperform large, autoregressive language models (GPT-2) in out-of-the-box scoring. RoBERTa reduces WER by up to 30% relative on an end-to-end LibriSpeech system and adds up to 1.7 BLEU on state-of-the-art baselines for TED Talks low-resource pairs, with further gains from domain adaptation. In the multilingual setting, a single XLM can be used to rerank translation outputs in multiple languages. The numerical and qualitative properties of LPL scores suggest that LPLs capture sentence fluency better than autoregressive scores. Finally, we finetune BERT to estimate sentence LPLs without masking, enabling scoring in a single, non-recurrent inference pass.
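The LPL computation described above can be sketched as follows. A real system would query a pretrained masked LM (e.g. BERT or RoBERTa) once per position, masking that position and reading off the softmax probability of the original token; here `toy_mlm_prob` is a hypothetical stand-in for that per-position model call, so the probabilities and vocabulary are illustrative only.

```python
import math

# Hypothetical stand-in for a masked LM's per-position softmax output
# p(token at i | all other tokens). A real implementation would mask
# position i and run BERT/RoBERTa to obtain this probability.
def toy_mlm_prob(tokens, i):
    vocab = {"the": 0.4, "cat": 0.3, "sat": 0.2, "mat": 0.1}
    return vocab.get(tokens[i], 0.01)

def log_pseudolikelihood(tokens):
    # LPL = sum over positions i of log p(w_i | w_{\i}),
    # one masked inference pass per position.
    return sum(math.log(toy_mlm_prob(tokens, i)) for i in range(len(tokens)))

# Rerank an n-best list (e.g. ASR hypotheses) by LPL, highest first:
hypotheses = [["the", "cat", "sat"], ["the", "cat", "mat"]]
best = max(hypotheses, key=log_pseudolikelihood)
```

Note that scoring a sentence of length n costs n masked forward passes, which is what motivates the paper's final contribution: fine-tuning the model to estimate the LPL in a single pass without masking.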

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

Machine Learning

The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated with respect to a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generalization capability; furthermore, we assess several clustering-based methods in regression scenarios. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperformed the standard random selection of the original MLM formulation.
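The two stages of the MLM can be sketched on a toy 1-D regression problem: fit a linear map between input- and output-space distance matrices by least squares, then recover each prediction by multilateration against the reference outputs. This is a minimal sketch with illustrative sizes and the original formulation's random reference selection; a grid search stands in for the optimizer the method would normally use in the output-recovery step.

```python
import numpy as np

# Toy regression task y = sin(x), with randomly selected reference points.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 3.0, 40).reshape(-1, 1)
Y = np.sin(X)
ref = rng.choice(len(X), size=8, replace=False)
R_x, R_y = X[ref], Y[ref]

def dist(A, B):
    # Pairwise Euclidean distances between rows of A and rows of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

# Stage 1: learn the linear map B between distance matrices, Dy ~ Dx @ B.
Dx, Dy = dist(X, R_x), dist(Y, R_y)
B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)

def predict(x_new):
    d_hat = dist(x_new.reshape(1, -1), R_x) @ B  # estimated output distances
    # Stage 2 (multilateration): find the y whose distances to the reference
    # outputs best match d_hat. Grid search suffices for this 1-D toy.
    grid = np.linspace(-1.2, 1.2, 2001).reshape(-1, 1)
    err = ((dist(grid, R_y) - d_hat) ** 2).sum(axis=1)
    return grid[np.argmin(err), 0]

y_hat = predict(np.array([1.0]))  # true value is sin(1.0), about 0.84
```

The only learned object is the small matrix B (here 8x8), which is why the choice of reference points that define the distance columns matters so much for generalization.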

Machine learning in medicine: Addressing ethical challenges



A Deep Choice Model

AAAI Conferences

Human choice is complex in two ways. First, human choice often shows complex dependency on the available alternatives. Second, human choice is often made after examining complex items such as images. The recently proposed choice model based on the restricted Boltzmann machine (RBM choice model) has been shown to represent three typical phenomena of human choice, which addresses the first complexity. We extend the RBM choice model to a deep choice model (DCM) to deal with the features of items, which are ignored in the RBM choice model. We then use deep learning to extract latent features from images and feed those latent features into the DCM. Our experiments show that the DCM adequately learns choices that involve both complexities of human choice.

Restricted Boltzmann machines modeling human choice

Neural Information Processing Systems

We extend the multinomial logit model to represent some of the empirical phenomena that are frequently observed in the choices made by humans. These phenomena include the similarity effect, the attraction effect, and the compromise effect. We formally quantify the strength of these phenomena that can be represented by our choice model, which illuminates the flexibility of our choice model. We then show that our choice model can be represented as a restricted Boltzmann machine and that its parameters can be learned effectively from data. Our numerical experiments with real data of human choices suggest that we can train our choice model in such a way that it represents the typical phenomena of choice.
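The key structural difference from the multinomial logit is that an RBM-style model conditions the score of each item on the whole choice set, so adding an alternative can change the relative odds of the existing items (violating independence of irrelevant alternatives, as the similarity, attraction, and compromise effects require). The sketch below is illustrative only: the weights are random placeholders rather than learned parameters, and the encoding (one visible block for the choice set, one for the selected item, binary hidden units marginalized in closed form) is a simplified rendering of this class of model, not the paper's exact formulation.

```python
import numpy as np

# Illustrative RBM-style choice model with random placeholder parameters.
n_items, n_hidden = 3, 4
rng = np.random.default_rng(1)
W_set = rng.normal(scale=0.5, size=(n_items, n_hidden))   # choice-set weights
W_pick = rng.normal(scale=0.5, size=(n_items, n_hidden))  # selected-item weights
b_pick = rng.normal(scale=0.5, size=n_items)              # item biases

def choice_probs(choice_set):
    # Score(i | S) = b_i + sum_h log(1 + exp(hidden input)), i.e. the negative
    # free energy after summing out the binary hidden units analytically.
    s = np.zeros(n_items)
    s[list(choice_set)] = 1.0
    scores = np.array([
        b_pick[i] + np.logaddexp(0.0, s @ W_set + W_pick[i]).sum()
        for i in choice_set
    ])
    p = np.exp(scores - scores.max())  # softmax over the choice set
    return p / p.sum()

p2 = choice_probs((0, 1))     # binary choice between items 0 and 1
p3 = choice_probs((0, 1, 2))  # adding item 2 can shift the 0-vs-1 odds,
                              # which a multinomial logit cannot do
```

Because the hidden-unit input `s @ W_set` depends on the whole set, the 0-vs-1 odds in `p3` need not equal those in `p2`; under a multinomial logit they would be identical.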

The Anatomy of Large Facebook Cascades

AAAI Conferences

When users post photos on Facebook, they have the option of allowing their friends, followers, or anyone at all to subsequently reshare the photo. A portion of the billions of photos posted to Facebook generates cascades of reshares, enabling many additional users to see, like, comment, and reshare the photos. In this paper we present characteristics of such cascades in aggregate, finding that a small fraction of photos account for a significant proportion of reshare activity and generate cascades of non-trivial size and depth. We also show that the true influence chains in such cascades can be much deeper than what is visible through direct attribution. To illuminate how large cascades can form, we study the diffusion trees of two widely distributed photos: one posted on President Barack Obama’s page following his reelection victory, and another posted by an individual Facebook user hoping to garner enough likes for a cause. We show that the two cascades, despite achieving comparable total sizes, are markedly different in their time evolution, reshare depth distribution, predictability of subcascade sizes, and the demographics of users who propagate them. The findings suggest not only that cascades can achieve considerable size but that they can do so in distinct ways.