

WIRED25: Netflix's Reed Hastings on Broadening Your Horizons


Thanks to Covid-19, the mantra for 2020 has got to be "quarantine and chill." Good thing Netflix is here to "entertain people all over the world," as the company's cofounder Reed Hastings explained at this year's WIRED25. Sating the global entertainment palate, though, requires an undying spirit of invention as well as narratives from both the US and abroad. Netflix's secret, according to Hastings' new book No Rules Rules, is that it values its workers over its work process. It's this employee-centric attitude that allows a startup to maintain a culture of innovation as it grows from, say, a 30-person rent-by-mail DVD provider into the world's largest streaming service, with a film production arm that rivals Hollywood's Big Six.

GPT-3 Creative Fiction


A prompt frames the task implicitly: "What if I told a story here, how would that story start?" Hence the summarization prompt: "My second grader asked me what this passage means: …" When a given prompt isn't working and GPT-3 keeps pivoting into other modes of completion, that may mean one hasn't constrained it enough by imitating a correct output, and one needs to go further; writing the first few words or sentence of the target output may be necessary.
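The constraining trick described above can be sketched as plain string construction: frame the task in the prompt, then seed the completion with the first words of the desired output. The function name and example text here are illustrative, not from the original article.

```python
# A minimal sketch of the prompting pattern described above: constrain the
# model by ending the prompt with the first words of the target output.
def build_summarization_prompt(passage: str, lead_in: str = "") -> str:
    """Frame a passage as a second grader's question, optionally seeding
    the completion with the first words of the desired answer."""
    prompt = (
        "My second grader asked me what this passage means:\n\n"
        f'"""{passage}"""\n\n'
        "I rephrased it for him, in plain language a second grader "
        'can understand:\n\n"""'
    )
    return prompt + lead_in

prompt = build_summarization_prompt(
    "Photosynthesis converts light energy into chemical energy.",
    lead_in="Plants use sunlight to",   # seed words steer the completion
)
print(prompt.endswith("Plants use sunlight to"))  # True
```

The seed words at the end leave the model no natural continuation except the intended mode of output, which is exactly the constraint the excerpt recommends.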

How Netflix uses AI to find your next series binge


Wait, how did Netflix know I wanted to watch that? Through the use of machine learning, collaborative filtering, NLP and more, Netflix undertakes a five-step process to not only enhance UX, but to create a tailored and personalised platform that maximises engagement, retention and enjoyment. In the last decade, recommendation algorithms and models at Netflix have evolved to incorporate multiple layers, multiple stages and nonlinearities. This has developed to the stage at which they now use machine learning and its deep variants to rank large catalogues of content by determining the relevance of each of their titles to each user, creating a personalised content strategy. Not only is the content customised, it is then also ranked from most to least likely to be watched.
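The core ranking idea, scoring every title by relevance to a user and sorting from most to least likely to be watched, can be illustrated with a toy item-item collaborative filtering sketch. This is a hypothetical illustration of the general technique, not Netflix's actual system; the matrix and function names are assumptions.

```python
import numpy as np

def rank_titles(ratings: np.ndarray, user: int) -> list:
    """ratings: (n_users, n_titles) matrix, 0 = unwatched. Returns title
    indices ordered from most to least likely to be watched by `user`."""
    # cosine similarity between titles, based on co-watching patterns
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    sim = (ratings.T @ ratings) / np.clip(norms.T * norms, 1e-9, None)
    # score each title by the similarity-weighted sum of the user's ratings
    scores = sim @ ratings[user]
    scores[ratings[user] > 0] = -np.inf   # push already-watched titles last
    return list(np.argsort(-scores))

ratings = np.array([[5, 4, 0, 0],
                    [4, 5, 5, 0],
                    [0, 0, 4, 5]], dtype=float)
print(rank_titles(ratings, user=0))  # [2, 3, 0, 1]: unwatched titles first
```

Real systems replace the cosine-similarity step with learned deep models, but the output contract is the same: a per-user ordering of the catalogue.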

Conditional Self-Attention for Query-based Summarization Artificial Intelligence

Self-attention mechanisms have achieved great success on a variety of NLP tasks due to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to a more accurate solution; existing self-attention mechanisms, however, cannot capture them. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of the inputs to the given query. Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query. We further studied variants of CSA defined by different types of attention. Experiments on the Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms the vanilla Transformer and previous models for the Qsumm problem.
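The mechanism the abstract describes, modulating pairwise self-attention by each token's matching score against the query, can be sketched in NumPy. This is an illustrative simplification of the idea, not the paper's exact formulation; the additive combination of logits and matching scores is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conditional_self_attention(X, query):
    """X: (n_tokens, d) token embeddings; query: (d,) query embedding.
    Self-attention logits are shifted by each token's query-matching
    score, so query-relevant tokens receive more attention."""
    scale = np.sqrt(X.shape[1])
    logits = (X @ X.T) / scale          # standard pairwise attention logits
    match = (X @ query) / scale         # per-token query matching score
    logits = logits + match[None, :]    # condition attention on the query
    weights = softmax(logits, axis=-1)  # rows sum to 1
    return weights @ X                  # query-conditioned representations
```

Because the matching score is added before the softmax, tokens aligned with the query dominate every attention row, which is the conditional dependency the module is built to capture.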

Artificial Intelligence is completely reinventing media and marketing.


When artificial intelligence is fully operational, it will transform the media and marketing industries. In particular, I believe that synthetic personalities powered by AI will change the way we learn about new products and how to use them. In my previous article, I showed how the collapse of broadcast TV exposed a huge weakness in the advertising industry. And I pointed to the nascent field known as Influencer Media, and especially Virtual Influencers, as a harbinger of the future of brand engagement. What happens when artificial intelligence is available to any app, any advertising campaign, and any brand marketer? How will that change things? Here's my answer: the media landscape will be transformed so deeply that it will be completely unrecognizable. All the leftover junk from the 20th century will be kaputt, including one-size-fits-all video programs for mass audiences, appointment viewing of a TV schedule and the very concept of TV channels, and the outdated intrusion of interruption advertising. Personalized programming and fully responsive adbots will be the new norm.

Build your First Multi-Label Image Classification Model in Python


Are you working with image data? Making an image classification model was a good start, but I wanted to expand my horizons and take on a more challenging task -- building a multi-label image classification model! This got me thinking -- what can we do if there are multiple object categories in an image? I didn't want to use toy datasets to build my model -- that is too generic. And then it struck me -- movie/TV series posters contain a variety of people. Could I build my own multi-label image classification model to predict the different genres just by looking at the poster?
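The key difference from single-label classification is the output layer: independent sigmoid units per genre instead of one softmax, trained with binary cross-entropy, so several genres can fire for one poster. A minimal NumPy sketch of that idea (the weights and genre setup here are illustrative, not from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_genres(features, W, b, threshold=0.5):
    """features: (d,), W: (n_genres, d), b: (n_genres,).
    Each genre probability is independent, so several can exceed
    the threshold at once -- unlike a softmax classifier."""
    probs = sigmoid(W @ features + b)
    return probs, probs >= threshold

def bce_loss(probs, labels, eps=1e-9):
    """Binary cross-entropy summed over the 0/1 genre label vector."""
    return -np.sum(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))

# toy example: a poster whose features activate two of three genres
W = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])
probs, preds = predict_genres(np.array([3.0, 3.0]), W, b=np.zeros(3))
print(preds)  # [ True  True False] -- two genres predicted at once
```

In a real model the feature vector would come from a convolutional backbone; only the sigmoid head and loss shown here distinguish the multi-label setting.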

Regression with Uncertainty Quantification in Large Scale Complex Data Machine Learning

While several methods for predicting uncertainty on deep networks have been recently proposed, they do not readily translate to large and complex datasets. In this paper we utilize a simplified form of Mixture Density Networks (MDNs) to produce a one-shot approach to quantifying uncertainty in regression problems. We show that our uncertainty bounds are on par with or better than other reported existing methods. When applied to standard regression benchmark datasets, we show an improvement in predictive log-likelihood and root-mean-square error when compared to existing state-of-the-art methods. We also demonstrate this method's efficacy on stochastic, highly volatile time-series data where stock prices are predicted for the next time interval. The resulting uncertainty graph summarizes significant anomalies in the stock price chart. Furthermore, we apply this method to the task of age estimation from the challenging IMDb-Wiki dataset of half a million face images. We successfully predict the uncertainties associated with the prediction and empirically analyze the underlying causes of the uncertainties. This uncertainty quantification can be used to pre-process low-quality datasets and further enable learning.
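The simplified-MDN idea can be sketched as a network head that emits a mean and a log-variance per input, trained with the Gaussian negative log-likelihood so the variance output becomes a one-shot uncertainty estimate. This is a generic illustration of that technique, not the paper's implementation; the function names are assumptions.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of targets y under N(mu, exp(log_var)).
    Minimizing this trains both the mean and the variance head."""
    var = np.exp(log_var)
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

def uncertainty_band(mu, log_var, k=2.0):
    """k-sigma predictive interval; unusually wide bands flag the
    anomalies the abstract describes in the stock-price setting."""
    sigma = np.exp(0.5 * log_var)
    return mu - k * sigma, mu + k * sigma
```

Predicting log-variance rather than variance keeps the output unconstrained for the optimizer while guaranteeing a positive variance after exponentiation, a standard choice in MDN-style heads.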

Leveraging Data Science for OTT Content Personalization


Why is content personalization important? OTT (Over the Top) platforms are transforming the global entertainment scene. Key players like Hulu, Netflix, and Disney are competing in terms of viewership and revenues. With the increasing overlap of content across all these platforms, it is crucial for these services to improve the consumer experience by delivering relevant and engaging content to prevent audience churn. Content personalization is, therefore, vital to acquiring more viewing time and improving market share.

Python at Netflix


As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members. We use and contribute to many open-source Python packages, some of which are mentioned below. If any of this interests you, check out the jobs site or find us at PyCon. We have donated a few Netflix Originals posters to the PyLadies Auction and look forward to seeing you all there.

Topic Modeling with Wasserstein Autoencoders Artificial Intelligence

We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We discover that MMD performs much better than the Generative Adversarial Network (GAN) in matching a high-dimensional Dirichlet distribution. We further discover that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, we offer a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
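The distribution-matching term, MMD between encoded document-topic vectors and samples from the Dirichlet prior, can be sketched with an RBF kernel. The kernel choice, bandwidth, and Dirichlet parameters below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between row-vector sample sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd(X, Y, gamma=1.0):
    """Biased MMD^2 estimate; near zero when X and Y share a distribution."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
prior = rng.dirichlet(np.full(5, 0.1), size=64)       # sparse Dirichlet prior
encoded = rng.dirichlet(np.full(5, 0.1), size=64)     # matched distribution
mismatched = rng.dirichlet(np.full(5, 10.0), size=64) # near-uniform vectors
print(mmd(encoded, prior) < mmd(mismatched, prior))   # True
```

In training, the `mmd` value would be added to the reconstruction loss, pulling the encoder's outputs toward the Dirichlet prior without the adversarial discriminator a GAN-based match would require.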