Criteo Releases Industry's Largest Ever Dataset


New York – June 18, 2015 – Criteo (NASDAQ: CRTO), the performance marketing technology company, today announced the release of the largest public machine learning dataset ever issued to the open source community, with the goal of supporting academic research and innovation in distributed machine learning algorithms. With the increasing prevalence of large-scale data problems across industries, including performance advertising, the release of datasets such as this one is necessary to advance research in the academic space and drive industry progress. Anonymized datasets pulled from real-world applications allow academic researchers to test, refine and advance the machine learning platforms that so many enterprises now rely on. Criteo, for example, relies on its own proprietary distributed learning algorithms to accurately predict when a consumer is most likely to click on a particular ad, thereby increasing the return on an advertiser's investment in ad delivery. "Accuracy and speed of machine learning algorithms are critical to the success of our business and many others, but they would be impossible to achieve without publicly available datasets," said Olivier Chapelle, Principal Research Scientist at Criteo.

Unsupervised learning and data clustering for the construction of Galaxy Catalogs in the Dark Energy Survey

Large-scale astronomical surveys continue to increase their depth and scale, providing new opportunities to observe large numbers of celestial objects with ever-increasing precision. At the same time, the sheer scale of ongoing and future surveys poses formidable challenges for classifying astronomical objects. Pioneering efforts on this front include the citizen science approach adopted by the Sloan Digital Sky Survey (SDSS). These SDSS datasets have been used recently to train neural network models to classify galaxies in the Dark Energy Survey (DES) that overlap the footprint of both surveys. While this represents a significant step toward classifying unlabeled images of astrophysical objects in DES, the key issue still remains, i.e., the classification of unlabeled DES galaxies that have not been observed in previous surveys. To start addressing this timely and pressing matter, we demonstrate that knowledge from deep learning algorithms trained with real-object images can be transferred to classify elliptical and spiral galaxies that overlap both SDSS and DES surveys, achieving state-of-the-art accuracy of 99.6%. More importantly, to initiate the characterization of unlabeled DES galaxies that have not been observed in previous surveys, we demonstrate that our neural network model can also be used for unsupervised clustering, grouping together unlabeled DES galaxies into spiral and elliptical types. We showcase the application of this novel approach by classifying over ten thousand unlabeled DES galaxies into spiral and elliptical classes. We conclude by showing that unsupervised clustering can be combined with recursive training to start creating large-scale DES galaxy catalogs in preparation for the Large Synoptic Survey Telescope era.
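The clustering idea in this abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm or which network features the authors use, so everything below is an assumption made for illustration: the 2-D points are synthetic stand-ins for feature vectors that a trained network might produce, plain k-means with two clusters is a swapped-in technique, and the names `two_means` and `features` are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """Split feature vectors into two clusters with plain k-means (k=2).

    Hypothetical helper: not the method of the paper, only an illustration
    of grouping unlabeled feature vectors into two types.
    """
    # Deterministic start: the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: dist2(p, first))
    centers = [list(first), list(far)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Synthetic stand-in for network features (assumption): two well-separated
# blobs, loosely playing the role of "elliptical-like" and "spiral-like" groups.
random.seed(0)
features = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
features += [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]

labels, centers = two_means(features)
print("cluster sizes:", labels.count(0), labels.count(1))
```

In a real pipeline the cluster indices carry no meaning on their own; a few labeled examples would be needed to decide which cluster corresponds to spirals and which to ellipticals.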

Facebook Introduces Dataset & Challenge to Counter DeepFakes


How to identify and respond to "Deepfake" videos -- realistic AI-synthesized video generated for the purpose of spreading misinformation -- is a challenge that has been highlighted by recent social media stumbles on the question, particularly from Facebook. Several months ago Facebook was criticized for failing to remove a viral video manipulated to make US House Speaker Nancy Pelosi sound drunk. In collaboration with Partnership on AI, Microsoft, and academics from top universities, Facebook today announced the Deepfake Detection Challenge (DFDC) with the aim of finding innovative deepfake detection solutions to help the media industry spot videos that have been morphed by AI models. The challenge includes a dataset of video pairs (originals filmed by paid actors and tampered versions generated by various AI techniques). Facebook says no actual Facebook user data will be used, and has pledged US$10 million to encourage global participation in the challenge.

Machine Unlearning: Fighting for the Right to Be Forgotten


Data protection and privacy have been discussed nonstop as more and more people come to realize just how much personal information they are sharing through the countless apps and websites they regularly visit. It's no longer so surprising to see products you've talked about with friends or concerts you've searched on Google promptly appear as advertisements in your social media feeds. And that has many people concerned. Recent government initiatives such as the EU's General Data Protection Regulation (GDPR) are designed to protect individuals' data privacy, with a core concept being "the right to be forgotten." The bad news is, it's generally difficult to revoke things that have already been shared online or to properly delete such data.

Big tech firms' AI hiring frenzy leads to brain drain at UK universities


British universities are being stripped of artificial intelligence (AI) experts in a brain drain to the private sector that is hampering research and disrupting teaching at some of the country's leading institutions. Scores of talented scientists have left or passed up university posts for salaries two to five times higher at major technology firms, where besides getting better pay, new recruits can take on real-world problems with computer power and datasets that academia cannot hope to provide. The impact of the hiring frenzy is revealed in a confidential Guardian survey of the UK's elite Russell Group universities, which found that many top institutions were struggling to keep up with the demand from tech firms that are aggressively expanding their AI research groups. One university executive said AI researchers were courted by industry on a routine basis and that departments regularly missed out on the best talent when companies made better offers. "We need top quality staff to teach and research and the implications of not achieving this don't need to be spelt out," the executive told the Guardian.