Data Science

Announcing the winners of the Sample-Efficient Sequential Bayesian Decision Making request for proposals - Facebook Research


In February 2021, Facebook launched a request for proposals (RFP) on sample-efficient sequential Bayesian decision-making. In a Q&A about the RFP, Core Data Science researchers said they are keen to learn more about all the great research that is going on in the area of Bayesian optimization. Eytan Bakshy and Max Balandat, members of the team behind the RFP, also spoke about sharing a number of really interesting real-world use cases that they hope can help inspire additional applied research and increase interest and research activity in sample-efficient sequential Bayesian decision-making. The team reviewed 89 high-quality proposals and is pleased to announce the two winning proposals below, as well as the 10 finalists. Thank you to everyone who took the time to submit a proposal, and congratulations to the winners.

A Comprehensive Guide On How to Monitor Your Models in Production


Yup, that's me being plowed to the ground because the business just lost more than $500,000 with our fraud detection system by wrongly flagging fraudulent transactions as legitimate, and my boss's career is probably over. You're probably wondering how we got here… My story began with an image that you've probably seen over 1,001 times--the lifecycle of an ML project. A few months ago, we finally deployed to production after months of perfecting our model. I told myself and my colleague, "Our hard work has surely paid off, hasn't it?" Our model was serving requests in real time and returning results in batches--good stuff! Surely that was enough, right? Well, not quite, as we came to realize in rather dramatic fashion. I'm not going to bore you with the cliché reasons why the typical way of deploying working software just doesn't cut it with machine learning applications. I'm still trying to recover from the bruises that my boss left on me, and the least I can do is help you not end up in a hospital bed after "successful model deployment", like me. By the end of this article, you should know exactly what to do after deploying your model, including how to monitor your models in production, how to spot problems, how to troubleshoot, and how to approach the "life" of your model beyond monitoring. When you deploy traditional software, you almost don't have to worry about anything. Based on the software development lifecycle, it should work as expected because you have rigorously tested it and deployed it. In fact, your team may decide on a steady and periodic release of new versions as you mostly upgrade to meet new system requirements or new business needs.
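To make the fraud-detection scenario above a little more concrete, here is a minimal Python sketch of one possible monitoring check: tracking how often confirmed fraud was flagged as legitimate over a sliding window and raising an alert past a threshold. The class name, window size, and threshold are illustrative assumptions, not something taken from the article, and a real setup would depend on when ground-truth labels become available.

```python
from collections import deque


class FalseNegativeMonitor:
    """Hypothetical monitor: false-negative rate over recent confirmed fraud cases."""

    def __init__(self, window_size=1000, alert_threshold=0.05):
        # Each entry is True when a confirmed fraud case was missed by the model.
        self.missed = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold  # assumed value, tune per business

    def record(self, predicted_fraud, actual_fraud):
        # Only confirmed fraud cases count toward the false-negative rate.
        if actual_fraud:
            self.missed.append(not predicted_fraud)

    def false_negative_rate(self):
        # With no confirmed fraud in the window yet, report 0.0.
        if not self.missed:
            return 0.0
        return sum(self.missed) / len(self.missed)

    def should_alert(self):
        return self.false_negative_rate() > self.alert_threshold


# Usage with made-up (predicted_fraud, actual_fraud) pairs.
monitor = FalseNegativeMonitor(window_size=4, alert_threshold=0.25)
for predicted, actual in [(True, True), (False, True), (False, False), (False, True)]:
    monitor.record(predicted, actual)
print(monitor.false_negative_rate(), monitor.should_alert())
```

A sliding window keeps the check sensitive to recent behavior, so a model that degrades after deployment is noticed even if its lifetime average still looks fine.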

Deriving Equations from Sensor Data Using Dimensional Function Synthesis

Communications of the ACM

The original version of this paper was published in ACM Transactions on Embedded Computing Systems, October 2019.

Customer Insights 2021 Predictions: Evolution And Collaboration


CI leaders will shift 10% of their budgets to emotion analytics. Emotions are a more important driver of consumer decisions than rational thought and thus are the largest factor in brand energy, customer experience, and marketing effectiveness. But for the past decade, CI professionals have leaned into the precision of big data analytics instead of the traditionally unquantifiable territory of emotion. New techniques change this dynamic: AI-based text analytics tools such as Clarabridge and IBM Watson improve the precision of cruder sentiment analysis tools, while firms such as Nielsen and Realeyes bring biometric and facial analysis methodologies from the lab to the business world. As data analytics becomes commoditized, firms will shift 10% of the insights budget to emotion analytics to pilot new techniques in search of competitive advantage in the "why" behind consumer behavior, not just the "what" that data analytics addresses. Companies will reorganize to ensure CX and CI collaboration.

AI Centers Of Excellence Accelerate AI Industry Adoption


It is important to note that enterprises are adopting several functional and operational models for their CoEs. The change management model emphasizes the prospective innovation that artificial intelligence can provide to business stakeholders in the organization. Central to this model is the education and training of executives and business units. In addition to change management, the sandbox approach is another central model, in which the CoE acts as the company's hub of innovation and R&D. This model emphasizes proofs of concept and emerging technologies.

Kaggle BIPOC Grant Program - My experience


This year, Kaggle started a new program called the BIPOC (Black, Indigenous, People of Color) Grant Program. It aims to empower underrepresented data scientists with support to advance their careers and aspirations. I am grateful that I was one of the few people who became a part of this wonderful program. All the students who became part of the program were assigned a mentor as well. I had done a few basic projects before I became a part of this program.

Marcin Pionnier on finishing 5th in the RTA competition


I graduated from Warsaw University of Technology with a master's thesis on a text mining topic (intelligent web crawling methods). I work for a Polish IT consulting company (Sollers Consulting), where I design and develop various insurance industry-related systems (one of them is an insurance fraud detection platform). From time to time I try to compete in data mining contests (Netflix, competitions on Kaggle and […]). As far as I remember, I defined the basis of the solution at the very beginning: to create separate predictors for each individual loop and time interval. So my solution required me to build 61 × 10 = 610 regression models.
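The "one predictor per loop and time interval" layout can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, under stated assumptions: the interview does not say which regression method or features were used, so a plain one-variable least-squares line stands in for the real models, and the toy data and all function names are hypothetical.

```python
from itertools import product

N_LOOPS = 61      # number of loops, from the text
N_INTERVALS = 10  # number of time intervals, from the text


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # No variation in x: fall back to predicting the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def make_toy_data():
    """Hypothetical data: for each (loop, interval), y = loop + interval * x."""
    xs = [0.0, 1.0, 2.0, 3.0]
    return {
        (loop, interval): (xs, [loop + interval * x for x in xs])
        for loop, interval in product(range(N_LOOPS), range(N_INTERVALS))
    }


def train_all(data):
    """Fit one independent model per (loop, interval) pair."""
    return {key: fit_line(xs, ys) for key, (xs, ys) in data.items()}


def predict(models, loop, interval, x):
    a, b = models[(loop, interval)]
    return a + b * x


models = train_all(make_toy_data())
print(len(models))  # one model per (loop, interval) pair
```

Keying the models by `(loop, interval)` keeps each predictor independent, so any single one can be retrained or swapped for a different method without touching the others.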

A Complete Data Science Roadmap in 2021


If you want to learn data science from scratch, the first thing you need to do is learn how to code. Pick a programming language (either Python or R), and start learning. I suggest starting out with Python because it is more widely used than R. It is also more general and highly flexible, and you will be able to make the transition to different domains (data analytics, web development) if you have Python knowledge. This DataCamp course will take you through exercises and teach you how to code in Python. What will you learn in this course?

Senior Data Analyst


As the health and safety of our candidates and our employees come first, we're excited to provide virtual experiences for interviews and new hire on-boarding. Dataminr puts real-time AI and public data to work for our clients, generating relevant and actionable alerts for global corporations, public sector agencies, newsrooms, and NGOs. Our real-time alerts enable tens of thousands of users at hundreds of public and private sector organizations to learn first of breaking events around the world, develop effective risk mitigation strategies, and respond with confidence as crises unfold. Dataminr is making its mark for growth and innovation, recently earning recognition on the Deloitte Technology Fast 500, Forbes AI 50 and Forbes Cloud 100 lists. We also earned accolades for 'Most Innovative Use of AI' from the 2020 AI & Machine Learning Awards.

How to Become a Data Scientist (Step-By-Step) in 2020


Data science is one of the most buzzed-about fields right now, and data scientists are in extreme demand. And with good reason -- data scientists are doing everything from creating self-driving cars to automatically captioning images. Given all the interesting applications, it makes sense that data science is a very sought-after career. Data science is applied in many fields, including in developing self-driving cars. If you're reading this post, I'm assuming that you'd like to learn how to become a data scientist.