Machine Learning

Top Four Characteristics of Successful Data and AI-driven Companies


At Databricks, we have had the opportunity to help thousands of organizations modernize their data architectures to be cloud-first and extract value from their data at scale with analytics and AI. Over the past few years, we've been fortunate to engage directly with customers across industries and regions about their data-driven aspirations – and the roadblocks that slow their ability to get there. While challenges vary greatly among industries and even individual organizations, we have developed a rich understanding of the top four habits of data and AI-driven organizations. Before diving into the habits, let's take a quick look at how organizations have approached enabling data strategies. First, data teams have made technology decisions over time that encourage a way of thinking organized around technology stacks: data warehousing, data engineering, streaming, data science, and machine learning.

Securing AI during the development process


There is enormous interest in and momentum around using AI to reduce the need for human monitoring while improving enterprise security. Machine learning and other techniques are used for behavioral threat analytics, anomaly detection and reducing false-positive alerts. At the same time, private and nation-state cybercriminals are applying AI to the other side of the security coin. Artificial intelligence is used to find vulnerabilities, shape exploits and conduct targeted attacks. How does an enterprise protect the tools it is building and secure those it is running during the production process?

Council Post: How Machine Learning Is Shaping The Future Of Advertising


Wendy Gonzalez is the CEO of Sama, the provider of accurate data for ambitious AI. Once associated with big New York City offices, patriarchal workplace culture and multi-million dollar budgets, the advertising industry has evolved considerably in the past century. Now diversified and modernized, remnants of the mid-century Madison Avenue advertising ecosystem are few and far between. But what's caused this shift? Industry leaders will be quick to tell you there's at least one tool that's been especially vital to this evolution: artificial intelligence (AI).

Integer-Only Inference for Deep Learning in Native C


Integer-only inference allows for the compression of deep learning models for deployment on low-compute and low-latency devices. Many embedded devices are programmed using native C and do not support floating-point operations and dynamic allocation. Nevertheless, small deep learning models can be deployed to such devices with an integer-only inference pipeline through uniform quantization and the fixed-point representation. We employed these methods to deploy a deep reinforcement learning (RL) model on a network interface card (NIC) (Tessler et al. 2021[1]). Successfully deploying the RL model required inference latency of O(microseconds) on a device with no floating-point operation support.
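To make the technique concrete: a minimal C sketch of a uniformly quantized int8 dense layer with fixed-point requantization is shown below. This is an illustration of the general approach, not the authors' NIC pipeline; the function names, the Q31 multiplier/shift encoding of the rescale factor, and the example scales are assumptions. In a real deployment the scales and the fixed-point multiplier are derived offline from calibration data.

```c
#include <stdint.h>

/* Uniform quantization of a float value to int8: q = round(x / scale),
 * clamped to the int8 range. Done offline, before deployment. */
static int8_t quantize(float x, float scale) {
    float r = x / scale;
    int32_t q = (int32_t)(r >= 0 ? r + 0.5f : r - 0.5f);
    if (q > 127) q = 127;
    if (q < -128) q = -128;
    return (int8_t)q;
}

/* Integer-only dense layer: int8 inputs and weights, int32 accumulator.
 * The combined rescale (in_scale * w_scale / out_scale) is precomputed
 * offline as a Q31 fixed-point multiplier `mult` and a right `shift`,
 * so inference itself uses no floating-point operations at all. */
static int8_t dense_int8(const int8_t *x, const int8_t *w, int n,
                         int32_t bias, int32_t mult, int shift) {
    int32_t acc = bias;
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];
    /* Fixed-point requantization: round(acc * mult / 2^(31 + shift)). */
    int64_t prod = (int64_t)acc * mult;
    int64_t rounded = prod + ((int64_t)1 << (30 + shift));
    int32_t out = (int32_t)(rounded >> (31 + shift));
    if (out > 127) out = 127;
    if (out < -128) out = -128;
    return (int8_t)out;
}
```

Because the loop body is only integer multiplies, adds and shifts over statically sized buffers, the same pattern works on devices with no FPU and no dynamic allocation, which is what makes microsecond-scale latencies feasible on such hardware.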

DigitalOwl raises $20M to analyze medical records for insurers


Did you miss a session from the Future of Work Summit? In health care, the process of underwriting and claims analysis can be both labor-intensive and error-prone. Claim adjusters and underwriters are often required to read and carefully parse hundreds of documents per case. Each year, the insurance market invests an estimated $3 billion or more in work hours devoted solely to collating and summarizing medical records. A 2006 U.S. National Institutes of Health study identified several major challenges in researching medical records, including assessing the quality of data and combining data from companies with dissimilar coding systems.

Deep Learning and NLP A-Z : How to create a ChatBot


We've talked about, speculated on and often seen different applications for Artificial Intelligence - but what about one piece of technology that can not only gather relevant information and improve customer service, but could even differentiate your business from the crowd? ChatBots are here, and they can change and reshape how we've been conducting online business. Fortunately, technology has advanced enough to make this valuable tool accessible, something almost anybody can learn to implement. If you want to learn one of the most attractive, customizable and cutting-edge pieces of technology available, then this course is just for you!

Sync Computing aims to pick up where serverless leaves off


In our Outlook for 2022, we posed the question of whether data clouds – or cloud computing in general – get easier this year. Our question was directed at the bewildering array of cloud services. There's lots of choice for the customer, but could too much choice be too much of a good thing? "Serverless" is a style of programming for cloud platforms that is changing the way applications are built, deployed, and ultimately, consumed. Serverless is supposed to address that complexity.

The First AI4TSP Competition: Learning to Solve Stochastic Routing Problems


The TSP is one of the classical combinatorial optimization problems, with many variants inspired by real-world applications. This first competition asked the participants to develop algorithms to solve a time-dependent orienteering problem with stochastic weights and time windows (TD-OPSWTW). It focused on two types of learning approaches: surrogate-based optimization and deep reinforcement learning. In this paper, we describe the problem, the setup of the competition, the winning methods, and give an overview of the results. The winning methods described in this work have advanced the state-of-the-art in using AI for stochastic routing problems. Overall, by organizing this competition we have introduced routing problems as an interesting problem setting for AI researchers.

Methods for inferring Causality


In our previous article, Part 1: Getting started with Causal Inference, we covered the basics of causal inference and gave a lot of attention to regression. We also discussed that regression is not the only way to close backdoors in a causal estimation design. In this article, we are going to discuss some other methods, all aiming to achieve the same thing: making the treatment and control groups similar in everything except the treatment. The goal of matching is to reduce the bias of the estimated treatment effect in an observational study by finding, for every treated unit, one or more non-treated units with similar observable characteristics, so that the covariates are balanced out. If there is some confounder, say age, that affects both the treatment and the outcome, thereby making the treatment and control groups incomparable, we can make them comparable by matching each treated unit with a similar unit from the control group.
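The age example above can be sketched in a few lines of code. The following C snippet performs nearest-neighbor matching on a single confounder (age) and estimates the average treatment effect on the treated (ATT); the data layout, function name and toy numbers are all hypothetical, chosen only to illustrate the mechanics of matching.

```c
#include <math.h>

/* One observed unit: treatment flag, a single confounder (age),
 * and the measured outcome. */
typedef struct { int treated; double age; double outcome; } Unit;

/* Nearest-neighbor matching on age: for each treated unit, find the
 * control unit with the closest age and accumulate the difference in
 * outcomes. The mean of these differences estimates the ATT. */
static double att_matching(const Unit *u, int n) {
    double sum = 0.0;
    int treated_count = 0;
    for (int i = 0; i < n; ++i) {
        if (!u[i].treated) continue;
        int best = -1;
        double best_dist = HUGE_VAL;
        for (int j = 0; j < n; ++j) {
            if (u[j].treated) continue;
            double d = fabs(u[i].age - u[j].age);
            if (d < best_dist) { best_dist = d; best = j; }
        }
        if (best >= 0) {
            sum += u[i].outcome - u[best].outcome;
            ++treated_count;
        }
    }
    return treated_count ? sum / treated_count : 0.0;
}
```

Matching on one covariate generalizes to several by replacing `fabs` with a multivariate distance (or a propensity score), but the core idea is the same: each treated unit is compared only against its most similar control.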

How AI can identify people even in anonymized datasets


How you interact with a crowd may help you stick out from it, at least to artificial intelligence. When fed information about a target individual's mobile phone interactions, as well as their contacts' interactions, AI can correctly pick the target out of more than 40,000 anonymous mobile phone service subscribers more than half the time, researchers report January 25 in Nature Communications. The findings suggest humans socialize in ways that could be used to pick them out of datasets that are supposedly anonymized. It's no surprise that people tend to remain within established social circles and that these regular interactions form a stable pattern over time, says Jaideep Srivastava, a computer scientist from the University of Minnesota in Minneapolis who was not involved in the study. "But the fact that you can use that pattern to identify the individual, that part is surprising."