San Francisco-based Databricks, a company that offers the capabilities of a data warehouse and data lake in a single "lakehouse" architecture, today announced its first industry-specific offering: Lakehouse for Retail. Designed for enterprises dealing in the retail and consumer goods vertical, Databricks says Lakehouse for Retail is a fully integrated platform that aims to solve the most critical challenges retailers and their partners face while trying to leverage surging data volumes for AI and analytics projects. The solution, which is generally available as of today, has already seen early adoption from major retail enterprises including Walgreens, Columbia, H&M Group, Reckitt, Restaurant Brands International, 84.51, Co-Op Food, Gousto, and Acosta. "With hundreds of millions of prescriptions processed by Walgreens each year, Databricks' Lakehouse for Retail allows us to unify all of this data and store it in one place for a full range of analytics and ML workloads," said Luigi Guadagno, the VP of pharmacy and healthcare platform at Walgreens. "By eliminating complex and costly legacy data silos, we've enabled cross-domain collaboration with an intelligent, unified data platform that gives us the flexibility to adapt, scale and better serve our customers and patients," Guadagno said.
"The vision of lakehouse helps solve many of the challenges retail organizations have told us they're facing," said Ali Ghodsi, chief executive and co-founder of Databricks. The San Francisco-based company sells services based on open-source Apache Spark, a real-time data-analytics technology that Mr. Ghodsi helped create. Apache Spark emerged from the University of California, Berkeley, in 2009. Lakehouse for Retail consolidates a variety of information in a single digital repository. In the past, such repositories, often called data lakes, required users to make a copy of the data so that it could be structured and analyzed in a separate environment, the company said.
As data sources and volumes grow, and as a data-driven orientation is increasingly deemed to be a competitive necessity, the war between platform vendors to provide the primary repository for our data is intense. The war has several fronts, one of which is analytics. And within that scope, the data warehouse and data lake camps are the main combatants. The data warehouse side is strong, as it includes a combination of stalwart incumbent vendors like Teradata and Vertica (now part of Micro Focus), all three major cloud providers and industry darling Snowflake. On the data lake side, independent providers, like Cloudera and the aforementioned Databricks, are perhaps the most emblematic competitors.
In the ongoing debate about where companies ought to store data they want to analyze, in data warehouses or in data lakes, Databricks today unveiled a third way. With SQL Analytics, Databricks is building upon its Delta Lake architecture in an attempt to fuse the performance and concurrency of data warehouses with the affordability of data lakes. The big data community currently is divided about the best way to store and analyze structured business data. Some, like Dremio co-founder Tomer Shiran, say the reasons for using data warehouses have shrunk thanks to advances in data virtualization and the ability to remotely query object stores in almost the same manner as a data warehouse. Others, like Fivetran CEO George Fraser, have gone on record saying data lakes are legacy tech thanks to the ability of modern cloud data warehouses to separate compute and storage.
Databricks, the Data and AI company and pioneer of the data lakehouse architecture, today announced Databricks Partner Connect, a one-stop portal for customers to quickly discover a broad set of validated data, analytics, and AI tools and easily integrate them with their Databricks lakehouse across multiple cloud providers. Integrations with Databricks partners Fivetran, Labelbox, Microsoft Power BI, Prophecy, Rivery, and Tableau are initially available to customers, with Airbyte, Blitzz, dbt Labs, and many more to come in the months ahead. Enterprises want to drive complexity out of their data infrastructure and adopt more open technologies to take better advantage of analytics and AI. The data lakehouse enabled by Databricks has put thousands of customers on this path, collectively processing multiple exabytes of data a day on a single platform for analytics and AI workloads. But the data ecosystem is vast, and no single vendor can accomplish everything.