Confluent, the company whose founders created Apache Kafka, confidentially filed for an IPO late yesterday. In effect, the filing is a formal statement of intent to go public, with key details such as the number of shares and the proposed price range still in flux. Valued at over $4 billion, Confluent, along with companies like Databricks, could be considered a super-unicorn, and given inflating valuations, the question wasn't whether but when Confluent would finally file for a public offering. We're still asking the same question about Databricks, whose funding has topped $1 billion and whose valuation is off the charts. Confluent achieved unicorn status a couple of years ago and, like fellow former unicorn MongoDB (now public), adopted its own quasi-open-source license to prevent cloud providers from monetizing its IP: not Kafka itself (which remains an Apache project), but all the enterprise goodies and connectors that the company has built around it.
Snowflake and Databricks have more in common than their long association with the Amazon Web Services cloud: they have dozens of joint customers. It's the classic case of different groups having their own tools: data scientists and data engineers who build models and perform data engineering in Databricks, and business analysts who run queries and reports in Snowflake. Being in the AWS cloud means that they also share common data storage in S3. But for Snowflake and Databricks users, until now it's been a case of so near and yet so far. The services may (hopefully) have been running in the same availability zone, with their data stored in the same S3 buckets.
R, along with Python, is one of the most popular languages among enterprise data scientists. The R ecosystem includes thousands of packages for statistical analysis and machine learning, as well as advanced graphical capabilities. R users across enterprises are expressing strong interest in moving R workloads to the cloud. The cloud offers several unique advantages, such as access to ever-growing datasets, easy scaling of compute resources for processing large data, and more cost-efficient resource management.
XGBoost is a popular machine learning library designed for training gradient-boosted decision trees; it can also train random forests. For information about installing XGBoost on Databricks Runtime, or installing a custom version on Databricks Runtime ML, see these instructions. You can train XGBoost models on an individual machine or in a distributed fashion.
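To give a feel for what training on an individual machine involves, here is a minimal sketch of gradient boosting, the technique XGBoost is built around: each round fits a small tree to the errors left by the rounds before it. This is a toy written in plain Python for illustration, not XGBoost's implementation or its API. Every name in it (`fit_stump`, `BoostedModel`, `fit_boosted_stumps`, `mean_squared_error`) is hypothetical, and it assumes a single numeric feature, squared-error loss, and one-split trees ("stumps").

```python
from dataclasses import dataclass
from typing import List, Tuple

# A stump is a one-split tree: (threshold, value if x <= threshold, value otherwise).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Return the single split that minimizes squared error on the residuals."""
    best: Stump = (xs[0], 0.0, 0.0)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left) if left else 0.0
        right_mean = sum(right) / len(right) if right else 0.0
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (threshold, left_mean, right_mean)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


@dataclass
class BoostedModel:
    base: float            # initial prediction: the mean of the targets
    learning_rate: float   # shrinkage applied to every stump's output
    stumps: List[Stump]

    def predict(self, x: float) -> float:
        correction = sum(predict_stump(s, x) for s in self.stumps)
        return self.base + self.learning_rate * correction


def fit_boosted_stumps(
    xs: List[float],
    ys: List[float],
    n_rounds: int = 50,
    learning_rate: float = 0.3,
) -> BoostedModel:
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    model = BoostedModel(base, learning_rate, [])
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared-error loss, the residual is the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        model.stumps.append(stump)
        preds = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(preds, xs)
        ]
    return model


def mean_squared_error(model: BoostedModel, xs: List[float], ys: List[float]) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [x * x for x in xs]
    model = fit_boosted_stumps(xs, ys)
    print("training MSE:", mean_squared_error(model, xs, ys))
```

The real library adds regularization, second-order gradient information, multi-feature trees, and parallel and distributed training on top of this basic loop, so treat the sketch only as a picture of the idea.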