Solving the data integration variety problem at scale, with Gobblin
Editor's Note: The Apache Software Foundation (ASF) recently announced Apache Gobblin as a Top-Level Project (TLP).
Our big data ecosystem is larger than 1 exabyte and growing, and it ingests and processes upwards of seven trillion Kafka events per day. At this scale, data integration is an incredibly complex problem, not only because of the multitude and variety of online and offline systems within the company, but also because of the hundreds of third-party data providers and partners that we work with. Add the plethora of protocols and formats that each one brings, and the number of permutations quickly becomes untenable.