Simplifying Data Analytics Pipelines using a Data Lake


As part of enterprise artificial intelligence (AI) initiatives, data engineering teams use a wide range of data analytics techniques, from streaming analytics to machine learning and deep learning. This diversity in techniques has led to a corresponding diversity in software platforms and tools. Most data engineering teams use data ingestion frameworks, such as Kafka; a combination of analytics and machine learning tools, such as Hadoop, Splunk, SAS Analytics, Spark, Python and R; and open-source deep learning packages, such as TensorFlow, Caffe and PyTorch. In traditional data analytics pipelines, data flows into the enterprise environment from various internal and external sources and is then pre-processed and cleansed. Enterprises commonly use a "staging area" to store intermediate representations of the pre-processed data.
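
To make the traditional flow concrete, here is a minimal Python sketch of the cleanse-then-stage step described above. It is an illustration under stated assumptions, not a description of any particular product: the record fields (`user`, `amount`), the function names `cleanse` and `stage`, and the use of a local directory of CSV files as the "staging area" are all hypothetical choices made for this example.

```python
import csv
from pathlib import Path


def cleanse(record):
    """Return a normalized copy of a raw record, or None if it is unusable.

    The 'user' and 'amount' fields are hypothetical, chosen for illustration.
    """
    user = (record.get("user") or "").strip().lower()
    amount = record.get("amount")
    if not user or amount in (None, ""):
        return None  # drop records with missing fields
    try:
        return {"user": user, "amount": float(amount)}
    except (TypeError, ValueError):
        return None  # drop records whose amount is not numeric


def stage(records, staging_dir, name):
    """Cleanse raw records and write the survivors to a CSV file.

    The staging area is modeled here as a plain directory (an assumption).
    Returns the path of the staged file and the number of rows written.
    """
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / f"{name}.csv"
    cleaned = [c for c in map(cleanse, records) if c is not None]
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["user", "amount"])
        writer.writeheader()
        writer.writerows(cleaned)
    return path, len(cleaned)
```

Calling `stage(raw_records, "staging", "orders")` would write the cleansed rows to `staging/orders.csv`, which downstream analytics tools could then read. In a real deployment the raw records would typically arrive from an ingestion framework rather than an in-memory list, and the staging area would be shared storage rather than a local folder.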