A Survey of Pipeline Tools for Data Engineering

Mbata, Anthony, Sripada, Yaji, Zhong, Mingjun

arXiv.org Artificial Intelligence

Currently, a variety of pipeline tools are available for data engineering. Data scientists can use these tools to resolve data wrangling issues and to accomplish data engineering tasks ranging from data ingestion, through data preparation, to the use of the prepared data as input for machine learning (ML). Some of these tools have essential built-in components or can be combined with other tools to perform the desired data engineering operations. Although some tools are wholly or partly commercial, several open-source tools are available for performing expert-level data engineering tasks. This survey examines the broad categories of pipeline tools, with examples, based on their design and data engineering intentions. These categories are Extract Transform Load/Extract Load Transform (ETL/ELT); pipelines for Data Integration, Ingestion, and Transformation; Data Pipeline Orchestration and Workflow Management; and Machine Learning Pipelines. The survey also outlines how the tools within each of these groups are used, with examples, and concludes with a discussion of case studies showing pipeline tools in use for data engineering. These case studies present first-user experiences of applying the tools to sample data, some complexities of the applied pipelines, and a summary of approaches to using these tools to prepare data for machine learning.


4 Steps to be Successful with the Digital Transformation of Your Business

#artificialintelligence

According to McKinsey, data-driven organisations are 23x more likely to acquire customers, 6x more likely to retain customers and 19x more likely to be profitable. Being data-driven is good for business. It is therefore no surprise that a question I am always asked when advising organisations is how they can digitally transform their business and remain relevant in these fast-changing times. My first answer is that they have to achieve a gestalt shift, in which they see their organisation from a different perspective: instead of looking at it from a product standpoint, they should see it as a data organisation.


3 Concepts Defining the Future of Work: Data, Decentralisation and Automation

#artificialintelligence

The organisation of tomorrow will look fundamentally different from today's organisation. Enterprises that are aware of the upcoming changes are best placed to prepare for them and achieve a competitive advantage in a data-driven society. The future of work will therefore require management and employees to take a different approach to creating and delivering a product or service. It will be defined by three concepts (data, decentralisation and automation) that will radically change leadership, culture, privacy and security. Let's discuss each of these concepts.


Why the Organisation of Tomorrow is a Data Organisation

#artificialintelligence

The fast-changing, uncertain and ambiguous environments in which organisations operate today require them to rethink all their internal business processes and customer touchpoints. In addition, emerging (information) technologies such as big data, blockchain and artificial intelligence have made it easier for startups to compete with established organisations. These startups are often more flexible and agile than Fortune 1000 companies and can become a significant threat if they are ignored. Focusing purely on day-to-day operations is therefore not enough; organisations have to become innovative and adaptive to change if they wish to remain competitive. The key characteristic of these new startups is that, regardless of the product or service they offer, they are at their core data companies.
