Whatever term you choose, each refers to a roughly related set of pre-modeling data activities in the machine learning, data mining, and data science communities. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting. Downstream uses of the wrangled data may include further munging, data visualization, data aggregation, and training a statistical model, as well as many other potential uses. Data munging as a process typically follows a set of general steps, which begin with extracting the data in a raw form from the data source and "munging" the raw data using algorithms (e.g. ...). While I am not thrilled with the term "data sink," I would go on to say that data preparation runs from "identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data" in the context of "mapping data from one 'raw' form into another..." all the way up to "training a statistical model." I like to think of data preparation as encompassing "everything from data sourcing right up to, but not including, model building."
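To make the cleansing activities described above a little more concrete, here is a minimal sketch of a batch script that extracts raw rows, identifies incomplete or incorrect values, and then deletes or replaces them. Everything in it is an assumption made for illustration: the sample data, the column names, the function names `extract`, `parse_age`, and `clean`, the 0 to 120 plausibility range for ages, and the choice of the median as a replacement value. It uses only the Python standard library.

```python
import csv
import io
import statistics

# Hypothetical raw extract: one missing age, one impossible age,
# one row without an identifier, and inconsistent capitalization.
RAW_CSV = """id,age,city
1,34,Boston
2,,Chicago
3,-5,Denver
,41,Austin
4,29,boston
5,40,Denver
"""


def extract(text):
    """Read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_age(value):
    """Return a plausible integer age, or None when the value is dirty."""
    try:
        age = int(value)
    except (TypeError, ValueError):
        return None
    # Assumed plausibility range; adjust for real data.
    return age if 0 <= age <= 120 else None


def clean(rows):
    """Delete rows without an id and replace dirty ages with the median."""
    kept = [row for row in rows if row["id"].strip()]
    ages = [parse_age(row["age"]) for row in kept]
    # Assumes at least one valid age exists; median() raises otherwise.
    fill = statistics.median(a for a in ages if a is not None)
    return [
        {
            "id": int(row["id"]),
            "age": age if age is not None else fill,
            "city": row["city"].strip().title(),
        }
        for row, age in zip(kept, ages)
    ]


cleaned = clean(extract(RAW_CSV))
```

Whether a dirty value should be deleted, replaced, or flagged depends on the data set and the downstream use; the median fill here is only one of several reasonable choices.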
Deep learning and machine learning are becoming more and more important in today's ERP (Enterprise Resource Planning) systems. When building an analytical model with deep learning or machine learning, the data set is collected from various sources, such as files, databases, sensors, and more. However, the collected data cannot be used directly for analysis. Data preparation solves this problem. It is an important part of data science and includes two concepts: data cleaning and feature engineering.
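Feature engineering, the second of the two concepts named above, can be sketched just as briefly. The example below assumes a handful of already-cleaned records and derives a numeric feature vector for each one by min-max scaling a numeric column and one-hot encoding a categorical column. The records, the column names, and the helper names `min_max_scale` and `one_hot` are hypothetical, and these two transformations are only common examples, not a complete treatment of feature engineering.

```python
# Hypothetical cleaned records, e.g. the output of a data cleaning step.
records = [
    {"age": 20, "city": "Boston"},
    {"age": 30, "city": "Chicago"},
    {"age": 40, "city": "Boston"},
]


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator lists, ordered alphabetically."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


scaled_age = min_max_scale([r["age"] for r in records])
categories, city_flags = one_hot([r["city"] for r in records])

# One feature vector per record: scaled age followed by the city flags.
features = [[age] + flags for age, flags in zip(scaled_age, city_flags)]
```

In practice, the scaling bounds and the category list would usually be computed on the training data only and then reused for new data, so that both are transformed consistently.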
It seems that, these days, the perception of machine learning is often reduced to passing a series of arguments to a growing number of libraries and APIs, hoping for magic, and awaiting the results. Maybe you have a very good idea of what's going on under the hood in these libraries (from data preparation to model building to results interpretation and visualization and beyond), but you are still relying on these various tools to get the job done. Using well-tested and proven implementations of tools for performing regular tasks makes sense for a whole host of reasons. Reinventing wheels which don't roll efficiently is not best practice... it's limiting, and it takes an unnecessarily long time. Whether you are using open source or proprietary tools to get your work done, these implementations have been honed by teams of individuals, ensuring that you get your hands on the best quality instruments with which to accomplish your goals.