Architecting a Machine Learning Pipeline


Funneling incoming data into a data store is the first step of any ML workflow. The key point is that the data is persisted without any transformation, so that we keep an immutable record of the original dataset. Data can arrive from various sources: pulled on request, delivered through pub/sub messaging, or streamed from other services. NoSQL document databases are well suited to storing large volumes of rapidly changing structured and/or unstructured data, since they do not enforce a fixed schema. They also offer distributed, scalable, replicated data storage.
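
To make the "persist first, transform later" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: `RawStore` is a hypothetical in-memory stand-in for a document collection, and the field names (`_id`, `source`, `ingested_at`, `raw`) and the `ingest` helper are invented for this example. A real pipeline would swap in the client of whichever document database it uses.

```python
import hashlib
import time
from typing import Any, Dict, Iterable, List


class RawStore:
    """Hypothetical in-memory stand-in for a NoSQL document collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, payload: bytes, source: str) -> str:
        # The document id is a hash of the raw bytes, so re-sending the
        # same payload maps to the same document.
        doc_id = hashlib.sha256(payload).hexdigest()
        # Never overwrite: an existing record is left exactly as ingested.
        if doc_id not in self._docs:
            self._docs[doc_id] = {
                "_id": doc_id,
                "source": source,            # where the record came from
                "ingested_at": time.time(),  # metadata only
                "raw": payload,              # original bytes, untouched
            }
        return doc_id

    def get(self, doc_id: str) -> bytes:
        # Return the payload exactly as it was received.
        return self._docs[doc_id]["raw"]

    def __len__(self) -> int:
        return len(self._docs)


def ingest(store: RawStore, records: Iterable[bytes], source: str) -> List[str]:
    """Persist each incoming record as-is; no parsing, cleaning or casting."""
    return [store.insert(record, source) for record in records]


if __name__ == "__main__":
    store = RawStore()
    # Structured and unstructured records can sit side by side,
    # because nothing here imposes a schema on the payload.
    batch = [b'{"user": 1, "clicks": 3}', b"free-form log line, no schema"]
    ids = ingest(store, batch, source="batch-pull")
    print(len(store), store.get(ids[0]))
```

The store only wraps metadata around the payload and never rewrites an existing document, so any later cleaning or feature extraction step can always be re-run against the original bytes.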