Jupyter Notebook -- Forget CSV, fetch data from DB with Python


If you read a book, article or blog about Machine Learning, chances are high that it uses training data from a CSV file. There is nothing wrong with CSV, but let's consider whether it is really practical. Wouldn't it be better to read data directly from the DB? Often you can't feed business data directly into ML training. It needs pre-processing first: encoding categorical data, calculating new data features, etc. This data preparation/transformation step can be done quite easily with SQL while fetching the original business data. Another advantage of reading data directly from the DB: when the data changes, it is easier to automate the ML model re-training process.
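
To make the idea concrete, here is a minimal sketch of doing the preparation step in SQL while fetching the data. It uses Python's built-in `sqlite3` module with an in-memory database so that it runs anywhere. The `orders` table, its columns and the `fetch_training_rows` helper are hypothetical names invented for this illustration, not part of any real schema.

```python
import sqlite3

# Hypothetical business table, created in memory only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, amount REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (region, amount, quantity) VALUES (?, ?, ?)",
    [("north", 100.0, 4), ("south", 90.0, 3), ("north", 50.0, 5)],
)

# The query does the pre-processing while fetching:
#  - region_code: categorical text mapped to an integer code
#  - unit_price:  a new feature calculated from two existing columns
TRAINING_QUERY = """
SELECT
    CASE region WHEN 'north' THEN 0 WHEN 'south' THEN 1 ELSE -1 END AS region_code,
    amount / quantity AS unit_price,
    amount
FROM orders
ORDER BY id
"""


def fetch_training_rows(connection):
    """Return the prepared rows as a list of tuples, ready for training code."""
    return connection.execute(TRAINING_QUERY).fetchall()


rows = fetch_training_rows(conn)
for row in rows:
    print(row)

conn.close()
```

In a notebook, the same query string could be passed to `pandas.read_sql_query(TRAINING_QUERY, conn)` to get a DataFrame instead of a list of tuples. Because the transformation lives in the query, re-running the cell after the data changes picks up the new rows without exporting a fresh CSV.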