r/datascience - Apache Airflow running Docker Containers example setup


As always, it depends. If we are talking about just a few jobs, then Airflow may be overkill (though a very reliable and beautiful one). For me it was worth the hassle from the beginning. I could not imagine setting up cron jobs to handle hundreds of dependent jobs running hourly (which was my case at work). DAGs are also very reusable: I have dozens of clients to handle, and each of them has almost the same DAG, differing only in parameters and DB connections. I am able to define an abstract DAG and reuse it.
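
A rough sketch of the "abstract DAG" idea, in plain Python rather than real Airflow code, so it runs without Airflow installed. Everything here is an assumption for illustration: the `ClientConfig` fields, the client names, the connection IDs, and the three-step extract/transform/load shape are all hypothetical. In an actual Airflow setup the factory function would presumably build and return a `DAG` object with operators instead of a dict.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ClientConfig:
    # Hypothetical per-client parameters; real ones would differ.
    name: str
    db_conn_id: str
    schedule: str = "@hourly"


def build_pipeline(cfg: ClientConfig) -> dict:
    """Return one client's pipeline: same task graph, different parameters."""
    prefix = cfg.name
    # Each key is a task; each value is the set of tasks it depends on.
    dependencies = {
        f"{prefix}.extract": set(),
        f"{prefix}.transform": {f"{prefix}.extract"},
        f"{prefix}.load": {f"{prefix}.transform"},
    }
    return {
        "pipeline_id": f"{prefix}_etl",
        "schedule": cfg.schedule,
        "db_conn_id": cfg.db_conn_id,
        "dependencies": dependencies,
    }


def run_order(pipeline: dict) -> list:
    """Resolve a valid execution order from the declared dependencies."""
    return list(TopologicalSorter(pipeline["dependencies"]).static_order())


# Made-up clients: the pipeline is defined once and instantiated per client.
clients = [
    ClientConfig(name="acme", db_conn_id="acme_db"),
    ClientConfig(name="globex", db_conn_id="globex_db", schedule="@daily"),
]
pipelines = {c.name: build_pipeline(c) for c in clients}
```

Adding a new client then means adding one `ClientConfig` entry, not copying a pipeline definition. The dependency ordering is also the part that is painful to express with cron alone, since cron only knows about times, not about which job must finish before another starts.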