3/11/2023

Airflow technology

Real-life workflows can go from just one task per workflow (you don't always have to be fancy) to very complicated DAGs that are almost impossible to visualise.

The metadata database stores the state of tasks and workflows. The scheduler uses the DAG definitions, together with the state of tasks in the metadata database, to decide what needs to be executed. The executor is a message-queuing process (usually Celery) that decides which worker will execute each task. With the Celery executor, it is possible to manage the distributed execution of tasks. An alternative is to run the scheduler and the executor on the same machine. In that case, parallelism is managed using multiple processes.

Airflow also provides a very powerful UI. The user can monitor the execution of DAGs and tasks and interact with them directly through a web UI.

It is common to read that Airflow follows a "set it and forget it" approach, but what does that mean? It means that once a DAG is set, the scheduler will automatically schedule it to run according to the specified scheduling interval.

The easiest way to understand Airflow is probably to compare it to Luigi. Luigi is a Python package for building complex pipelines, and it was developed at Spotify.
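To make the scheduler's role more concrete, here is a minimal sketch in plain Python of how DAG definitions and task state can be combined to decide what runs next. It is not Airflow code and does not use Airflow's API: the `DAG` dictionary, the `state` dictionary (a stand-in for the metadata database), and the functions `runnable_tasks` and `run_to_completion` are all hypothetical names chosen for illustration.

```python
# Toy illustration only: none of these names come from Airflow itself.

# A DAG described as task -> list of upstream tasks it depends on.
DAG = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["transform"],
}


def runnable_tasks(dag, state):
    """Return tasks that have not run yet and whose upstream tasks all succeeded."""
    return sorted(
        task
        for task, upstream in dag.items()
        if state.get(task) is None
        and all(state.get(dep) == "success" for dep in upstream)
    )


def run_to_completion(dag):
    """Repeatedly ask which tasks are ready, 'run' them, and record their state."""
    state = {}   # stand-in for the metadata database
    rounds = []  # which tasks were started in each scheduling pass
    while True:
        ready = runnable_tasks(dag, state)
        if not ready:
            break
        rounds.append(ready)
        for task in ready:
            # A real executor would hand the task to a worker here.
            state[task] = "success"
    return rounds, state


rounds, final_state = run_to_completion(DAG)
print(rounds)  # [['extract'], ['transform'], ['load', 'report']]
```

Tasks that share a scheduling pass, such as `load` and `report` here, do not depend on each other, so they are the ones an executor could send to different workers or processes in parallel.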