Airflow DAG serialization caching means that when a DAG is picked up by the scheduler, the DAG object is serialized and stored so it can be loaded quickly on subsequent runs, which improves scheduling performance. However, because the cached copy is not refreshed immediately, the scheduler may keep loading a stale serialized version after the DAG code has been modified, causing task failures or other problems.
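The stale-cache failure mode can be illustrated with a minimal, Airflow-free sketch (the `cache` dict, `serialize_dag`, and `load_dag` here are hypothetical stand-ins for Airflow's serialized-DAG store, for illustration only):

```python
import json

# Hypothetical in-memory cache standing in for the serialized-DAG
# store: entries are written once and never expire on their own.
cache = {}

def serialize_dag(dag_id, dag):
    cache[dag_id] = json.dumps(dag)

def load_dag(dag_id, dag):
    # Only serialize on a cache miss -- because the entry never
    # expires, later edits to the DAG are not picked up.
    if dag_id not in cache:
        serialize_dag(dag_id, dag)
    return json.loads(cache[dag_id])

dag_v1 = {"tasks": ["extract", "load"]}
load_dag("etl", dag_v1)  # caches v1

dag_v2 = {"tasks": ["extract", "transform", "load"]}
stale = load_dag("etl", dag_v2)  # still returns the stale v1 copy
```

Here `stale` is still the v1 task list even though v2 was passed in, which is exactly the symptom described above.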
To avoid this, you can remove the serialized DAG data manually. One way is to add the following code to the DAG file:
from airflow.models.serialized_dag import SerializedDagModel

def clear_serialized_dag_cache(dag_id):
    # Delete the serialized copy of this DAG from the metadata
    # database; the scheduler will re-serialize it from the
    # source file on the next parse.
    SerializedDagModel.remove_dag(dag_id)
Then call clear_serialized_dag_cache with the DAG ID wherever the cache needs to be dropped. For example, you could add the following code to dag_bag_import.py so that the cache is cleared each time DAGs are imported, ensuring the latest DAG code is loaded:
from datetime import datetime

from airflow import DAG, settings
from airflow.models import DagBag
from airflow.operators.python import PythonOperator

def import_dags():
    dagbag = DagBag(dag_folder=settings.DAGS_FOLDER)
    for dag_id in dagbag.dags:
        # Clear the serialized DAG cache so the next parse loads
        # the latest version of each DAG's code.
        clear_serialized_dag_cache(dag_id)

dag = DAG(
    'dag_bag_import',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
)

import_task = PythonOperator(
    task_id='import_dags',
    python_callable=import_dags,
    dag=dag,
)
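Note that modern Airflow versions already refresh serialized DAGs on a schedule, controlled by the min_serialized_dag_update_interval setting: a DAG is re-serialized only when its cached copy is older than the interval. A minimal, Airflow-free sketch of that refresh strategy (the `cache` dict, `needs_refresh`, and `write_if_stale` names are illustrative assumptions, not Airflow APIs):

```python
# Interval-based refresh: re-serialize only when the cached copy
# is older than the configured interval.
REFRESH_INTERVAL = 30.0  # seconds; Airflow's default for this setting

cache = {}  # dag_id -> (serialized_dag_dict, written_at_seconds)

def needs_refresh(dag_id, now):
    # Refresh when there is no cached copy, or when the copy is
    # at least REFRESH_INTERVAL seconds old.
    entry = cache.get(dag_id)
    return entry is None or now - entry[1] >= REFRESH_INTERVAL

def write_if_stale(dag_id, dag, now):
    # Re-serialize (here: just copy the dict) only when stale,
    # and report whether a write actually happened.
    if needs_refresh(dag_id, now):
        cache[dag_id] = (dict(dag), now)
        return True
    return False
```

With this strategy a stale copy is eventually replaced on its own, which is why manual clearing, as shown earlier, is only needed when you cannot wait for the next refresh.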