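The root cause is that when worker nodes are scaled in, Kubernetes sends SIGTERM to the containers of every Pod on the node being removed. Any Airflow task still running in such a Pod is terminated and fails unless it finishes within the termination grace period (30 seconds by default), after which the remaining processes receive SIGKILL. The suggested fix is to enable the Kubernetes Executor in the Airflow installation and to run tasks as a non-root user. Note that "run_as_user" is an operator-level argument; the corresponding airflow.cfg option is "default_impersonation" under [core]. Running as non-root does not by itself change how signals are handled, so the tasks also need enough time to stop gracefully. Kubernetes has no feature called SignalPropagationPolicy; the grace period is set with the Pod spec field terminationGracePeriodSeconds, which should be long enough for tasks to complete before the Pod is killed.
The example YAML file below shows a Deployment whose Pod template sets terminationGracePeriodSeconds to 1800 (30 minutes).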
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: airflow
  name: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow
  template:
    metadata:
      labels:
        app: airflow
    spec:
      serviceAccountName: airflow
      # Time allowed between SIGTERM and SIGKILL (default is 30 seconds).
      terminationGracePeriodSeconds: 1800
      containers:
        - name: airflow
          image: apache/airflow:2.1.2
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "sleep 1000000" ]
          terminationMessagePolicy: File
        - name: dags
          image: example/airflow-dags
          volumeMounts:
            - name: dags
              mountPath: /usr/local/airflow/dags
      volumes:
        - name: dags
          configMap:
            name: airflow-dags
            items:
              - key: dag.py
                path: dag.py
# Note: the original "eviction.deleteOptions.propagationPolicy: Foreground"
# block is not a valid Deployment field. propagationPolicy belongs to the
# DeleteOptions of a delete request, for example:
#   kubectl delete deployment airflow --cascade=foreground
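
A longer grace period only helps if the process inside the container reacts to SIGTERM instead of ignoring it until SIGKILL arrives. The following is a minimal Python sketch of that pattern, not Airflow's own implementation: the class GracefulShutdown and the function run_units are hypothetical names introduced here for illustration. The handler only records that SIGTERM arrived, and the loop checks the flag between work items so the current item can finish.

```python
import signal


class GracefulShutdown:
    """Hypothetical helper: remembers that SIGTERM was received."""

    def __init__(self):
        self.stop_requested = False
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        # Do as little as possible in the handler: just set a flag.
        self.stop_requested = True


def run_units(shutdown, units, work):
    """Run work(unit) for each unit; stop between units after SIGTERM.

    Returns the list of units that were completed.
    """
    done = []
    for unit in units:
        if shutdown.stop_requested:
            break  # leave remaining units for a retry on another node
        work(unit)
        done.append(unit)
    return done
```

With this shape, the unit that is in progress when SIGTERM arrives still completes, provided it fits inside terminationGracePeriodSeconds; anything longer is still cut off by SIGKILL.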