Add the following to your AWS EMR step code so that execution stops when a step fails:
import shlex

import boto3

# Create an EMR client (uses your default AWS credentials and region)
client = boto3.client('emr')

# Command to run on the cluster, written as you would type it in a shell
command = "command-to-run"

# Launch the cluster and submit the step in a single call
response = client.run_job_flow(
    Name='job-name',
    ReleaseLabel='emr-release',  # an EMR release label such as 'emr-x.y.z'
    Instances={
        'InstanceGroups': [
            {
                'Name': 'Master nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm4.xlarge',
                'InstanceCount': 1
            },
            {
                'Name': 'Core nodes',
                'Market': 'SPOT',
                'InstanceRole': 'CORE',
                'InstanceType': 'm4.xlarge',
                'InstanceCount': 3,
                'BidPrice': '0.15'
            }
        ],
        'Ec2KeyName': 'key-pair',
        'KeepJobFlowAliveWhenNoSteps': False,  # terminate the cluster once all steps have finished
        'TerminationProtected': False,
        'Ec2SubnetId': 'subnet-id'
    },
    Steps=[
        {
            'Name': 'job-step-name',
            # Terminate the cluster if this step fails
            # (TERMINATE_JOB_FLOW is the older, backward-compatible name for this action)
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                # command-runner.jar expects the command and each argument as separate list items
                'Args': shlex.split(command)
            }
        }
    ],
    VisibleToAllUsers=True,
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole'
)

# JobFlowId is the ID of the newly created cluster
print("Started cluster %s" % response['JobFlowId'])
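In the code example above, setting the KeepJobFlowAliveWhenNoSteps parameter to False makes the AWS EMR cluster shut down once its steps have completed. Setting the ActionOnFailure parameter additionally makes the cluster terminate if the step fails.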
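If you submit several steps, it can help to build each step definition in one place so the ActionOnFailure value is checked and the command is split into separate arguments consistently. The sketch below assumes a hypothetical helper named `build_step`; the set of allowed actions reflects the values the EMR API accepts for a step (TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, CONTINUE). It only builds the dictionary and does not call AWS.

```python
import shlex
from typing import Any, Dict

# ActionOnFailure values accepted by the EMR API for a step.
# TERMINATE_JOB_FLOW is kept only for backward compatibility.
VALID_ACTIONS = {"TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER", "CANCEL_AND_WAIT", "CONTINUE"}


def build_step(name: str, command: str,
               action_on_failure: str = "TERMINATE_CLUSTER") -> Dict[str, Any]:
    """Hypothetical helper: return one entry for the Steps list of run_job_flow."""
    if action_on_failure not in VALID_ACTIONS:
        raise ValueError(f"unsupported ActionOnFailure: {action_on_failure!r}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Split the shell-style command into separate arguments, honoring quotes
            "Args": shlex.split(command),
        },
    }


if __name__ == "__main__":
    step = build_step("job-step-name", "spark-submit --deploy-mode cluster job.py")
    print(step["HadoopJarStep"]["Args"])
```

The returned dictionary can be passed as an element of `Steps=[...]` in the call above. Using CANCEL_AND_WAIT instead of TERMINATE_CLUSTER would cancel the remaining steps but keep the cluster running, which can be useful while debugging.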