A comparison of AWS Glue and EMR Serverless can start from the following two aspects.

Python code example for AWS Glue:
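```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Read the job name that Glue passes to the script as --JOB_NAME
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# Job wraps a GlueContext (not a boto3 client)
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args['JOB_NAME'], args)
...
job.commit()
```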
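To exercise the script's argument handling outside Glue, the `--NAME value` convention that `getResolvedOptions` reads can be approximated with the standard library. This is a hypothetical local stand-in: `resolve_options` is not part of `awsglue`, and the sketch only assumes job arguments arrive as `--NAME value` pairs mixed with other arguments.

```python
import argparse
from typing import Dict, List


def resolve_options(argv: List[str], names: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions (local testing only).

    Reads the required '--NAME value' pairs and ignores any other arguments.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name in names:
        # argparse derives the attribute name from the flag, keeping its case
        parser.add_argument(f"--{name}", required=True)
    # parse_known_args skips arguments this sketch does not declare
    known, _unknown = parser.parse_known_args(argv[1:])
    return {name: getattr(known, name) for name in names}


if __name__ == "__main__":
    argv = ["script.py", "--JOB_NAME", "my-glue-job", "--extra", "x"]
    print(resolve_options(argv, ["JOB_NAME"]))
```

A missing required name makes `argparse` exit with an error, which is roughly how a misconfigured job would fail at startup.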
Python code example for EMR Serverless:
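```python
import boto3

# EMR Serverless has its own client; boto3.client('emr') targets EMR on EC2 clusters
emr_serverless = boto3.client('emr-serverless')

response = emr_serverless.start_job_run(
    applicationId='<application-id>',  # placeholder: ID of an existing Spark application
    executionRoleArn='arn:aws:iam::<account-id>:role/myEMRRole',  # placeholder role ARN
    name='wordcount',
    jobDriver={
        'sparkSubmit': {
            # Hypothetical PySpark script: EMR Serverless runs Spark or Hive jobs, not MapReduce step JARs
            'entryPoint': 's3://myawsbucket/scripts/wordcount.py',
            'entryPointArguments': [
                's3://elasticmapreduce/samples/wordcount/input',
                's3://myawsbucket/wordcount/output',
            ],
        }
    },
)
print(response['jobRunId'])
...
```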
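Because an EMR Serverless job is described entirely by its `start_job_run` request, it can help to build that request in one place. A minimal sketch, assuming a Spark application: `build_spark_job_run` is a hypothetical helper whose return value is meant to be passed as `start_job_run(**kwargs)` on an `emr-serverless` client. Executor sizing goes into `sparkSubmitParameters`, which plays roughly the role that worker settings play in Glue.

```python
from typing import Any, Dict, List, Optional


def build_spark_job_run(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    entry_point_args: List[str],
    name: str = "wordcount",
    spark_submit_parameters: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for emr-serverless start_job_run."""
    spark_submit: Dict[str, Any] = {
        "entryPoint": entry_point,
        "entryPointArguments": list(entry_point_args),
    }
    # Optional spark-submit flags, e.g. executor cores and memory
    if spark_submit_parameters:
        spark_submit["sparkSubmitParameters"] = spark_submit_parameters
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": name,
        "jobDriver": {"sparkSubmit": spark_submit},
    }


if __name__ == "__main__":
    kwargs = build_spark_job_run(
        "<application-id>",
        "arn:aws:iam::<account-id>:role/myEMRRole",
        "s3://myawsbucket/scripts/wordcount.py",
        ["s3://elasticmapreduce/samples/wordcount/input", "s3://myawsbucket/wordcount/output"],
        spark_submit_parameters="--conf spark.executor.cores=2 --conf spark.executor.memory=4g",
    )
    print(kwargs["jobDriver"])
```

With real values in place of the placeholders, the call would be `boto3.client('emr-serverless').start_job_run(**kwargs)`.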
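In AWS Glue, cost is tuned per job run through the worker type and the number of workers (the job name below is a placeholder):

```python
import boto3

glue_client = boto3.client('glue')
# Adjust NumberOfWorkers and WorkerType to optimize cost (G.1X = 1 DPU per worker)
response = glue_client.start_job_run(JobName='my-glue-job', WorkerType='G.1X', NumberOfWorkers=2)
```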
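The effect of these settings on Glue cost is simple arithmetic: billed DPU-hours are DPUs per worker × number of workers × runtime in hours. A sketch under stated assumptions: the DPU mapping (G.1X = 1, G.2X = 2, G.4X = 4, G.8X = 8) is the standard one, the 60-second minimum assumes Glue 2.0 or later, and the price per DPU-hour is left as a parameter because it varies by Region. `glue_dpu_hours` and `glue_cost` are hypothetical helpers.

```python
from typing import Dict

# DPUs provided by one worker of each Glue worker type
DPU_PER_WORKER: Dict[str, float] = {"G.1X": 1.0, "G.2X": 2.0, "G.4X": 4.0, "G.8X": 8.0}


def glue_dpu_hours(worker_type: str, number_of_workers: int, runtime_seconds: float,
                   min_billed_seconds: float = 60.0) -> float:
    """Hypothetical estimate of billed DPU-hours for one Glue job run."""
    if worker_type not in DPU_PER_WORKER:
        raise ValueError(f"unknown worker type: {worker_type!r}")
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be at least 1")
    # Assumed minimum billing duration (Glue 2.0+); shorter runs are billed as the minimum
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    return DPU_PER_WORKER[worker_type] * number_of_workers * billed_seconds / 3600


def glue_cost(dpu_hours: float, price_per_dpu_hour: float) -> float:
    """Cost in the currency of price_per_dpu_hour (check current Regional pricing)."""
    return dpu_hours * price_per_dpu_hour


if __name__ == "__main__":
    print(glue_dpu_hours("G.1X", 10, 1800))
    print(glue_dpu_hours("G.2X", 5, 1800))
```

At equal runtime, 10 G.1X workers and 5 G.2X workers bill the same 5 DPU-hours for a 30-minute run, so a larger worker type reduces cost only if it shortens the runtime.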
Using EMR Serverless as an example, the