Example code: increasing the number of AWS Glue workers:
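import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# The worker count (WorkerType / NumberOfWorkers) is a job-level setting made on the
# job definition or job run; it cannot be changed from inside the script.
# This line only sets the number of Spark shuffle partitions to 10.
spark.conf.set("spark.sql.shuffle.partitions", "10")

# ...add more code to your Glue ETL script

job.commit()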
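Because the worker count belongs to the job run rather than to the script, one way to raise it is to pass `WorkerType` and `NumberOfWorkers` when starting the run through the Glue `StartJobRun` API. The sketch below assumes a boto3 Glue client is passed in (which also makes it testable with a stub). The helper name `start_job_with_workers`, the job name `my-etl-job`, and the `G.1X` / 10-worker defaults are hypothetical placeholders, not values from the original.

```python
def start_job_with_workers(glue_client, job_name, worker_type="G.1X", number_of_workers=10):
    """Start one run of an existing Glue job with an explicit worker type and count.

    glue_client: assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    worker_type / number_of_workers: hypothetical defaults; they apply to this
    run only and override the values stored on the job definition.
    Returns the JobRunId reported by Glue.
    """
    # Local sanity check only; Glue applies its own limits per worker type.
    if not isinstance(number_of_workers, int) or number_of_workers < 1:
        raise ValueError("number_of_workers must be a positive integer")

    response = glue_client.start_job_run(
        JobName=job_name,
        WorkerType=worker_type,
        NumberOfWorkers=number_of_workers,
    )
    return response["JobRunId"]
```

For example, `start_job_with_workers(boto3.client("glue"), "my-etl-job", "G.1X", 10)` requires AWS credentials and an existing job. The AWS CLI equivalent is `aws glue start-job-run --job-name my-etl-job --worker-type G.1X --number-of-workers 10`.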
Using AWS Glue ETL partitioning:
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglue.dynamicframe import DynamicFrame
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'PARTITION_KEY'])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
job = Job(glue