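Check that the VPC and subnet settings are correct so that the AWS Glue job can reach the required endpoints.
Configure security group rules that allow the Glue job to reach the data store on the required ports.
Check the connection settings in the AWS Glue job configuration, including the URL, username, and password. Make sure the connection information is correct and that the Glue job can use it to reach the required data store.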
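The three checks above can be partly automated. The following is a minimal sketch, assuming boto3 is installed and the caller has permission to call `glue:GetConnection` and `ec2:DescribeSecurityGroups`. The helper names `has_self_referencing_rule` and `check_connection_network` are hypothetical and not part of any AWS library. The self-referencing check reflects the documented Glue requirement that a connection's security group allow inbound traffic from itself on all TCP ports; it does not verify that the data store's own port is reachable.

```python
def has_self_referencing_rule(security_group):
    """Return True if the group allows inbound traffic from itself on all TCP ports.

    `security_group` is one entry of the "SecurityGroups" list returned by
    EC2 DescribeSecurityGroups.
    """
    group_id = security_group["GroupId"]
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        # "-1" means all protocols; otherwise require the full TCP port range.
        all_ports = protocol == "-1" or (
            protocol == "tcp"
            and perm.get("FromPort") == 0
            and perm.get("ToPort") == 65535
        )
        sources = {pair.get("GroupId") for pair in perm.get("UserIdGroupPairs", [])}
        if all_ports and group_id in sources:
            return True
    return False


def check_connection_network(connection_name):
    """Summarize the subnet, security groups, and JDBC URL of a Glue connection."""
    import boto3  # imported lazily so the helper above works without AWS access

    glue = boto3.client("glue")
    ec2 = boto3.client("ec2")

    conn = glue.get_connection(Name=connection_name)["Connection"]
    reqs = conn.get("PhysicalConnectionRequirements", {})
    group_ids = reqs.get("SecurityGroupIdList", [])
    groups = []
    if group_ids:
        groups = ec2.describe_security_groups(GroupIds=group_ids)["SecurityGroups"]

    return {
        "subnet_id": reqs.get("SubnetId"),
        "jdbc_url": conn.get("ConnectionProperties", {}).get("JDBC_CONNECTION_URL"),
        "self_referencing": {g["GroupId"]: has_self_referencing_rule(g) for g in groups},
    }
```

Calling `check_connection_network("my-redshift-connection")` returns a small dictionary that can be compared by eye against the intended VPC setup. A `False` value under `self_referencing` or a missing `subnet_id` is a likely starting point for debugging.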
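The following sample code uses AWS Glue to read from a Redshift connection and a Catalog table, join the two, and write the result to S3: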
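import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
# TempDir is the job's temporary S3 directory; Redshift reads stage their data there
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TempDir'])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
## Define Connection
connection_name = "my-redshift-connection"
# "my_redshift_table" is a placeholder; replace it with the Redshift table to read
redshift_conn = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": connection_name,
        "dbtable": "my_redshift_table",
        "redshiftTmpDir": args["TempDir"],
    },
)
## from_options takes no order_by or push_down_predicate argument, so filter on the DataFrame
redshift_df = redshift_conn.toDF().filter("amount > 200")
## Access Catalog table
table = glueContext.create_dynamic_frame.from_catalog(database="my_db", table_name="my_table")
table_df = table.toDF()
## Join the data
# Join on DataFrame columns (a DynamicFrame cannot be indexed by column name), drop the catalog table's key, then sort
final_df = redshift_df.join(table_df, redshift_df["some_id"] == table_df["id"], "inner").drop(table_df["id"]).orderBy(redshift_df["id"])
## Write the data to S3
s3_output_folder = "s3://my-bucket/output-folder"
final_dynamic_frame = DynamicFrame.fromDF(final_df, glueContext, "final_dynamic_frame")
glueContext.write_dynamic_frame.from_options(frame=final_dynamic_frame, connection_type="s3", connection_options={"path": s3_output_folder}, format="csv")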