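EMRCluster:
  Type: "AWS::EMR::Cluster"
  Properties:
    ...
    VisibleToAllUsers: true
    # Caution: the Debugging block below is not among the documented
    # AWS::EMR::Cluster properties; check the resource reference before deploying.
    Debugging:
      Enabled: true
      DebugHookEnabled: true
      DebugHookName: "my-debug-hook"
    ...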
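The EMR documentation describes enabling debugging through a cluster step together with a log location, rather than through a dedicated property. A variant of the template along those lines might look like the sketch below. It is a minimal sketch, not a verified template: the step name, the ActionOnFailure value, and the placement of the other properties are assumptions, and the release label and log path are the placeholder values used elsewhere in this article. The `command-runner.jar` step with the `state-pusher-script` argument is the debugging step documented for EMR release 4.x and later.

```yaml
EMRCluster:
  Type: "AWS::EMR::Cluster"
  Properties:
    # ... other required properties (Name, Instances, JobFlowRole, ServiceRole) ...
    ReleaseLabel: "emr-5.28.0"
    VisibleToAllUsers: true
    # Debugging relies on logs archived to S3, so a log location is needed.
    LogUri: "s3://path/to/logs"
    Steps:
      # Assumption: the step name is illustrative and can be anything.
      - Name: "Enable debugging"
        # Assumption: CONTINUE keeps the cluster running if this step fails.
        ActionOnFailure: "CONTINUE"
        HadoopJarStep:
          Jar: "command-runner.jar"
          Args:
            - "state-pusher-script"
```

Putting the step first in the Steps list means it runs before any application steps, so their state is captured from the start.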
To enable debugging when creating the cluster from the AWS CLI, pass --enable-debugging together with --log-uri to aws emr create-cluster:
$ aws emr create-cluster \
    --release-label emr-5.28.0 \
    --log-uri s3://path/to/logs \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-12345xxxxx,KeyName=myEC2KeyPair \
    --applications Name=Ganglia Name=Hadoop Name=Hive Name=Hue Name=Pig \
    --configurations file://path/to/your/emr-config-file.json \
    --instance-groups \
        Name=Master,InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
        Name=Core,InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
    --bootstrap-actions Name="image-cruncher",Path=s3://mybucket/cruncher.sh,Args=["arg1","arg2"] \
    --enable-debugging
To see the CLI's own diagnostic output while the command runs, add the global --debug option. It is independent of EMR debugging.
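To connect to the EMR cluster and view debugging information, the master node must accept SSH connections, after which you can set up port forwarding. Each ssh command below keeps running in the foreground because of -N, so run each one in its own terminal:
$ ssh -i myEC2KeyPair.pem -ND 8157 hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
$ ssh -i myEC2KeyPair.pem -NL 7777:my-debug-hook:7777 hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
$ curl http://localhost:7777/echo/hello
The first command opens a SOCKS proxy on local port 8157 (dynamic port forwarding). The second forwards local port 7777 to port 7777 on my-debug-hook, the placeholder host name from the template, as resolved from the master node. With that tunnel open, the curl request to localhost:7777 reaches the remote endpoint.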
The above describes how to enable debugging for an AWS EMR cluster through CloudFormation.