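In Athena, queries over large datasets can fail with a resource-exhaustion error, mainly because the query needs more resources than Athena allocates to it. The following are several ways to optimize a given query: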
aws athena start-query-execution --query-string "SELECT * FROM mytable" --query-execution-context Database=mydatabase --result-configuration "OutputLocation=s3://mybucket/,EncryptionConfiguration={EncryptionOption=SSE_S3}" --work-group myworkgroup
SELECT * FROM mytable LIMIT 1000
ALTER TABLE mytable ADD PARTITION (ds='2019-01-01', hr='01') LOCATION 's3://mybucket/mytable/ds=2019-01-01/hr=01'
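Partitioning only helps once every partition is registered, and adding them one at a time by hand is tedious. The following is a minimal sketch, not a definitive script: it prints one ADD PARTITION statement per hour of a given day. The function name add_partition_statements is hypothetical, and the table name, bucket, and ds/hr layout are assumed to match the example above.

```shell
#!/bin/sh
# Hypothetical helper: print one ADD PARTITION statement for each hour (00-23)
# of the day passed as $1. Assumes the table "mytable" and the
# s3://mybucket/mytable/ds=<day>/hr=<hour> layout used in the example above.
add_partition_statements() {
    day="$1"
    hr=0
    while [ "$hr" -le 23 ]; do
        # Zero-pad the hour so it matches the hr='01' style of the example.
        hh=$(printf '%02d' "$hr")
        printf "ALTER TABLE mytable ADD IF NOT EXISTS PARTITION (ds='%s', hr='%s') LOCATION 's3://mybucket/mytable/ds=%s/hr=%s'\n" "$day" "$hh" "$day" "$hh"
        hr=$((hr + 1))
    done
}

add_partition_statements 2019-01-01
```

Each printed line can be submitted as its own query. A query that then filters on ds and hr in its WHERE clause reads only the matching S3 prefixes instead of the whole table.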
The above are three ways to optimize an Athena query.