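AWS EMR typically downloads files from S3 when a cluster starts, which can involve a large number of LIST and HEAD requests. If your application also reads model files frequently, each read adds further LIST and HEAD requests. You can reduce the number of requests by using a suitable caching mechanism in the application, so that a file already fetched once is served from local storage instead of S3. The following Python example caches the model file under /tmp: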
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')


def download_model(model_bucket, model_key):
    # Cache location: /tmp/<last component of the key>
    model_path = '/tmp/' + model_key.split('/')[-1]
    # Check if model is in cache
    try:
        # .h5 files are binary, so open in 'rb' mode rather than text mode
        with open(model_path, 'rb') as f:
            print("Model %s found in cache..." % model_key)
            return f.read()
    # Download and cache model
    except FileNotFoundError:
        print("Downloading model %s..." % model_key)
        try:
            s3.download_file(model_bucket, model_key, model_path)
            with open(model_path, 'rb') as f:
                return f.read()
        except ClientError as e:
            print(e)
            raise Exception('Error downloading %s from %s' % (model_key, model_bucket)) from e


model_bytes = download_model('my-model-bucket', 'models/my-model.h5')  # returns the model as bytes
print(len(model_bytes))
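
The file cache above still reads from disk on every call and never refreshes a model that has changed in S3. As a minimal sketch of one way to layer an in-memory cache with an expiry on top, the class below keeps the bytes in process memory and calls the fetch function again only after a time-to-live has passed. The name `TTLModelCache`, the `ttl_seconds` and `clock` parameters, and the `fetch_count` counter are hypothetical and introduced only for illustration; the fetch function is injected so the idea can be tried without AWS credentials.

```python
import time
from typing import Callable, Dict, Tuple


class TTLModelCache:
    """Hypothetical helper: keep model bytes in memory, refetch after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str, str], bytes],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # e.g. a function that downloads from S3
        self._ttl = ttl_seconds      # how long a cached entry stays valid
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, bytes]] = {}
        self.fetch_count = 0         # number of times fetch was actually called

    def get(self, bucket: str, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get((bucket, key))
        # Serve from memory while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Cache miss or expired entry: fetch once and remember the result.
        data = self._fetch(bucket, key)
        self.fetch_count += 1
        self._entries[(bucket, key)] = (now, data)
        return data
```

A possible usage is `cache = TTLModelCache(download_model, ttl_seconds=600)` followed by `cache.get('my-model-bucket', 'models/my-model.h5')`. Note that the file under /tmp would also have to be removed or overwritten on expiry for a changed model to be picked up, since `download_model` returns the local copy whenever it exists.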