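To get a per-column row count for an AWS Athena table, you can use the code below. Every column in a table has the same total number of rows, so the useful per-column figure is the number of non-NULL values, which is what `COUNT(column)` returns.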
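```python
import time

import boto3


def get_column_row_counts(database, table):
    athena_client = boto3.client('athena', region_name='your_region_name')

    # Look up the column names; partition keys are returned separately from regular columns
    metadata = athena_client.get_table_metadata(
        CatalogName='AwsDataCatalog', DatabaseName=database, TableName=table
    )['TableMetadata']
    columns = [c['Name'] for c in metadata['Columns'] + metadata.get('PartitionKeys', [])]

    # One query: COUNT(col) counts the non-NULL values of each column
    select_list = ', '.join(f'COUNT("{c}") AS "{c}"' for c in columns)
    query = f'SELECT {select_list} FROM "{database}"."{table}"'

    # Submit the query
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': 's3://your_s3_bucket/'},
    )
    execution_id = response['QueryExecutionId']

    # Queries run asynchronously: poll until a terminal state is reached
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = status['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(1)
    if state != 'SUCCEEDED':
        raise RuntimeError(f'Athena query {execution_id} ended in state {state}')

    # Fetch the results: row 0 holds the column headers, row 1 holds the counts
    result_response = athena_client.get_query_results(QueryExecutionId=execution_id)
    counts = result_response['ResultSet']['Rows'][1]['Data']

    # Parse the results into {column name: non-NULL count}
    column_row_counts = {}
    for column_name, cell in zip(columns, counts):
        column_row_counts[column_name] = int(cell['VarCharValue'])
    return column_row_counts
```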
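All the counting happens in one SQL statement, so you can build and inspect the query text without touching AWS. The following is a minimal sketch that assumes the column names are already known. The helper name `build_count_query` is hypothetical and is not part of boto3 or Athena. It also escapes embedded double quotes in identifiers, which the function above does not do.

```python
def build_count_query(database: str, table: str, columns: list[str]) -> str:
    """Build one SELECT that counts the non-NULL values of every column.

    Identifiers are wrapped in double quotes (the identifier quoting used in
    Athena SELECT statements); embedded double quotes are escaped by doubling.
    """
    def quote(name: str) -> str:
        return '"' + name.replace('"', '""') + '"'

    select_list = ", ".join(f"COUNT({quote(c)}) AS {quote(c)}" for c in columns)
    return f"SELECT {select_list} FROM {quote(database)}.{quote(table)}"


print(build_count_query("sales", "orders", ["id", "note"]))
# SELECT COUNT("id") AS "id", COUNT("note") AS "note" FROM "sales"."orders"
```

Using a single query with one `COUNT` per column scans the table once, instead of running a separate query for each column.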
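Replace the following values to match your environment:

- `your_region_name`: the AWS Region name, for example `us-west-2`.
- `database`: the name of the database to query (passed as an argument).
- `table`: the name of the table to query (passed as an argument).
- `your_s3_bucket`: the S3 bucket (optionally with a prefix) where Athena writes query results.

Call the function to get the count for each column, for example: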
```python
column_row_counts = get_column_row_counts('your_database', 'your_table')
print(column_row_counts)
```
This prints a dictionary whose keys are column names and whose values are the number of non-NULL values in each column.
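For a SELECT query, `get_query_results` returns the column headers as the first row of `ResultSet['Rows']`, and a NULL cell has no `VarCharValue` key. The sketch below shows how such a response can be turned into the dictionary described above. It is tested against a hand-written dict shaped like the boto3 response. The helper name `parse_count_row` and the sample values are hypothetical.

```python
def parse_count_row(result_response: dict) -> dict[str, int]:
    """Map each column name to its count from a one-row Athena result set."""
    rows = result_response["ResultSet"]["Rows"]
    # Row 0 carries the column headers for a SELECT query
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    values = rows[1]["Data"]
    # COUNT() returns 0 rather than NULL, but fall back to 0 if a cell lacks a value
    return {
        name: int(cell.get("VarCharValue", 0))
        for name, cell in zip(header, values)
    }


sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "id"}, {"VarCharValue": "note"}]},
            {"Data": [{"VarCharValue": "1000"}, {"VarCharValue": "742"}]},
        ]
    }
}
print(parse_count_row(sample))  # {'id': 1000, 'note': 742}
```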