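Athena uses an external table definition, written as a CREATE EXTERNAL TABLE statement, to determine how the data in your files (typically ending in ".csv" or ".tsv") is mapped onto the table's fields.
Below is a basic example of an external table definition, including column names and data types: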
CREATE EXTERNAL TABLE mytable (
  id INT,
  name STRING,
  age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/my-path/';
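In this example, the first column in each file is mapped to the "id" field, the second to the "name" field, and the third to the "age" field. For delimited text the mapping is by position, not by any column names in the file. If the files use a different column order or different data types, you may need to edit the table definition or modify the source data so that the fields match correctly.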
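As a rough sketch of the "different order" case, suppose the files store their columns as name, age, id and begin with a single header row. One option is to declare the columns in file order and reorder them at query time. The table name `mytable_reordered` and the S3 path below are hypothetical, and the sketch assumes the same comma-delimited text format as above.

```sql
-- Hypothetical table: files under this prefix store columns as name,age,id
-- and start with one header row. Table name and S3 path are assumptions.
CREATE EXTERNAL TABLE mytable_reordered (
  name STRING,
  age INT,
  id INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/my-other-path/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Present the columns in id, name, age order at query time.
SELECT id, name, age
FROM mytable_reordered;
```

Declaring the columns in file order leaves the source data untouched. The `skip.header.line.count` property keeps the header row from being read as data.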
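Note that when you create the table, the path given in "LOCATION" must point to the Amazon S3 folder (prefix) that contains the data files, not to an individual file.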
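One way to check that "LOCATION" points where you expect is to ask Athena which S3 objects the rows came from. This is a minimal sketch against the `mytable` definition above. It uses Athena's `"$path"` pseudo-column, and the alias `source_file` is only illustrative.

```sql
-- Show a few rows together with the S3 object each row was read from.
SELECT "$path" AS source_file, id, name, age
FROM mytable
LIMIT 10;
```

If the query returns no rows, or the paths are not the files you expected, the prefix in "LOCATION" is the first thing to check.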
The following is a more specific example, covering how to load data from Amazon S3 and how to query the table data:
-- Create external table definition
CREATE EXTERNAL TABLE mytable (
  id INT,
  name STRING,
  age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/my-path/';
-- Load data into the table from Amazon S3
INSERT INTO mytable
SELECT *
FROM
(
SELECT CAST(trim(COLUMN_GET(col_data, 'id' as INT)) AS INT) AS id,
CAST(trim(COLUMN_GET(col_data, 'name' as VARCHAR)) AS STRING) AS name,
CAST(trim(COLUMN_GET(col_data, 'age' as INT)) AS INT) AS age
FROM
(
SELECT split(line, ',') as cols
FROM
(
SELECT trim(regexp_replace(COLUMN_VALUE, '\\\\"', '')) as line
FROM mytable_raw,
UNNEST(EXTRACTARRAY('