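Problem: A timestamp-partitioned table was created in BigQuery, but when the table is queried, the data in some partitions is not recognized correctly. To resolve this, load the data with day-based partitioning set explicitly on the timestamp field: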
from google.cloud import bigquery

client = bigquery.Client()

# Reference the destination table (client.dataset() is deprecated).
dataset_ref = bigquery.DatasetReference(client.project, 'my_dataset')
table_ref = dataset_ref.table('my_table')

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('name', 'STRING'),
        bigquery.SchemaField('timestamp_field', 'TIMESTAMP'),
    ],
    # Partition the table by day on the TIMESTAMP column.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field='timestamp_field',  # name of the timestamp column
    ),
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

uri = 'gs://mybucket/mydata.json'
load_job = client.load_table_from_uri(
    uri, table_ref, job_config=job_config
)  # API request
load_job.result()  # Waits for the table load to complete.
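One possible reason rows seem to land in an unexpected partition, which the original post does not confirm, is that BigQuery draws daily partition boundaries for TIMESTAMP columns in UTC. The sketch below uses only the standard library to work out which daily partition a given timestamp would fall into under that assumption; `daily_partition_id` is a hypothetical helper, not part of the BigQuery client.

```python
from datetime import datetime, timedelta, timezone


def daily_partition_id(ts: datetime) -> str:
    """Return the YYYYMMDD daily partition a timestamp falls into.

    Assumption: daily partitions on a TIMESTAMP column use UTC day
    boundaries. Naive datetimes are treated as already being in UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


# 00:30 on 2 January in UTC+8 is still 1 January in UTC.
local = datetime(2024, 1, 2, 0, 30, tzinfo=timezone(timedelta(hours=8)))
print(daily_partition_id(local))  # prints 20240101
```

If the partition computed this way differs from the one being queried, the filter on `timestamp_field` probably needs to be expressed in UTC.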
Note: the load job above assumes the data is stored as newline-delimited JSON in Google Cloud Storage (GCS); adjust the code as needed for other data sources.
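As a rough illustration of adapting the load to other file types, the sketch below picks a source format from the file extension of the URI. The mapping and the `guess_source_format` helper are hypothetical; the returned strings are intended to match the values of the `bigquery.SourceFormat` constants, so check them against the client version you have installed.

```python
import os

# Hypothetical mapping from file extension to the string values of
# bigquery.SourceFormat; extend or change it to match your own files.
SOURCE_FORMAT_BY_EXTENSION = {
    ".json": "NEWLINE_DELIMITED_JSON",
    ".csv": "CSV",
    ".parquet": "PARQUET",
    ".avro": "AVRO",
    ".orc": "ORC",
}


def guess_source_format(uri: str) -> str:
    """Return a source format name based on the URI's file extension."""
    ext = os.path.splitext(uri)[1].lower()
    try:
        return SOURCE_FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"No source format configured for extension {ext!r}")


print(guess_source_format("gs://mybucket/mydata.json"))
```

The result could then be assigned to `job_config.source_format` before calling `client.load_table_from_uri`. Formats such as CSV usually need further options (for example a header row setting) that this sketch does not cover.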