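In the BigQuery-to-GCS extract code, to_delete is not an option of the extract job. To my knowledge, bigquery.ExtractJobConfig in the google-cloud-bigquery Python client has no to_delete (or to_json) property, and recent versions of the client raise an AttributeError when an unknown property is assigned on a job config. In Google's sample code, the name to_delete appears to be a test fixture: a list of resources (such as a temporary bucket) that the test harness removes after the sample runs. It does not affect the extract itself.

As far as I know, an extract job overwrites an object whose name matches the destination URI, but it leaves every other object in the bucket alone, including leftover shards from an earlier wildcard export. Here is the example with the unsupported settings removed:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names; replace with your own.
project = client.project
dataset_id = 'my_dataset'
table_id = 'my_table'
bucket_name = 'my-bucket-name'
destination_uri = 'gs://{}/{}'.format(bucket_name, 'output-file-name')

table_ref = bigquery.DatasetReference(project, dataset_id).table(table_id)

job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
# field_delimiter and print_header apply only to CSV exports, so they are
# omitted here. ExtractJobConfig has no to_json or to_delete property.

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # Wait for the job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

Nothing in this example deletes old output files. If they must be removed, delete the matching objects from the bucket yourself before calling client.extract_table, for instance with the google-cloud-storage client.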
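Since the extract job has no setting for removing earlier output, one way to get that effect is to clear the objects under a known prefix first. The following is a minimal sketch, not an official recipe: the helper names delete_old_exports and clean_then_extract, the prefix argument, and the wildcard file-name pattern are all assumptions made for illustration. It relies on Bucket.list_blobs(prefix=...) and Blob.delete() from google-cloud-storage.

```python
def delete_old_exports(bucket, prefix):
    """Delete every object in `bucket` whose name starts with `prefix`.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    list_blobs(prefix=...) yields objects that have .name and .delete().
    Returns the names of the deleted objects.
    """
    deleted = []
    # Materialize the listing first so deleting does not disturb iteration.
    for blob in list(bucket.list_blobs(prefix=prefix)):
        blob.delete()
        deleted.append(blob.name)
    return deleted


def clean_then_extract(bucket_name, prefix, table_id):
    """Hypothetical wrapper: remove old output, then run the extract job.

    `table_id` is a full table ID string such as 'project.dataset.table'.
    """
    # Imported lazily so delete_old_exports works without the GCP libraries.
    from google.cloud import bigquery, storage

    bucket = storage.Client().bucket(bucket_name)
    delete_old_exports(bucket, prefix)

    # Assumed naming scheme: a wildcard URI so large tables can be sharded.
    destination_uri = 'gs://{}/{}-*.json.gz'.format(bucket_name, prefix)
    job_config = bigquery.ExtractJobConfig()
    job_config.compression = bigquery.Compression.GZIP
    job_config.destination_format = (
        bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)

    extract_job = bigquery.Client().extract_table(
        table_id, destination_uri, job_config=job_config)
    extract_job.result()  # Wait for the job to complete.
    return destination_uri
```

A call such as clean_then_extract('my-bucket-name', 'exports/my_table', 'my-project.my_dataset.my_table') would first remove every object whose name starts with exports/my_table. Deleting by prefix is blunt, so the prefix should be specific to this export and not shared with files you want to keep.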