This error is usually caused by a mismatch between the table registered in Data Catalog and the table you are trying to use in BigQuery or Dataflow. You can resolve it with the following steps:
Confirm that the table exists in Data Catalog and that the table name and schema name exactly match the names used in BigQuery or Dataflow.
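One quick way to confirm this is to look the table up through Data Catalog by its BigQuery resource name. This is a minimal sketch; the project, dataset, and table IDs are placeholders you would replace with your own:

from google.cloud import datacatalog_v1

data_catalog = datacatalog_v1.DataCatalogClient()

# Placeholder BigQuery identifiers -- substitute your own values
bq_resource = (
    "//bigquery.googleapis.com/projects/your-bq-project"
    "/datasets/your_dataset/tables/your_table"
)

# lookup_entry resolves a BigQuery resource to its Data Catalog entry;
# a NotFound error here means Data Catalog cannot see the table at all.
entry = data_catalog.lookup_entry(request={"linked_resource": bq_resource})
print(entry.name)
print(entry.schema)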
The relevant API services must also be enabled, and the service accounts used by BigQuery, Dataflow, and related services must be authorized to perform Google Cloud Data Catalog API operations. You can use the following code to check whether the API is enabled and the entry is reachable:
from google.cloud import datacatalog_v1
project_id = "your-project-id" location_id = "us-central1" # Data Catalog API must be enabled in this region. entry_group_id = "your-entry-group-id" entry_id = "your-entry-id"
data_catalog = datacatalog_v1.DataCatalogClient() entry = data_catalog.get_entry(name=f"projects/{project_id}/locations/" f"{location_id}/entryGroups/{entry_group_id}/" f"entries/{entry_id}") print(entry)
If the code above fails with a permission error, the Cloud Data Catalog API is probably not enabled or not properly authorized for the caller, and that is what leads to the "Could not resolve table in Data Catalog" error.
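To tell a permission problem apart from an entry that simply does not exist, you can catch the specific exception the call raises. This is a small sketch that wraps the get_entry call above and reuses the same placeholder variables:

from google.api_core import exceptions
from google.cloud import datacatalog_v1

data_catalog = datacatalog_v1.DataCatalogClient()
entry_name = (
    f"projects/{project_id}/locations/{location_id}"
    f"/entryGroups/{entry_group_id}/entries/{entry_id}"
)

try:
    entry = data_catalog.get_entry(name=entry_name)
    print("Entry found:", entry.name)
except exceptions.PermissionDenied:
    # The caller lacks permission on the entry, or the Data Catalog API
    # is not enabled for the project -- check IAM roles and API status.
    print("Permission problem: check IAM roles and that the API is enabled.")
except exceptions.NotFound:
    # The entry itself is missing from Data Catalog.
    print("Entry not found: register the table, as shown below.")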
If the table is not registered in Data Catalog at all, you can create an entry for it yourself. The snippet below sets up the entry group path, the BigQuery table reference, and the column metadata used to build that entry (all IDs and column names are placeholders):

from google.cloud import datacatalog_v1

data_catalog = datacatalog_v1.DataCatalogClient()

project_id = 'your-project-id'  # project that hosts the Data Catalog entry group
location = 'us-central1'

entry_group_id = 'your-entry-group-id'
entry_id = 'your-entry-id'

entry_group_path = data_catalog.entry_group_path(project_id, location, entry_group_id)

# BigQuery table the entry should point at
database_definition = {
    'project_id': 'BQ_PROJECT_ID',
    'dataset_name': 'BQ_DATASET_NAME',
    'table_name': 'BQ_TABLE_NAME',
}
database_table_name = (
    f"projects/{database_definition['project_id']}"
    f"/datasets/{database_definition['dataset_name']}"
    f"/tables/{database_definition['table_name']}"
)

# Placeholder columns -- replace these with the table's real schema
schema_fields = [
    {'column': 'column_1', 'type': 'STRING', 'description': 'example column'},
    {'column': 'column_2', 'type': 'INT64', 'description': 'example column'},
]
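From there, a minimal sketch of how the entry itself could be built and created, assuming the entry group at entry_group_path already exists. Data Catalog normally indexes BigQuery tables automatically, so an entry created through the API uses a user-specified type and simply links back to the BigQuery resource; the system and type labels below are hypothetical:

# Build the entry and point it at the BigQuery table
entry = datacatalog_v1.Entry()
entry.user_specified_system = 'bigquery_mirror'   # hypothetical system label
entry.user_specified_type = 'table'               # hypothetical type label
entry.display_name = database_definition['table_name']
entry.linked_resource = f"//bigquery.googleapis.com/{database_table_name}"

# Attach the column schema defined above
for field in schema_fields:
    entry.schema.columns.append(
        datacatalog_v1.ColumnSchema(
            column=field['column'],
            type_=field['type'],
            description=field['description'],
        )
    )

created_entry = data_catalog.create_entry(
    parent=entry_group_path,
    entry_id=entry_id,
    entry=entry,
)
print(created_entry.name)

Once the table resolves in Data Catalog, whether through automatic indexing or a manual registration like this, the "Could not resolve table in Data Catalog" error should disappear, provided the names match exactly.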