Apache Beam pipeline cannot insert data into BigQuery; the workflow fails


Founder

2024-09-05 11:30:21


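Make sure authentication for Beam and BigQuery is configured correctly. For local runs, the Beam Python SDK uses Application Default Credentials, which are set up with `gcloud auth application-default login`; plain `gcloud auth login` only authorizes the gcloud CLI itself. On Dataflow, the workers use the job's service account, which needs write access to the dataset.
Check the logs or the console output of the pipeline for errors related to the BigQuery write, and fix the problems they report.
Make sure the target BigQuery table exists and has the correct schema, that is, one that matches the format of the data being written.
If the pipeline still does not insert any data, check BigQuery's error logs for more information. Below is a basic Beam pipeline that writes data to BigQuery: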

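import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class User:

    def __init__(self, id, name, email):
        self.id = id
        self.name = name
        self.email = email

    def to_row(self):
        # WriteToBigQuery expects each element to be a dict keyed by column name,
        # not an arbitrary Python object.
        return {"id": self.id, "name": self.name, "email": self.email}


class UserPipeline:

    PROJECT_ID = 'your-project-id'
    DATASET_ID = 'your-dataset-id'
    TABLE_NAME = 'users'

    def __init__(self, pipeline_args=None):
        # Batch writes use BigQuery load jobs, which need a GCS staging
        # location, e.g. --temp_location=gs://your-bucket/tmp in pipeline_args.
        self.pipeline_options = PipelineOptions(pipeline_args)

    def run(self, users):
        # Convert to dicts up front so workers never need to unpickle User.
        rows = [user.to_row() for user in users]

        # Leaving the `with` block runs the pipeline and waits for it to finish.
        with beam.Pipeline(options=self.pipeline_options) as pipeline:
            (
                pipeline
                | "Create Beam PCollection" >> beam.Create(rows)
                # BigQuerySink is deprecated and is a native sink, not a
                # PTransform, so it cannot be applied with `|` directly.
                | "Write to BigQuery" >> beam.io.WriteToBigQuery(
                    table=self.TABLE_NAME,
                    dataset=self.DATASET_ID,
                    project=self.PROJECT_ID,
                    schema="id:INTEGER,name:STRING,email:STRING",
                    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                )
            )


if __name__ == "__main__":
    users = [
        User(id=1, name="John", email="john@example.com"),
        User(id=2, name="Jane", email="jane@example.com"),
    ]
    UserPipeline().run(users)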
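Before running a pipeline like this locally, it can help to rule out the authentication point from the first check. The sketch below only looks for the two file-based Application Default Credentials sources (the `GOOGLE_APPLICATION_CREDENTIALS` key file and the file written by `gcloud auth application-default login`); it assumes the usual gcloud config directory layout and does not contact the metadata server that Dataflow workers normally use. `find_local_adc_file` is a hypothetical helper, not part of Beam or google-auth.

```python
import os
import pathlib


def find_local_adc_file():
    """Return the local credentials file that Application Default Credentials
    would use, or None if no usable local file is found.

    Only the two file-based sources are checked; on Dataflow workers or other
    Google Cloud VMs, credentials usually come from the metadata server instead.
    """
    # 1. A key file named explicitly by GOOGLE_APPLICATION_CREDENTIALS.
    #    If the variable is set but the file is missing, treat it as unusable
    #    rather than falling back to the gcloud file.
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return explicit if os.path.isfile(explicit) else None

    # 2. The file written by `gcloud auth application-default login`, inside the
    #    gcloud config directory (which CLOUDSDK_CONFIG can override).
    if os.name == "nt":
        default_dir = pathlib.Path(os.environ.get("APPDATA", "")) / "gcloud"
    else:
        default_dir = pathlib.Path.home() / ".config" / "gcloud"
    config_dir = pathlib.Path(os.environ.get("CLOUDSDK_CONFIG", default_dir))
    adc_file = config_dir / "application_default_credentials.json"
    return str(adc_file) if adc_file.is_file() else None


if __name__ == "__main__":
    found = find_local_adc_file()
    print(found or "No local ADC file; try `gcloud auth application-default login`.")
```

If this prints no file, the pipeline's BigQuery write will most likely fail with a credentials error before any row is inserted.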

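Note that the `users` table referenced in the pipeline example has three columns, id, name, and email, and is assumed to have already been created correctly in BigQuery.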
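Since a mismatch between the rows and the table schema is a common reason for failed inserts, a quick local check before running the pipeline can rule it out. This is a minimal sketch, assuming rows are plain dicts and the schema uses the same `name:TYPE` string form as the example; it covers only a few column types, treats every column as NULLABLE, and is stricter than BigQuery (which may coerce, for example, the string `"2"` into an INTEGER). `parse_schema`, `row_errors`, and `ACCEPTED_TYPES` are hypothetical names, not part of Beam.

```python
# Python types accepted for a few BigQuery column types (assumption: only the
# types needed for the example are covered; other types are skipped).
ACCEPTED_TYPES = {
    "INTEGER": (int,),
    "FLOAT": (int, float),
    "STRING": (str,),
    "BOOLEAN": (bool,),
}


def parse_schema(schema):
    """Parse a 'name:TYPE,name:TYPE' schema string into {name: TYPE}."""
    fields = {}
    for part in schema.split(","):
        name, sep, field_type = part.strip().partition(":")
        if not sep or not name or not field_type:
            raise ValueError(f"malformed schema field: {part!r}")
        fields[name] = field_type.upper()
    return fields


def row_errors(row, fields):
    """Return human-readable problems found in one row dict (empty if none)."""
    problems = [f"unknown column {name!r}" for name in row if name not in fields]
    for name, field_type in fields.items():
        value = row.get(name)
        if value is None:
            continue  # every column is treated as NULLABLE in this sketch
        accepted = ACCEPTED_TYPES.get(field_type)
        if accepted is None:
            continue  # column type not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        bool_as_number = isinstance(value, bool) and field_type != "BOOLEAN"
        if bool_as_number or not isinstance(value, accepted):
            problems.append(
                f"column {name!r}: expected {field_type}, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    fields = parse_schema("id:INTEGER,name:STRING,email:STRING")
    rows = [
        {"id": 1, "name": "John", "email": "john@example.com"},
        {"id": "2", "name": "Jane", "email": "jane@example.com"},
    ]
    for index, row in enumerate(rows):
        for problem in row_errors(row, fields):
            print(f"row {index}: {problem}")
```

Here the second row is flagged because its `id` is a string, which is the kind of mismatch worth fixing before blaming the BigQuery write itself.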
