Apache Beam的流水线步骤不并行运行吗？（Python）_编程开发

Apache Beam的流水线步骤不并行运行吗？（Python）

创始人

2024-09-03 14:03:25

0次

在Apache Beam中，流水线步骤默认是并行运行的。但是，有时您可能希望某些步骤按顺序运行，而不是并行运行。以下是一个示例，演示如何在Apache Beam中实现顺序运行的流水线步骤：

import apache_beam as beam

def step1(element):
    # 第一个步骤的逻辑
    return element

def step2(element):
    # 第二个步骤的逻辑
    return element

def step3(element):
    # 第三个步骤的逻辑
    return element

def main():
    with beam.Pipeline() as pipeline:
        # 创建一个PCollection
        input_collection = pipeline | "Create input" >> beam.Create([1, 2, 3, 4, 5])

        # 在步骤1之后应用步骤2
        step1_output = input_collection | "Step 1" >> beam.Map(step1)
        step2_output = step1_output | "Step 2" >> beam.Map(step2)

        # 在步骤2之后应用步骤3
        step3_output = step2_output | "Step 3" >> beam.Map(step3)

        # 打印输出结果
        step3_output | "Print output" >> beam.Map(print)

if __name__ == '__main__':
    main()

在上面的示例中，步骤2在步骤1之后应用，步骤3在步骤2之后应用。这样，步骤2和步骤3将按顺序运行，而不是并行运行。

上一篇：Apache Beam的IllegalArgumentException：不安全的触发器可能会丢失数据。

下一篇：Apache Beam的python模块`fileio.WriteToFiles`中的`oversharding`可以翻译为“超分片”。

Apache Beam的流水线步骤不并行运行吗？（Python）

相关内容

热门资讯