首页 > 解决方案 > DataflowRunner 无法执行

问题描述

使用 Apache Beam Go SDK 执行处理时,“Dataflow”和“DataflowRunner”不适用于“--runner”选项集。

我查看了官方的 Apache Beam 参考和 Go 手册,但我不明白原因。你能告诉我一个具体的解决方案吗?

在命令行上执行的命令
go run test.go --runner=DataflowRunner \
          --project=[my-project] \
          --region=[my-region] \
          --temp_location=gs://[my-gs-bucket] \
          --staging_location=gs://[my-gs-bucket] \
          --worker_harness_container_image=apache/beam_go_sdk:lates
实际代码
    package main

    import (
        "context"
        "flag"
        "reflect"

        "github.com/bramp/morebeam/csvio"
        "github.com/apache/beam/sdks/go/pkg/beam"
        "github.com/apache/beam/sdks/go/pkg/beam/log"
        "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
        "github.com/apache/beam/sdks/go/pkg/beam/io/bigqueryio"
        "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts"
    )

    type Csvdata struct {
        id   string
        name string
        age  string
    }

    var (
        BQ = "[my-project]:[my-data-set].[my-file]"
    )

    func main() {

        flag.Parse()

        beam.Init()

        ctx := context.Background()
            
        p := beam.NewPipeline()
        s := p.Root()
        
     log.Infof(ctx, "Started pipeline on scope: %s", s)
        
     parse := csvio.Read(s, "gs://[my-gcs-bucket]", reflect.TypeOf(Csvdata{}))

        if BQ != "" {
            project := gcpopts.GetProject(ctx)
            bigqueryio.Write(s, project, BQ, parse)
        }
        
        beamx.Run(ctx, p)
    }
执行结果
2021/06/17 01:29:15 Started pipeline on scope: root

标签: gogoogle-cloud-dataflowapache-beam

解决方案


我相信它应该是 --runner=dataflow .. 小写


推荐阅读