Writing to Firestore from within Google Cloud Dataflow

Problem description

The core problem I'm running into is that when I run my pipeline after deploying it to Google Cloud Dataflow, I get this error:

java.lang.IllegalStateException: FirebaseApp with name [DEFAULT] doesn't exist.

If I run the exact same pipeline locally, everything works fine, so I suspect an authentication or environment issue.

The relevant bits of code:

The DEPLOY and REAL variables control whether the job gets pushed to the cloud (or runs locally) and whether it uses my Pub/Sub source or mocked data. Switching between mocked and Pub/Sub data doesn't seem to affect the Firestore behavior at all; only whether or not the job is deployed does.

The part of main() where I initialize the Firestore app:

    public class BreakingDataTransactions {

      // When true, this pulls from the specified Pub/Sub topic
      static Boolean REAL = true;
      // when set to true the job gets deployed to Cloud Dataflow
      static Boolean DEPLOY = true;

      public static void main(String[] args) {
        // validate our env vars
        if (GlobalVars.projectId   == null ||
            GlobalVars.pubsubTopic == null ||
            GlobalVars.gcsBucket   == null ||
            GlobalVars.region      == null) {
          System.out.println("You have to set environment variables for project (BREAKING_PROJECT), pubsub topic (BREAKING_PUBSUB), region (BREAKING_REGION) and Cloud Storage bucket for staging (BREAKING_DATAFLOW_BUCKET) in order to deploy this pipeline.");
          System.exit(1);
        }

        // Initialize our Firestore instance
        try {
          GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
          System.out.println("*************************");
          System.out.println(credentials);
          FirebaseOptions firebaseOptions =
              new FirebaseOptions.Builder()
                  .setCredentials(credentials)
                  .setProjectId(GlobalVars.projectId)
                  .build();
          FirebaseApp firebaseApp = FirebaseApp.initializeApp(firebaseOptions);
        } catch (IOException e) {
          e.printStackTrace();
        }

        // Start dataflow pipeline
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);

        options.setProject(GlobalVars.projectId);

        if (DEPLOY) {
          options.setRunner(DataflowRunner.class);
          options.setTempLocation(GlobalVars.gcsBucket);
          options.setRegion(GlobalVars.region);
        }

        Pipeline p = Pipeline.create(options);

And the part that does the processing:

    PCollection<Data> dataCollection =
        jsonStrings
            .apply(ParDo.of(JSONToPOJO.create(Data.class)))
            .setCoder(AvroCoder.of(Data.class));

    PCollection<Result> result =
        dataCollection
            .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))))
            .apply(WithKeys.of(x -> x.operation + "-" + x.job_id))
            .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(Data.class)))
            .apply(Combine.<String, Data, Result>perKey(new DataAnalysis()))
            .apply(Reify.windowsInValue())
            .apply(MapElements.into(TypeDescriptor.of(Result.class))
                    .<KV<String, ValueInSingleWindow<Result>>>via(
                        x -> {
                          Result r = new Result();
                          String key = x.getKey();
                          r.query_action = key.substring(0, key.indexOf("-"));
                          r.job_id = key.substring(key.indexOf("-") + 1);
                          r.average_latency = x.getValue().getValue().average_latency;
                          r.failure_percent = x.getValue().getValue().failure_percent;
                          r.timestamp = x.getValue().getTimestamp().getMillis();
                          return r;
                        }));

    // this node will (hopefully) actually write out to Firestore
    result.apply(ParDo.of(new FireStoreOutput()));

Finally, the FireStoreOutput class:

    public static class FireStoreOutput extends DoFn<Result, String> {

      Firestore db;

      @ProcessElement
      public void processElement(@Element Result result) {

        db = FirestoreClient.getFirestore();
        DocumentReference docRef = db.collection("events")
                                     .document("next2020")
                                     .collection("transactions")
                                     .document(result.job_id)
                                     .collection("transactions")
                                     .document();
        //System.out.println(docRef.getId());
        // Build the document data as a map
        Map<String, Object> data = new HashMap<>();
        data.put("failure_percent", result.failure_percent);
        data.put("average_latency", result.average_latency);
        data.put("query_action", result.query_action);
        data.put("timestamp", result.timestamp);

        // Write the data and wait for the write to complete
        ApiFuture<WriteResult> writeResult = docRef.set(data);
        try {
          writeResult.get();
        } catch (InterruptedException e) {
          e.printStackTrace();
        } catch (ExecutionException e) {
          e.printStackTrace();
        }
      }
    }

The error occurs on this line: db = FirestoreClient.getFirestore();

I'm deploying the Dataflow job with the --serviceAccount flag, which specifies a service account that has permission to do everything it needs.

So unless GoogleCredentials credentials = GoogleCredentials.getApplicationDefault(); somehow isn't working (but you can see the print statements there, and they do correctly print out the credentials when the pipeline is built), that isn't it.

However, that only happens at build time... so I'm wondering whether I have a persistence problem: the app initializes fine when the pipeline is built, but by the time the job is actually running in the cloud, that initialization is lost between deployment and processing. If so, how do I fix this?
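For what it's worth, my understanding is that main() only runs on the machine that builds and submits the job, while the DoFn instances are serialized and executed on the Dataflow workers, so anything initialized in main() wouldn't exist in the worker JVMs. If that's the case, I assume the fix would be to initialize Firebase inside the DoFn itself, e.g. in a @Setup method guarded so it only runs once per JVM. A rough, untested sketch of what I have in mind (reusing my GlobalVars and Result classes; the FirebaseApp.getApps() guard is my assumption about how to avoid double initialization):

    // Sketch only (untested): initialize Firebase on the worker, once per JVM,
    // instead of in main().
    public static class FireStoreOutput extends DoFn<Result, String> {

      // Firestore clients aren't serializable, so don't ship this with the DoFn
      private transient Firestore db;

      @Setup
      public void setup() {
        try {
          synchronized (FireStoreOutput.class) {
            // Only initialize the [DEFAULT] app if nothing else in this
            // worker JVM has done it already (assumption, not verified)
            if (FirebaseApp.getApps().isEmpty()) {
              FirebaseOptions firebaseOptions =
                  new FirebaseOptions.Builder()
                      .setCredentials(GoogleCredentials.getApplicationDefault())
                      .setProjectId(GlobalVars.projectId)
                      .build();
              FirebaseApp.initializeApp(firebaseOptions);
            }
          }
          db = FirestoreClient.getFirestore();
        } catch (IOException e) {
          throw new RuntimeException("Could not initialize Firestore on worker", e);
        }
      }

      @ProcessElement
      public void processElement(@Element Result result) {
        // ...same write logic as above, using db...
      }
    }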

Thanks!

Tags: java, google-cloud-firestore, google-cloud-dataflow, apache-beam

Solution


OK, I found the solution... The biggest issue was that my DAG's PCollection splits into two paths. I have two kinds of operations, "read" and "write", so both of those result branches send a PCollection into my FireStoreOutput class, which is where I was trying to initialize the Firestore app, causing an already-initialized problem.

However, making my db object a static field and adding a synchronized getDB() method that only initializes it if it hasn't been set yet fixed the problem. The final updated relevant code for the FireStoreOutput portion:

    public static class FireStoreOutput extends DoFn<Result, String> {

      static Firestore db;

      public static synchronized Firestore getDB() {
        if (db == null) {
          System.out.println("I'm being called");
          // Initialize our Firestore instance
          try {
            GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
            System.out.println("*************************");
            System.out.println(credentials);
            FirebaseOptions firebaseOptions =
                new FirebaseOptions.Builder()
                    .setCredentials(credentials)
                    .setProjectId(GlobalVars.projectId)
                    .build();
            FirebaseApp firebaseApp = FirebaseApp.initializeApp(firebaseOptions);
          } catch (IOException e) {
            e.printStackTrace();
          }
          db = FirestoreClient.getFirestore();
        }
        return db;
      }

      @ProcessElement
      public void processElement(@Element Result result) {
        DocumentReference docRef = getDB().collection("events")
                                          .document("next2020")
                                          .collection("transactions")
                                          .document(result.job_id)
                                          .collection("transactions")
                                          .document();
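The rest of processElement is presumably unchanged from the question's version (building the data map and calling docRef.set(data)); the substantive change is that the Firestore handle is now a static field obtained through the synchronized getDB(), so even though both branches of the DAG run FireStoreOutput, FirebaseApp.initializeApp() executes at most once per worker JVM, lazily at processing time rather than at pipeline-construction time. If something else in the same JVM could also initialize Firebase, checking FirebaseApp.getApps().isEmpty() before calling initializeApp() would be an additional, optional safeguard, though it goes beyond what was needed here.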
