Convert ORC to JSON in Java

Question

I'm trying to convert an output ORC file to JSON in Java, inside a unit test. I've been reading ORC's own unit tests and was inspired by the following:

      PrintStream origOut = System.out;
      // createTempFileJson() is a helper in my test class that returns a temp file path
      String tmpFileLocationJson = createTempFileJson();
      try (FileOutputStream myOut = new FileOutputStream(tmpFileLocationJson)) {
        // replace stdout and run the dump command
        System.setOut(new PrintStream(myOut, true, StandardCharsets.UTF_8.toString()));
        FileDump.main(new String[]{"data", tmpFileLocationJson});
        System.out.flush();
      } finally {
        // always restore the original stdout, even if the dump throws
        System.setOut(origOut);
      }
      System.out.println("done");

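Something like that. The problem is that I'm not sure how to make this code equivalent to running the orc-tools utility from the command line: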

    java -jar orc-tools-1.5.5-uber.jar data output-1595448128191.orc

which, for example, prints the following JSON dump:

    {"integerExample":1,"nestedExample":{"sub1":"value1","sub2":42},"dateExample":"2018-01-04"}
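The redirect-and-restore pattern from the test snippet can be wrapped in a small helper that captures everything a CLI-style `main` prints and returns it as a string. This is a minimal, JDK-only sketch; the class name `StdoutCapture` and the `Task` interface are hypothetical names introduced here, not part of ORC.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public final class StdoutCapture {

  /** A piece of work that may throw, e.g. a call to a CLI-style main method. */
  @FunctionalInterface
  public interface Task {
    void run() throws Exception;
  }

  /** Runs the task with System.out redirected and returns everything it printed. */
  public static String capture(Task task) throws Exception {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (PrintStream redirected = new PrintStream(buffer, true, "UTF-8")) {
      System.setOut(redirected);
      task.run();
      redirected.flush();
    } finally {
      // Restore stdout even if the task throws.
      System.setOut(original);
    }
    return buffer.toString("UTF-8");
  }

  public static void main(String[] args) throws Exception {
    String out = capture(() -> System.out.println("{\"integerExample\":1}"));
    System.out.println("captured: " + out.trim());
  }
}
```

To mirror `java -jar orc-tools-1.5.5-uber.jar data file.orc`, the task could call `org.apache.orc.tools.Driver.main(new String[]{"data", orcPath})`, assuming orc-tools is on the test classpath and that `Driver` is the uber jar's entry point that dispatches the `data` subcommand (worth checking against the jar's manifest). Note that the CLI receives the ORC file as its argument, not the JSON output path, and that a driver may call `System.exit` on bad arguments, which would end the test JVM.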

So I want to convert the ORC file to JSON so that I can cross-check its contents in my unit tests.

Edit: this may be package-private :( https://github.com/apache/orc/blob/b9e82b3d7b473201bdcf46011c3b2fda10ef897f/java/tools/src/java/org/apache/orc/tools/PrintData.java#L227
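One common way around a package-private method in a test is reflection with `setAccessible(true)`. The sketch below is generic and JDK-only; `PackagePrivateCall` and `invokeStatic` are hypothetical names, and the demo targets a JDK method because ORC is not assumed to be on the classpath here.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public final class PackagePrivateCall {

  /**
   * Invokes a static method by name, even if it is package-private.
   * Exceptions thrown by the target method are unwrapped and rethrown.
   */
  public static Object invokeStatic(String className, String methodName,
      Class<?>[] parameterTypes, Object... args) throws Exception {
    Method method = Class.forName(className).getDeclaredMethod(methodName, parameterTypes);
    method.setAccessible(true); // bypass the package-private access check
    try {
      return method.invoke(null, args);
    } catch (InvocationTargetException e) {
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      if (cause instanceof Error) {
        throw (Error) cause;
      }
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    // Demonstrated on a JDK method; with orc-tools on the classpath the same
    // call could target a static method of org.apache.orc.tools.PrintData.
    Object n = invokeStatic("java.lang.Integer", "parseInt",
        new Class<?>[] {String.class}, "42");
    System.out.println(n);
  }
}
```

For ORC this would look like `invokeStatic("org.apache.orc.tools.PrintData", "printJsonData", new Class<?>[] {PrintStream.class, org.apache.orc.Reader.class}, printStream, reader)`, assuming the method at the linked commit takes `(PrintStream, Reader)`; check the source for the exact parameter list. An alternative that avoids reflection is to put a small test helper class in the package `org.apache.orc.tools` under your test sources, which works on the classpath as long as the jar is not sealed and is not loaded as a named module.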

Tags: java hive orc

Solution


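OK, I vendored the code from Hive, pointed it at a file writer instead of the output stream, and redirected the output into a file that the test then reads back.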

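  import java.io.BufferedWriter;
  import java.io.IOException;
  import java.io.PrintStream;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
  import org.apache.orc.Reader;
  import org.apache.orc.RecordReader;
  import org.apache.orc.TypeDescription;
  import org.codehaus.jettison.json.JSONException;
  import org.codehaus.jettison.json.JSONWriter;

  // Vendored from org.apache.orc.tools.PrintData, where printJsonData and its
  // helper printRow are package-private, so printRow has to be copied as well.
  // Each row is written as one JSON object per line to "<fileName>.json".
  // printStream is no longer written to; it is kept only to match the original signature.
  static void printJsonData(String fileName, PrintStream printStream,
      Reader reader) throws IOException, JSONException {
//    Original: OutputStreamWriter out = new OutputStreamWriter(printStream, "UTF-8");
    // try-with-resources closes both the JSON file and the ORC row reader
    try (BufferedWriter out = Files.newBufferedWriter(
             Paths.get(fileName + ".json"), StandardCharsets.UTF_8);
         RecordReader rows = reader.rows()) {
      TypeDescription schema = reader.getSchema();
      VectorizedRowBatch batch = schema.createRowBatch();
      while (rows.nextBatch(batch)) {
        for (int r = 0; r < batch.size; ++r) {
          JSONWriter writer = new JSONWriter(out);
          printRow(writer, batch, schema, r);
          out.write("\n");
        }
      }
    }
  }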
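Once the rows are in a `.json` file with one object per line, the test can read them back and compare them with expected rows. This is a minimal JDK-only sketch; `JsonLinesCheck`, `readJsonLines` and `firstMismatch` are hypothetical names. It compares raw strings, which assumes both sides are produced with the same field order and formatting; for a structural comparison a JSON library would be needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class JsonLinesCheck {

  /** Reads a JSON-lines file (one JSON object per line), skipping blank lines. */
  public static List<String> readJsonLines(Path file) throws IOException {
    List<String> rows = new ArrayList<>();
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      if (!line.trim().isEmpty()) {
        rows.add(line.trim());
      }
    }
    return rows;
  }

  /** Describes the first difference between expected and actual rows, or returns null if they match. */
  public static String firstMismatch(List<String> expected, List<String> actual) {
    if (expected.size() != actual.size()) {
      return "row count: expected " + expected.size() + " but was " + actual.size();
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!expected.get(i).equals(actual.get(i))) {
        return "row " + i + ": expected " + expected.get(i) + " but was " + actual.get(i);
      }
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("orc-dump", ".json");
    String row = "{\"integerExample\":1,\"nestedExample\":{\"sub1\":\"value1\",\"sub2\":42},"
        + "\"dateExample\":\"2018-01-04\"}";
    Files.write(file, (row + "\n").getBytes(StandardCharsets.UTF_8));
    String mismatch = firstMismatch(Arrays.asList(row), readJsonLines(file));
    System.out.println(mismatch == null ? "rows match" : mismatch);
    Files.deleteIfExists(file);
  }
}
```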
