Spark: write JSON output in the form of a JSON array

Problem Description

I have a DataFrame that, after a number of transformations, looks like this:

+--------------------+
|              values|
+--------------------+
|[U5, -1.11115, 1,...|
|[U5, 7458.62418, ...|
|[U5, 171.61934, 1...|
|[U5, 221192.9, 1,...|
|[U5, 1842.27947, ...|
|[U5, 17842.82242,...|
|[U5, 2416.94825, ...|
|[U5, 616.19426, 1...|
|[U5, 1813.14912, ...|
|[U5, 18119.81628,...|
|[U5, 17923.19866,...|
|[U5, 46353.87881,...|
|[U5, 7844.85114, ...|
|[U5, -1.11115, 1,...|
|[U5, -1.11115, 1,...|
|[U5, -1.12131, 1,...|
|[U5, 3981.14464, ...|
|[U5, 439.417, 1, ...|
|[U5, 6966.99999, ...|
+--------------------+

When I write it to a JSON file, each row becomes its own JSON object:

{"values":["U5","-1.11115","1","257346.7","1","1","1","-1.11115","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","326.3316","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","-1.11115","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","326.3316","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","326.3316","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373431","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","7458.62418","1","257346.7","1","1","1","7458.62418","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","46511.38222","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","7458.62418","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","46511.38222","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","46511.38222","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373441","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","171.61934","1","257346.7","1","1","1","171.61934","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","361193.3137","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","171.61934","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","361193.3137","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","361193.3137","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373453","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
{"values":["U5","221192.9","1","257346.7","1","1","1","221192.9","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","221192.9","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","419152.8592","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","221192.9","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","221192.9","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","419152.8592","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","419152.8592","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373461","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]}
...

Is there any operation I can apply to the DataFrame so that, when written to JSON, the output looks like this instead:

{
    "values": [
["U5","-1.11115","1","257346.7","1","1","1","-1.11115","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","326.3316","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","-1.11115","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","-1.11115","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","326.3316","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","326.3316","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373431","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"],
["U5","7458.62418","1","257346.7","1","1","1","7458.62418","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","46511.38222","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","7458.62418","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","7458.62418","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","46511.38222","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","46511.38222","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373441","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"],
["U5","171.61934","1","257346.7","1","1","1","171.61934","343892.72","613295.17","613294.6343","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","343892.72","1","1","361193.3137","343892.72","1","1","1","1","1","1","1","1","1","1","1","1","1","1","171.61934","257346.7","458949.7","458949.2546","1","1","1","1","1","1","1","1","1","1","1","171.61934","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","458949.7","458949.2546","1","1","1","361193.3137","257346.7","4812798.18","13454298.34","1","1","1","1","1","1","1","1","1","1","1","361193.3137","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","257346.7","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","247","1","373453","668","1","TRAD_SPECTRM","1","0","0","0","0","0","0"]
        ...
    ]
}

Tags: scala, apache-spark, apache-spark-sql

Solution

Try this:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.collect_list

// A small example DataFrame with an array column named `values`
val df = spark.sql("select values from values array('U5', '-1.11115'), array('U6', '-1.11115') T(values)")
df.show(false)
df.printSchema()

// collect_list gathers the array from every row into a single array of arrays,
// producing one row, which is then written out as a single JSON record
df.agg(collect_list("values").as("values"))
  .write
  .mode(SaveMode.Overwrite)
  .json("/path")

/**
  * file written -
  * {"values":[["U5","-1.11115"],["U6","-1.11115"]]}
  */
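
Two caveats worth noting. First, collect_list pulls every row into a single record, so this only works when the aggregated data fits in memory. Second, Spark's JSON writer emits one record per line (JSON Lines), so the file contains the single-line object shown in the comment above rather than the pretty-printed layout from the question. If an indented file is genuinely required, one option is to fetch that single aggregated record to the driver and pretty-print it yourself. A minimal sketch, assuming the aggregated result fits on the driver; the output path /tmp/values_pretty.json is hypothetical, not from the original answer:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.sql.functions.collect_list

// Fetch the single aggregated record as a JSON string on the driver
val json = df.agg(collect_list("values").as("values")).toJSON.first()

// Re-indent it with Jackson (already on Spark's classpath) and write a plain file
// (/tmp/values_pretty.json is a hypothetical path, not from the original answer)
val mapper = new ObjectMapper()
val pretty = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(mapper.readTree(json))
Files.write(Paths.get("/tmp/values_pretty.json"), pretty.getBytes(StandardCharsets.UTF_8))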
