Scala: convert a deeply nested array stored as a quoted string into an array

Problem description

I'm having trouble converting an array. Here is an example of the quoted array (a string) that I need to convert to array<array<array<array<double>>>>:

"[[[[1962.717452,24087.495431], [1965.762332,24083.308721], [1966.00701,24084.923595], [1962.717452,24087.495431]]], [[[1741.375311,24038.621419], [1729.204453,24031.742238], [1747.989907,24001.05051], [1754.33992,24000.256759], [1783.973313,24013.221368], [1796.673338,23990.996324], [1801.171264,23991.260907], [1800.642096,23987.821317], [1801.435848,23985.175479], [1814.929625,23984.117143], [1819.162966,23984.381727], [1821.808805,23985.704646], [1825.777563,23988.350485], [1838.477588,23987.027566], [1841.652595,23985.440062], [1842.975514,23982.52964], [1838.742172,23980.148385], [1865.729726,23933.317042], [1865.99431,23925.908693], [1860.702633,23922.733687], [1876.048497,23892.041959], [1906.572207,23906.954344], [1936.806089,23900.264293], [1939.638088,23916.46942], [1940.2643,23920.3983], [1940.326,23920.7851], [1940.1536,23920.8426], [1940.3881,23921.7799], [1940.3963,23921.8129], [1940.4914,23921.781], [1940.4994,23921.8129], [1940.6848,23922.7523], [1940.4753,23922.8139], [1940.5777,23923.7923], [1940.5819,23923.8295], [1940.8089,23923.7934], [1940.8169,23923.8295], [1942.93661,23937.050293], [1947.802922,23968.916791], [1947.6793,23969.5233], [1947.2135,23970.4385], [1945.6178,23971.8569], [1945.3416,23971.9558], [1946.492809,23977.373796], [1944.04655,23977.502547], [1917.323579,23961.098347], [1890.336025,24002.108846], [1950.661146,24026.450561], [1951.719482,24033.594325], [1958.069494,24032.53599], [1965.517018,24081.689648], [1937.718445,24096.692053], [1858.585962,24098.946539], [1751.429498,24092.331943], [1739.523224,24064.02147], [1732.114876,24060.052712], [1741.375311,24038.621419]]]]"

Is there a good way to do this?

Tags: scala, apache-spark, apache-spark-sql

Solution


You can parse the string with from_json:

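import org.apache.spark.sql.functions.{col, from_json, lit}

// col_name holds the JSON text; the target type is passed as a DDL type string
val df2 = df.select(
    from_json(
        col("col_name"), 
        lit("array<array<array<array<double>>>>")
    ).as("new_col_name")
)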

df2.printSchema
root
 |-- new_col_name: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: array (containsNull = true)
 |    |    |    |    |-- element: double (containsNull = true)
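A self-contained sketch of the same approach, assuming Spark 3.x with the Scala API and a local SparkSession. Instead of a DDL string literal it builds the target type from ArrayType and DoubleType and calls the from_json(Column, DataType) overload. The names ParseNestedArray and parseCoordinates are hypothetical, and the sample value is just the first polygon of the string from the question, still wrapped in its outer array.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

object ParseNestedArray {
  // Same type as the DDL string "array<array<array<array<double>>>>",
  // built from DataType objects so the from_json(Column, DataType) overload is used.
  val schema: DataType = ArrayType(ArrayType(ArrayType(ArrayType(DoubleType))))

  // Hypothetical helper: parses the JSON text in col_name into the nested column new_col_name.
  def parseCoordinates(df: DataFrame): DataFrame =
    df.select(from_json(col("col_name"), schema).as("new_col_name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("parse-nested-array").getOrCreate()
    import spark.implicits._

    // First polygon of the question's string, still inside the outer (multi-polygon) array.
    val raw = "[[[[1962.717452,24087.495431], [1965.762332,24083.308721], " +
      "[1966.00701,24084.923595], [1962.717452,24087.495431]]]]"
    val df2 = parseCoordinates(Seq(raw).toDF("col_name"))
    df2.printSchema()

    // On the driver, array columns come back as scala.collection.Seq values.
    type Coords = scala.collection.Seq[scala.collection.Seq[scala.collection.Seq[scala.collection.Seq[Double]]]]
    val coords = df2.head().getAs[Coords]("new_col_name")
    val firstPoint = coords.head.head.head // polygon -> ring -> point
    println(firstPoint.mkString(", "))
    spark.stop()
  }
}
```

Both ways of writing the schema give the same column type; the DataType version only avoids parsing a DDL string at analysis time.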

Regarding your comment (the nesting depth of geometry.coordinates depends on geometry.type):

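import org.apache.spark.sql.functions.{col, from_json, lit, when}

// A single when(...) chain cannot return arrays of different depths: all
// CASE WHEN branches must resolve to one common type, and Spark rejects
// array<double> next to array<array<array<double>>>. Parse each geometry
// type into its own column instead; rows of other types get null there.
val df2 = df
    .withColumn(
        "polygon_coordinates",
        when(
            col("geometry.type") === "Polygon",
            from_json(
                col("geometry.coordinates").cast("string"),
                lit("array<array<array<double>>>")
            )
        )
    )
    .withColumn(
        "multipolygon_coordinates",
        when(
            col("geometry.type") === "MultiPolygon",
            from_json(
                col("geometry.coordinates").cast("string"),
                lit("array<array<array<array<double>>>>")
            )
        )
    )
    .withColumn(
        "point_coordinates",
        when(
            col("geometry.type") === "Point",
            from_json(
                col("geometry.coordinates").cast("string"),
                lit("array<double>")
            )
        )
    )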
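If you would rather not repeat the when(...) block for every geometry type, the per-type schemas can be kept in a map and applied with a fold. This is a sketch under assumptions: geometry is a struct with string fields type and coordinates, only the three geometry types from the answer are handled, and ParseGeometryCoordinates, parseByType and the <Type>_coordinates column names are hypothetical. Each type gets its own column because one when chain has to return a single common type, and arrays of different depths have none.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, struct, when}
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

object ParseGeometryCoordinates {
  // Nesting depth of coordinates for each geometry type handled in the answer.
  val schemas: Map[String, DataType] = Map(
    "Point"        -> ArrayType(DoubleType),
    "Polygon"      -> ArrayType(ArrayType(ArrayType(DoubleType))),
    "MultiPolygon" -> ArrayType(ArrayType(ArrayType(ArrayType(DoubleType))))
  )

  // Adds one column per geometry type (e.g. "Polygon_coordinates"). A row is parsed
  // only in the column matching its own geometry.type and is null in the others.
  def parseByType(df: DataFrame): DataFrame =
    schemas.foldLeft(df) { case (acc, (geomType, schema)) =>
      acc.withColumn(
        s"${geomType}_coordinates",
        when(
          col("geometry.type") === geomType,
          from_json(col("geometry.coordinates").cast("string"), schema)
        )
      )
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("parse-geometry").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows: geometry is struct<type: string, coordinates: string>.
    val df = Seq(
      ("Point", "[1962.717452,24087.495431]"),
      ("Polygon", "[[[1962.717452,24087.495431], [1965.762332,24083.308721], " +
        "[1966.00701,24084.923595], [1962.717452,24087.495431]]]")
    ).toDF("type", "coordinates")
      .select(struct(col("type"), col("coordinates")).as("geometry"))

    parseByType(df).show(truncate = false)
    spark.stop()
  }
}
```

Adding a new geometry type then only means adding one entry to the map.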
