Exploding a struct in Databricks

Problem description

Source schema:

 root
 |-- return: struct (nullable = true)
 |    |-- traces: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- site_details: struct (nullable = true)
 |    |    |    |    |-- latitude: string (nullable = true)
 |    |    |    |    |-- longitude: string (nullable = true)
 |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |-- org_name: string (nullable = true)
 |    |    |    |    |-- short_name: string (nullable = true)
 |    |    |    |    |-- timezone: string (nullable = true)
 |    |    |    |-- trace: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- q: long (nullable = true)
 |    |    |    |    |    |-- t: long (nullable = true)
 |    |    |    |    |    |-- v: string (nullable = true)

When I explode the site details with

dataresponse.select("return.traces.site_details").select(F.explode('site_details').alias('data')).select('data.*')

everything works fine: the result has one row per site, with the site_details fields as columns.

But for the trace array, when I run

dataresponse.select(F.explode("return.traces.trace").alias('data')).select('data.q', 'data.t', 'data.v')

it returns:

x:pyspark.sql.dataframe.DataFrame
q:array
   element:long
t:array
   element:long
v:array
   element:string
+--------------------+--------------------+--------------------+
|                   q|                   t|                   v|
+--------------------+--------------------+--------------------+
|[9, 9, 9, 9, 9, 9...|[19730101010000, ...|[1.2316, 1.2316, ...|
+--------------------+--------------------+--------------------+

I then tried to explode the array columns like this:

x.withColumn("t", explode("t")).withColumn("q", explode("q")).withColumn("v", explode("v"))

This takes a very long time. Is there a better way to do this?

Tags: python-3.x, dataframe, struct, pyspark

Solution


I ended up pulling each array out of the row as a Python list and then building a pandas DataFrame from those lists:

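import pandas as pd
from pyspark.sql import functions as F

tracedf = dataresponse.select(F.explode("return.traces.trace").alias('data')).select('data.q', 'data.t', 'data.v')

# Fetch the first exploded row once instead of calling collect() three times;
# as in the original, only the first site's trace is used
row = tracedf.first()

dataPdbyYear = pd.DataFrame({"q": row["q"], "t": row["t"], "v": row["v"]})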
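
As a side note on why the chained explode calls in the question are so slow: each explode multiplies the row count by the length of the array it unpacks, so three of them in a row produce every combination of t, q and v rather than matching elements by position. The plain-Python sketch below models that on a tiny stand-in row. The names `row`, `chained_explode` and `zipped_explode` are hypothetical, and apart from the first entry the sample values are made up for illustration.

```python
from itertools import product

# Hypothetical stand-in for the single row of parallel arrays shown in the question.
# Only the first t and v values come from the question's output; the rest are invented.
row = {
    "q": [9, 9, 9],
    "t": [19730101010000, 19730101020000, 19730101030000],
    "v": ["1.2316", "1.2316", "1.2400"],
}


def chained_explode(row):
    """Mimic exploding t, then q, then v: every combination becomes a row."""
    return [
        {"q": q, "t": t, "v": v}
        for t, q, v in product(row["t"], row["q"], row["v"])
    ]


def zipped_explode(row):
    """Pair the arrays by position: one output row per array index."""
    return [
        {"q": q, "t": t, "v": v}
        for q, t, v in zip(row["q"], row["t"], row["v"])
    ]


crossed = chained_explode(row)
paired = zipped_explode(row)
print(len(crossed), len(paired))  # prints: 27 3
```

With arrays of length n, the chained version yields n cubed rows while the positional pairing yields n, which is also what the pandas DataFrame above contains. On Spark 2.4 or later, the same pairing can probably be done without collecting to the driver, along the lines of x.select(F.explode(F.arrays_zip("q", "t", "v")).alias("z")).select("z.q", "z.t", "z.v"). This assumes the zipped struct keeps the field names q, t and v, which is worth checking on your Spark version.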
