How do I get the schema of a single column in a DataFrame (not the whole schema)?

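Problem description

I have a DataFrame that is the result of a flatten operation.

I want to get back to the original DataFrame.

For example, df: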

 |-- delivery: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- load_delivery_intervals: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- from_time: string (nullable = true)
 |    |    |    |    |-- to_time: string (nullable = true)
 |    |    |-- delivery_start_date_time: string (nullable = true)
 |    |    |-- delivery_end_date_time: string (nullable = true)
 |    |    |-- duration: string (nullable = true)
 |    |    |-- week_days: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- delivery_capacity_quantity: string (nullable = true)
 |    |    |-- quantity_unit: string (nullable = true)

I also have a flattened DataFrame, for example flat_df_new:

 delivery_from_time: string (nullable = true)
 delivery_to_time: string (nullable = true)
 delivery_delivery_start_date_time: string (nullable = true)
 delivery_delivery_end_date_time: string (nullable = true)
 delivery_duration: string (nullable = true)
 delivery_delivery_capacity_quantity: string (nullable = true)
 delivery_quantity_unit: string (nullable = true)

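flat_df_new is the flattened DataFrame (every struct type has been exploded), and further operations have been applied to it.

parentList is the list of array-of-struct columns in the original df that were exploded.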

    for parent in parentList:
        # Get the StructType schema of the parent column.
        df_temp = df.select(parent).schema
        # Here I want to add a column named after `parent`, with df_temp as its
        # schema and the matching flat_df_new columns as its values.
        flat_df_new = flat_df_new.withColumn(parent, ...)

Thanks and regards

Tags: python, dataframe, apache-spark, pyspark

Solution

