Create a column with list values for each row from the columns whose names start with a substring in a DataFrame

Problem description

I have a DataFrame like this (each ? stands for a different column suffix):

name, surname, delivery_?, delivery_?, delivery_?, other delivery_? columns, recovery_?, recovery_?, recovery_?, and other recovery_? columns

I want:

name, surname, delivery, recovery

where, for each row, the delivery column holds the list [df['delivery_?'], df['delivery_?'], df['delivery_?'], ...]

and the recovery column holds the list [df['recovery_?'], df['recovery_?'], df['recovery_?'], ...].
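To make the desired output concrete, here is a small pure-Python sketch of what should happen to a single row. It is not PySpark code: it assumes hypothetical column names delivery_1, delivery_2, ... in place of the question's delivery_?, and the helper name collapse_row is made up for illustration.

```python
def collapse_row(row, prefixes):
    """Return a new dict in which every column whose name starts with one of
    `prefixes` is gathered into a list stored under that prefix."""
    result = {}
    # Keep columns that match none of the prefixes (e.g. name, surname) unchanged.
    for col, value in row.items():
        if not col.startswith(tuple(prefixes)):
            result[col] = value
    # Gather each prefix's columns into one list, in their original column order.
    for prefix in prefixes:
        result[prefix] = [value for col, value in row.items() if col.startswith(prefix)]
    return result


# Hypothetical row: the _1, _2, ... suffixes stand in for the "?" in the question.
row = {
    "name": "Ada", "surname": "Lovelace",
    "delivery_1": 10, "delivery_2": 20, "delivery_3": 30,
    "recovery_1": 1, "recovery_2": 2,
}
print(collapse_row(row, ["delivery", "recovery"]))
# {'name': 'Ada', 'surname': 'Lovelace', 'delivery': [10, 20, 30], 'recovery': [1, 2]}
```

The same "keep the non-matching columns, then build one list per prefix" split is what a DataFrame solution has to express with column operations.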

I already have the prefixes in a list: parent_list = ['recovery', 'delivery', ...]

I am using Python. Thanks and regards.

Tags: dataframe, pyspark, user-defined-functions, aggregation

Solution


You can create array columns with F.array, one per prefix:

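import pyspark.sql.functions as F

# Prefixes whose columns should each be collapsed into one array column
parent_list = ['recovery', 'delivery']

df2 = df.select(
    # Keep every column that starts with none of the prefixes (e.g. name, surname)
    *[F.col(c) for c in df.columns if not c.startswith(tuple(parent_list))],
    # For each prefix, gather its matching columns into an array named after the prefix
    *[F.array(*[F.col(c) for c in df.columns if c.startswith(p)]).alias(p) for p in parent_list]
)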
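Before running the select, it can help to check which columns end up where. The sketch below mirrors the two list comprehensions of the select in plain Python, working on column-name strings instead of Spark Column objects, so it runs without a Spark session. The function name plan_select and the sample column names are hypothetical.

```python
def plan_select(columns, parent_list):
    """Mirror the column selection above using plain column-name strings."""
    prefixes = tuple(parent_list)
    # Columns passed through unchanged: those matching none of the prefixes.
    kept = [c for c in columns if not c.startswith(prefixes)]
    # One array per prefix, listing its source columns in df.columns order.
    arrays = {p: [c for c in columns if c.startswith(p)] for p in parent_list}
    return kept, arrays


# Hypothetical column names standing in for df.columns.
columns = ["name", "surname", "delivery_1", "delivery_2", "recovery_1", "recovery_2"]
kept, arrays = plan_select(columns, ["recovery", "delivery"])
print(kept)    # ['name', 'surname']
print(arrays)  # {'recovery': ['recovery_1', 'recovery_2'], 'delivery': ['delivery_1', 'delivery_2']}
```

Note that the values inside each array follow the order of df.columns, and that F.array expects the grouped columns to have compatible types.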
