dataframe - Create a column of list values for each row from the DataFrame columns that start with a given substring
Problem description
I have a DataFrame like this:
name, surname, delivery_?, delivery_?, delivery_?, other delivery_?, recovery_?, recovery_?, recovery_?, and other recovery_?
I want:
name, surname, delivery, recovery
where the delivery column holds, for each row, the list of values [df['delivery_?'], df['delivery_?'], df['delivery_?'], ........],
and the recovery column holds, for each row, the list of values [df['recovery_?'], df['recovery_?'], df['recovery_?'], ....].
I know that parent_list contains ['recovery', 'delivery', ....]
I am using Python. Thanks and regards.
Solution
You can create array columns, one per prefix in parent_list:
import pyspark.sql.functions as F

parent_list = ['recovery', 'delivery']
df2 = df.select(
    # Keep every column whose name starts with none of the prefixes.
    *[F.col(c) for c in df.columns if not any(c.startswith(p) for p in parent_list)],
    # For each prefix, pack all matching columns into one array column named after it.
    *[F.array(*[F.col(c) for c in df.columns if c.startswith(p)]).alias(p) for p in parent_list]
)
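To check which columns would land in which array before running the select, the prefix-matching step can be tried on plain column names without Spark. This is only a minimal sketch: the helper split_columns and the numbered column names are hypothetical, since the question elides the real suffixes as "?".

```python
from typing import Dict, List, Tuple


def split_columns(
    columns: List[str], prefixes: List[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Partition column names into pass-through columns and per-prefix groups."""
    # Columns that match none of the prefixes are kept unchanged.
    kept = [c for c in columns if not any(c.startswith(p) for p in prefixes)]
    # For each prefix, collect the matching columns in their original order.
    grouped = {p: [c for c in columns if c.startswith(p)] for p in prefixes}
    return kept, grouped


# Hypothetical column names standing in for delivery_? and recovery_?.
columns = ["name", "surname", "delivery_1", "delivery_2", "recovery_1", "recovery_2"]
kept, grouped = split_columns(columns, ["recovery", "delivery"])
print(kept)
print(grouped)
```

Note that a bare prefix such as 'delivery' would also match a column named 'delivery_notes' or 'delivery' itself; if that is not wanted, matching on 'delivery_' and aliasing the array separately is one way to narrow it.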