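pyspark - How do I sort the list in each row of a DataFrame?
Problem description
I have a column whose elements are lists. How do I sort each list alphabetically?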
col
["R Programming Language", "Computer Programming"]
["R Programming Language", "Working Under Pressure"]
["Master Data Management", "Entity Relationship Models"]
["Master Data Management", "Statistical Analysis Software"]
Expected output:
col_order
["Computer Programming", "R Programming Language"]
["R Programming Language", "Working Under Pressure"]
["Entity Relationship Models", "Master Data Management"]
["Master Data Management", "Statistical Analysis Software"]
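Solution
from pyspark.sql import functions as F

# array_sort (Spark 2.4+) sorts each array in ascending order; for strings that is alphabetical.
# Reusing the name 'col' overwrites the column; pass 'col_order' instead to keep the original.
df.withColumn('col', F.array_sort('col')).show(10, truncate=False)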
# Output
# +-------------------------------------------------------+
# |col                                                    |
# +-------------------------------------------------------+
# |[Computer Programming, R Programming Language]         |
# |[R Programming Language, Working Under Pressure]       |
# |[Entity Relationship Models, Master Data Management]   |
# |[Master Data Management, Statistical Analysis Software]|
# +-------------------------------------------------------+