How do I sort each row of a DataFrame?

Problem description

I have a column whose elements are lists. How can I sort the list in each row alphabetically?

col
["R Programming Language", "Computer Programming"]
["R Programming Language", "Working Under Pressure"]
["Master Data Management", "Entity Relationship Models"]
["Master Data Management", "Statistical Analysis Software"]

Expected output:

col_order
["Computer Programming", "R Programming Language"]
["R Programming Language", "Working Under Pressure"]
["Entity Relationship Models", "Master Data Management"]
["Master Data Management", "Statistical Analysis Software"]

Tags: pyspark

Solution


Use array_sort (available since Spark 2.4), which sorts the elements of each array in ascending order:

from pyspark.sql import functions as F

# Sort the strings inside each row's array in ascending order, overwriting 'col'
df.withColumn('col', F.array_sort('col')).show(10, truncate=False)

# Output
# +-------------------------------------------------------+
# |col                                                    |
# +-------------------------------------------------------+
# |[Computer Programming, R Programming Language]         |
# |[R Programming Language, Working Under Pressure]       |
# |[Entity Relationship Models, Master Data Management]   |
# |[Master Data Management, Statistical Analysis Software]|
# +-------------------------------------------------------+
