How to upper-case all entries of a PySpark DataFrame (keeping the column names the same)

Problem description

This is my dataset:

lastvalue_month

DataFrame[msisdn: string, year: string, month: string, day: string, date_id: string, province: string, district: string, sub_district: string, handset_brand: string, handset_os: string, handset_type: string, payment_category: string, sub_status: string, hpos_to_ios: string, hpos_from_ios: string, hptype_to_smart: string, hptype_from_smart: string, hpbrand_change: string]

This is my code:

from pyspark.sql import functions as F
lastvalue_month = [ F.upper(F.col(x)).alias(x) for x in lastvalue_month.columns ]

This is the output:

[Column<b'upper(msisdn) AS `msisdn`'>,
 Column<b'upper(year) AS `year`'>,
 Column<b'upper(month) AS `month`'>,
 Column<b'upper(day) AS `day`'>,
 Column<b'upper(date_id) AS `date_id`'>,
 Column<b'upper(province) AS `province`'>,
 Column<b'upper(district) AS `district`'>,
 Column<b'upper(sub_district) AS `sub_district`'>,
 Column<b'upper(handset_brand) AS `handset_brand`'>,
 Column<b'upper(handset_os) AS `handset_os`'>,
 Column<b'upper(handset_type) AS `handset_type`'>,
 Column<b'upper(payment_category) AS `payment_category`'>,
 Column<b'upper(sub_status) AS `sub_status`'>,
 Column<b'upper(hpos_to_ios) AS `hpos_to_ios`'>,
 Column<b'upper(hpos_from_ios) AS `hpos_from_ios`'>,
 Column<b'upper(hptype_to_smart) AS `hptype_to_smart`'>,
 Column<b'upper(hptype_from_smart) AS `hptype_from_smart`'>,
 Column<b'upper(hpbrand_change) AS `hpbrand_change`'>]

What I want is a PySpark DataFrame in which every entry is upper-cased while the column names stay the same. How can I do this in PySpark?

Tags: python pandas dataframe pyspark

Solution


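Your list comprehension only builds a list of Column expressions; it never applies them to the DataFrame. I think you can do it with withColumn, like this: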

# Replace each column with its upper-cased value; reusing the name keeps the column name unchanged
for x in lastvalue_month.columns:
    lastvalue_month = lastvalue_month.withColumn(x, F.upper(F.col(x)))

Or perhaps with select, passing the expressions you already built:

# Upper-case every column in one projection; alias(x) keeps each original column name
lastvalue_month = lastvalue_month.select(
    *[F.upper(F.col(x)).alias(x) for x in lastvalue_month.columns]
)
