How do I select columns from a Spark dataframe when I also need to use withColumnRenamed?

Problem description

I have a dataframe defined as:

df = df.select("employee_id", "employee_name", "employee_address")

I need to rename the first two fields but still keep the third. I thought the following would work, but it appears to select only employee_address.

df = (df.withColumnRenamed("employee_id", "empId")
        .withColumnRenamed("employee_name", "empName")
        .select("employee_address")
)

How do I properly rename the first two fields while also keeping the third?

I tried a mix of withColumn usages, but that didn't work. Do I have to use a select on all three fields?

Tags: pyspark, pyspark-sql

Solution


You can use alias inside a single select:

import pyspark.sql.functions as func

df = df.select(
    func.col("employee_id").alias("empId"), 
    func.col("employee_name").alias("empName"), 
    func.col("employee_address")
)
