pyspark - How do I select columns from a Spark dataframe when I also need to use withColumnRenamed?
Problem Description
I have a dataframe produced by:
df = df.select("employee_id", "employee_name", "employee_address")
I need to rename the first two fields, but also still select the third field. So I thought this would work, but it appears to select only employee_address.
df = (df.withColumnRenamed("employee_id", "empId")
        .withColumnRenamed("employee_name", "empName")
        .select("employee_address"))
How do I properly rename the first two fields while also selecting the third field?
I tried a mix of withColumn usages, but that doesn't work. Do I have to use a select on all three fields?
Solution
You can use the alias method:
import pyspark.sql.functions as func

# alias() renames a column inside the projection, so all three
# columns can be renamed and selected in a single select().
df = df.select(
    func.col("employee_id").alias("empId"),
    func.col("employee_name").alias("empName"),
    func.col("employee_address"),
)
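If you would rather keep withColumnRenamed, a minimal sketch under the same assumptions (the three-column dataframe from the question) is to rename first and then list every column you still want, since select() projects only the columns it is given:

# Rename first, then project every remaining column explicitly;
# the original chain dropped empId and empName because select()
# keeps only the columns passed to it.
df = (df.withColumnRenamed("employee_id", "empId")
        .withColumnRenamed("employee_name", "empName")
        .select("empId", "empName", "employee_address"))

Either way, df.columns should come back as ['empId', 'empName', 'employee_address'].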