Why does this nested "when" not work in PySpark?

Problem description

I am trying to divide people into age ranges:

from pyspark import SparkFiles
from pyspark.sql import functions as fn

## Import data

url_users = "https://raw.githubusercontent.com/leanhdung1994/BigData/main/users.csv"
spark.sparkContext.addFile(url_users)
users_from_file = spark.read.csv("file://" + SparkFiles.get("users.csv"), header = True, sep = ",", inferSchema = True)

## Generate column age

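from datetime import date
from pyspark.sql.types import IntegerType
reference_date = date(2017, 12, 31)
def cal_age(born):
    # Subtract 1 when the birthday has not yet occurred in the reference year
    return reference_date.year - born.year - ((reference_date.month, reference_date.day) < (born.month, born.day))
# Assumed definition: the snippet as posted uses cal_age_udf without defining it
cal_age_udf = fn.udf(cal_age, IntegerType())
users_from_file = users_from_file.withColumn('age', cal_age_udf(fn.to_date(fn.col('birth_date'))))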

## Generate column range

users_from_file1 = users_from_file.withColumn('range', fn.when(fn.col("age") <= 25, 1)fn.when(fn.col("age") <= 35, 2).fn.otherwise(3))

users_from_file1.show()

Then it returns an error:

SyntaxError: invalid syntax
  File "<command-2296735704765764>", line 3
    users_from_file1 = users_from_file.withColumn('range', fn.when(fn.col("age") <= 25, 1)fn.when(fn.col("age") <= 35, 2).fn.otherwise(3))
                                                                                           ^
SyntaxError: invalid syntax

Could you please elaborate on this nested when? I took the when syntax from this answer, but it does not work.


Tags: apache-spark, if-statement, pyspark, apache-spark-sql

Solution


It should be

fn.when(fn.col("age") <= 25, 1).when(fn.col("age") <= 35, 2).otherwise(3)

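There is no need to specify fn again after the first when. fn.when(...) returns a Column, and when and otherwise are methods of that Column, so they are chained with a dot. The line in the question also lacks the dot between the first and the second when, which is what triggers the SyntaxError.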
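To see why only the first call needs the module prefix, here is a toy model in plain Python. It is not PySpark's implementation: ToyColumn and toy_when are hypothetical names made up for this sketch, and the conditions are ordinary lambdas rather than Column expressions. It only mirrors the shape of the API, where a module-level function creates the object and the later calls are methods on it.

```python
# Toy model of chained when/otherwise. NOT PySpark's implementation:
# ToyColumn and toy_when are hypothetical names used only for illustration.

class ToyColumn:
    """Holds (predicate, value) branches plus an optional default."""

    def __init__(self, branches, default=None):
        self.branches = branches
        self.default = default

    def when(self, predicate, value):
        # Method on the object: returns a new ToyColumn with one more branch.
        return ToyColumn(self.branches + [(predicate, value)], self.default)

    def otherwise(self, value):
        # Method on the object: sets the fallback value.
        return ToyColumn(self.branches, value)

    def evaluate(self, x):
        # The first matching branch wins; fall back to the default otherwise.
        for predicate, value in self.branches:
            if predicate(x):
                return value
        return self.default


def toy_when(predicate, value):
    # Module-level entry point, playing the role of fn.when.
    return ToyColumn([(predicate, value)])


# Same shape as the corrected expression: only the first call is module-level.
age_range = (
    toy_when(lambda age: age <= 25, 1)
    .when(lambda age: age <= 35, 2)
    .otherwise(3)
)

ranges = [age_range.evaluate(age) for age in (20, 25, 30, 40)]
print(ranges)
```

Writing .fn.otherwise(3), as in the question, would look up an attribute named fn on the returned object, which this toy class does not have either.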

