How to assign a column value as the sum of a column value and a constant in PySpark?

Problem description

I need to create a column called 'Sea freight days + Buffer' whose value is final_df6['No of days take if sea freight'] + destuff_buffer when Mode is equal to AIR. Here destuff_buffer is a constant.

import numpy as np
from pyspark.sql.functions import col, when

destuff_buffer = 4
final_df6 = final_df6.withColumn('Sea freight days + Buffer',
    when(col("Mode")=='AIR',final_df6['No of days take if sea freight']+destuff_buffer).otherwise(np.nan)
)

But I am getting the following error:

Traceback (most recent call last):
  File "/opt/amazon/bin/runscript.py", line 67, in <module>
    runpy.run_path(script, run_name='__main__')
  File "/usr/lib64/python3.7/runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "/usr/lib64/python3.7/runpy.py", line 236, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "/tmp/ACOEtest", line 165
    destuff_buffer = 4
    ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/amazon/bin/runscript.py", line 100, in <module>
    while "runpy.py" in new_stack.tb_frame.f_code.co_filename:
AttributeError: 'NoneType' object has no attribute 'tb_frame'

Tags: python, pyspark, conditional-statements

Solution


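from pyspark.sql import SparkSession
from pyspark.sql.functions import when
spark = SparkSession.builder.getOrCreate()
c = 40
df = spark.createDataFrame(spark.sparkContext.parallelize([('AIR',1),('NONAIR',5)]),['mode','d'])
# Add the constant only where mode is 'AIR'; all other rows get null.
df = df.withColumn('mycol', when(df.mode=='AIR', df.d+c).otherwise(None))
df.show()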

+------+---+-----+
|  mode|  d|mycol|
+------+---+-----+
|   AIR|  1|   41|
|NONAIR|  5| null|
+------+---+-----+
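
The traceback in the question reports a SyntaxError on destuff_buffer = 4, which is a valid statement on its own, so the real cause is probably an unclosed bracket or quote on an earlier line of the script. This is an assumption, since the lines before line 165 are not shown. A minimal sketch with a hypothetical two-line script (the names broken_source, fixed_source and syntax_error_line are made up for illustration) shows how Python can report such an error away from the line that introduced it:

```python
# Hypothetical script: line 1 is missing its closing parenthesis.
broken_source = "total = (1 + 2\ndestuff_buffer = 4\n"
fixed_source = "total = (1 + 2)\ndestuff_buffer = 4\n"


def syntax_error_line(source):
    """Return the line number reported for a SyntaxError, or None if the source compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.lineno
    return None


# Older interpreters (such as the Python 3.7 in the traceback) tend to report
# the line after the unclosed bracket; newer ones may point at the bracket itself.
print("broken:", syntax_error_line(broken_source))
print("fixed:", syntax_error_line(fixed_source))
```

If the script behaves the same way, checking the statement just above line 165 for a missing ) or ] is a reasonable first step before changing the withColumn call.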
