Filling in missing data

Problem description

I have two DataFrames:

df_1:

ID    |  title  |  name   |   age
----------------------------------
32    |  AA     | Alex    | 30
----------------------------------
4568  |  BB     |  Dom    |  35
----------------------------------
3804  |  CC     |  pascal |  58
----------------------------------

df_2:

ID   |  title   
--------------
288  |  AZERTY    
--------------
290  |  querty      
--------------

I want to append the data from df_2 to df_1. For the rows that come from df_2, I want to fill the name and age columns with the value right.

The resulting df_1 should be:

ID    |  title  |  name   |   age
----------------------------------
32    |  AA     | Alex    | 30
----------------------------------
4568  |  BB     |  Dom    |  35
----------------------------------
3804  |  CC     |  pascal |  58
----------------------------------
288   |  AZERTY | right   | right
----------------------------------
290   |  querty | right   | right
----------------------------------

How can I append one DataFrame to another in PySpark while filling in the missing columns?

Tags: dataframe, apache-spark, pyspark, apache-spark-sql, pyspark-sql

Solution


You need to add the missing columns to df_2 and then union the two DataFrames:

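from pyspark.sql.functions import lit

# Add the columns df_2 lacks, filled with the constant "right"
df_2 = (df_2
  .withColumn("name", lit("right"))
  .withColumn("age", lit("right")))

# union() matches columns by position, not by name
df_1.union(df_2).show()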

+----+------+------+-----+
|  ID| title|  name|  age|
+----+------+------+-----+
|  32|    AA|  Alex|   30|
|4568|    BB|   Dom|   35|
|3804|    CC|pascal|   58|
| 288|AZERTY| right|right|
| 290|querty| right|right|
+----+------+------+-----+
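
If the set of missing columns is not known in advance, the same idea can be generalized. The following is only a sketch, not part of the original answer: the helper names missing_columns and append_with_fill are hypothetical, and it assumes Spark's default case-insensitive column resolution.

```python
def missing_columns(target_cols, source_cols):
    """Return the columns of target_cols that source_cols lacks, in target order.

    Names are compared case-insensitively, which assumes Spark's default
    setting (spark.sql.caseSensitive=false).
    """
    present = {c.lower() for c in source_cols}
    return [c for c in target_cols if c.lower() not in present]


def append_with_fill(df_1, df_2, fill_value="right"):
    """Append df_2 to df_1, filling the columns df_2 lacks with fill_value."""
    # Imported here so that missing_columns can be used without Spark installed.
    from pyspark.sql.functions import lit

    for name in missing_columns(df_1.columns, df_2.columns):
        df_2 = df_2.withColumn(name, lit(fill_value))

    # union() pairs columns by position, so reorder df_2 to match df_1 first.
    return df_1.union(df_2.select(*df_1.columns))
```

With the DataFrames from the question, append_with_fill(df_1, df_2).show() should add name and age to df_2 before the union. On Spark 3.1 or later, df_1.unionByName(df_2, allowMissingColumns=True) is an alternative, but it fills the missing columns with null rather than a chosen value.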
