首页 > 解决方案 > 通过为另一个数据框中的标记数据添加列来创建

问题描述

假设我的数据框

df1
name startdate    label
A    14-05-07     1
A    14-05-21     2
A    14-06-01     3
A    14-06-15     4
B    18-10-05     1
B    18-10-19     2
B    18-10-31     3
C    19-03-04     1     
C    19-03-17     2
C    19-03-31     3
D    16-04-01     1
D    16-04-21     2
..
df2
name date     
A    14-04-05
A    14-05-08
A    15-01-05
B    18-07-05
B    18-10-02
C    19-01-03
C    19-03-30
D    16-04-01
..

有两个数据框 df1 和 df2 具有相同的公共键“名称”

我想将标记为 df1 的“开始日期”添加到 df2 的列中。

输出我想要的

name date     label1   label2   label3   label4   ..
A    14-04-05 14-05-07 14-05-21 14-06-01 14-06-15
A    14-05-08 14-05-07 14-05-21 14-06-01 14-06-15
A    15-01-05 14-05-07 14-05-21 14-06-01 14-06-15
B    18-07-05 18-10-05 18-10-19 18-10-31 nan
B    18-10-02 18-10-05 18-10-19 18-10-31 nan
C    19-01-03 19-03-04 19-03-17 19-03-31 nan
C    19-03-30 19-03-04 19-03-17 19-03-31 nan
D    16-04-01 16-04-01 16-04-21 nan      nan

有办法吗?谢谢你的阅读。

标签: pythonpandasnumpylabel

解决方案


DataFrame.joinDataFrame.pivot和一起使用DataFrame.add_prefix

df = df2.join(df1.pivot('name','label','startdate').add_prefix('label'), on='name')
print (df)
  name      date    label1    label2    label3    label4
0    A  14-04-05  14-05-07  14-05-21  14-06-01  14-06-15
1    A  14-05-08  14-05-07  14-05-21  14-06-01  14-06-15
2    A  15-01-05  14-05-07  14-05-21  14-06-01  14-06-15
3    B  18-07-05  18-10-05  18-10-19  18-10-31       NaN
4    B  18-10-02  18-10-05  18-10-19  18-10-31       NaN
5    C  19-01-03  19-03-04  19-03-17  19-03-31       NaN
6    C  19-03-30  19-03-04  19-03-17  19-03-31       NaN
7    D  16-04-01  16-04-01  16-04-21       NaN       NaN

推荐阅读