Pandas: difference between the dates on which a phone number appears

Problem description

I have two CSV files:

csv1:

Mobile_Number    Date    

503477334    2018-10-12
506002884    2018-10-12
501022162    2018-10-12
503487338    2018-10-13
506012887    2018-10-13
503427339    2018-10-14

csv2:

   Date       Mobile_Number

2018-10-01     503477334
2018-10-06     501022162
2018-10-08     506002884
2018-10-09     503487338
2018-10-13     506012887
2018-10-14     503427492

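Now I want an output like the one below: if a number from csv1 also exists in csv2, a new column should give the difference in days between its date in csv1 and its date in csv2. Numbers that do not appear in csv2 should get NaN.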

csv1:

Mobile_Number    Date     Difference

503477334    2018-10-12     11
506002884    2018-10-12     4
501022162    2018-10-12     6
503487338    2018-10-13     4
506012887    2018-10-13     0
503427339    2018-10-14     NaN

Tags: pandas

Solution


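Use Series.map to look up each number's date from csv2, subtract that from the Date column of csv1, and then convert the resulting timedeltas to whole days with Series.dt.days: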

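import pandas as pd

# df1 and df2 are the DataFrames read from csv1 and csv2
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

# One date per number, indexed by Mobile_Number (first occurrence wins)
s1 = df2.drop_duplicates('Mobile_Number').set_index('Mobile_Number')['Date']
# Numbers missing from csv2 map to NaT, so their difference is NaN
df1['Difference'] = df1['Date'].sub(df1['Mobile_Number'].map(s1)).dt.days
print(df1)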
   Mobile_Number       Date  Difference
0      503477334 2018-10-12        11.0
1      506002884 2018-10-12         4.0
2      501022162 2018-10-12         6.0
3      503487338 2018-10-13         4.0
4      506012887 2018-10-13         0.0
5      503427339 2018-10-14         NaN
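
The Difference column comes out as floats (11.0, 4.0, ...) because of the NaN in the last row, while the desired output in the question shows whole numbers. The following is a minimal, self-contained sketch of the same map-and-subtract approach that also casts the result to the nullable Int64 dtype. It assumes the sample data is whitespace-separated and inlines it through io.StringIO instead of reading real files; the names CSV1, CSV2, lookup and diff_days are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Inline copies of the two sample files from the question
# (assumption: whitespace-separated columns).
CSV1 = """Mobile_Number Date
503477334 2018-10-12
506002884 2018-10-12
501022162 2018-10-12
503487338 2018-10-13
506012887 2018-10-13
503427339 2018-10-14
"""

CSV2 = """Date Mobile_Number
2018-10-01 503477334
2018-10-06 501022162
2018-10-08 506002884
2018-10-09 503487338
2018-10-13 506012887
2018-10-14 503427492
"""

df1 = pd.read_csv(io.StringIO(CSV1), sep=r"\s+", parse_dates=["Date"])
df2 = pd.read_csv(io.StringIO(CSV2), sep=r"\s+", parse_dates=["Date"])

# One date per number: keep the first occurrence in csv2 so the
# lookup index is unique, which Series.map requires.
lookup = df2.drop_duplicates("Mobile_Number").set_index("Mobile_Number")["Date"]

# Numbers missing from csv2 map to NaT, so the subtraction yields NaT
# and .dt.days yields NaN for those rows.
diff_days = (df1["Date"] - df1["Mobile_Number"].map(lookup)).dt.days

# Assumption: integer display is wanted. The nullable Int64 dtype keeps
# the missing value as <NA> instead of forcing the column to float.
df1["Difference"] = diff_days.astype("Int64")
print(df1)
```

If a number can occur more than once in csv2, drop_duplicates keeps its first row by default; passing keep='last' would compare against the most recent date instead. Which of the two is right depends on the data and is not stated in the question.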
