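python - Look up values from one dataframe in another dataframe and return information from the matching row / a different column
Problem description
I have 2 dataframes: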
df_Billed = pd.DataFrame({'Bill_Number': [220119, 220120, 220219, 220219, 220419, 220519, 220619, 221219], 'Date': ['1/31/2019', '2/20/2020', '2/28/2019', '6/30/2019', '6/30/2019', '6/30/2019', '6/30/2019', '12/31/2019'], 'Amount': [3312.5, 832.0, 10000.0, -3312.5, 8725.0, 1862.5, 3637.5, 1587.5]})
df_Received = pd.DataFrame({'Bill_Number': [220119, 220219, 220419, 220519, 220619], 'Date': ['4/16/2019', '5/21/2019', '8/2/2019', '8/2/2019', '8/2/2019'], 'Amount': [3312.5, 6687.5, 8725, 1862.5, 3637.5]})
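I am trying to look up each "Bill_Number" in df_Billed to see whether it exists in df_Received. Ideally, if it does, I want to compute the difference between the df_Billed and df_Received dates for that bill number (to see how many days it took to get paid). If the bill number does not exist in df_Received, I just want to return all rows for that bill number in df_Billed.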
EX: Since df_Billed Bill_Number 220119 is in df_Received, it would return 75 (which is the number of days it took for the bill to be paid 4/16/2019 - 1/31/2019).
EX: Since df_Billed Bill_Number 221219 is not in df_Received, it would return 12/31/2019 (which is the date it was billed).
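Solution
You can start with a left merge on Bill_Number, which keeps every billed row and leaves the received columns empty (NaN) for bills that were never paid: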
df_Billed=df_Billed.merge(df_Received,on='Bill_Number',how='left')
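To make the effect of the left merge easier to see, here is a minimal sketch on a trimmed subset of the question's data. It assumes the dates are stored as strings, and it uses the optional `indicator=True` argument of `merge`, which is not part of the original answer. The name `merged` is introduced only for illustration.

```python
import pandas as pd

# Trimmed subset of the question's data (dates kept as strings).
df_Billed = pd.DataFrame({
    'Bill_Number': [220119, 220120, 221219],
    'Date': ['1/31/2019', '2/20/2020', '12/31/2019'],
    'Amount': [3312.5, 832.0, 1587.5],
})
df_Received = pd.DataFrame({
    'Bill_Number': [220119],
    'Date': ['4/16/2019'],
    'Amount': [3312.5],
})

# Columns present in both frames get the default suffixes _x (left) and _y (right).
# indicator=True adds a _merge column: 'both' or 'left_only' for a left merge.
merged = df_Billed.merge(df_Received, on='Bill_Number', how='left', indicator=True)
print(merged.columns.tolist())

# Bills with no matching payment row.
unpaid = merged[merged['_merge'] == 'left_only']
print(unpaid['Bill_Number'].tolist())
```

The `_merge` column is just a convenience; checking `Date_y` for missing values, as the answer does, gives the same information.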
Then use apply and pandas.to_datetime to compute the difference between the dates. Rows without a payment keep their billing date:
df_Billed['result'] = df_Billed.apply(
    lambda x: x.Date_x if pd.isnull(x.Date_y)
    else abs(pd.to_datetime(x.Date_x) - pd.to_datetime(x.Date_y)).days,
    axis=1)
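For larger frames, the row-wise apply can be replaced by column-wise date arithmetic. The following is a sketch of that alternative, not the original answer's method. It assumes month/day/year date strings and starts from a small hand-built frame shaped like the merge result. The column name `days_to_pay` and the `unpaid` mask are hypothetical names chosen for this example.

```python
import pandas as pd

# Hand-built stand-in for the merged frame (Date_y is missing for unpaid bills).
merged = pd.DataFrame({
    'Bill_Number': [220119, 220219, 221219],
    'Date_x': ['1/31/2019', '6/30/2019', '12/31/2019'],
    'Date_y': ['4/16/2019', '5/21/2019', None],
})

# Parse both columns once; missing values become NaT.
billed = pd.to_datetime(merged['Date_x'], format='%m/%d/%Y')
received = pd.to_datetime(merged['Date_y'], format='%m/%d/%Y')

# Absolute gap in whole days; unpaid bills end up as NaN.
merged['days_to_pay'] = (received - billed).abs().dt.days

# Boolean mask selecting the bills that have no payment date.
unpaid = merged['days_to_pay'].isna()
print(merged)
print(merged.loc[unpaid, ['Bill_Number', 'Date_x']])
```

Unlike the apply version, this keeps the day counts in a purely numeric column instead of mixing numbers and date strings in one column, which is a design choice rather than a requirement.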
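Finally, I think you want the final result in a new column alongside the original billing data. So below I drop the merged columns Date_y and Amount_y and rename Date_x and Amount_x back to Date and Amount: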
df_Billed.drop(['Date_y','Amount_y'],axis=1,inplace=True)
df_Billed.rename(columns={"Date_x": "Date","Amount_x":"Amount"},inplace=True)
Final dataframe:
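To reproduce the final dataframe, the steps above can be combined into one script. This is a sketch that assumes the dates are stored as month/day/year strings; the name `final` is introduced here for illustration and does not appear in the answer.

```python
import pandas as pd

df_Billed = pd.DataFrame({
    'Bill_Number': [220119, 220120, 220219, 220219, 220419, 220519, 220619, 221219],
    'Date': ['1/31/2019', '2/20/2020', '2/28/2019', '6/30/2019',
             '6/30/2019', '6/30/2019', '6/30/2019', '12/31/2019'],
    'Amount': [3312.5, 832.0, 10000.0, -3312.5, 8725.0, 1862.5, 3637.5, 1587.5],
})
df_Received = pd.DataFrame({
    'Bill_Number': [220119, 220219, 220419, 220519, 220619],
    'Date': ['4/16/2019', '5/21/2019', '8/2/2019', '8/2/2019', '8/2/2019'],
    'Amount': [3312.5, 6687.5, 8725, 1862.5, 3637.5],
})

# Step 1: left merge keeps every billed row.
final = df_Billed.merge(df_Received, on='Bill_Number', how='left')

# Step 2: days between billing and payment, or the billing date if unpaid.
final['result'] = final.apply(
    lambda x: x.Date_x if pd.isnull(x.Date_y)
    else abs(pd.to_datetime(x.Date_x) - pd.to_datetime(x.Date_y)).days,
    axis=1)

# Step 3: drop the payment columns and restore the original column names.
final = final.drop(['Date_y', 'Amount_y'], axis=1)
final = final.rename(columns={'Date_x': 'Date', 'Amount_x': 'Amount'})
print(final)
```

With the sample data, the result column holds 75, '2/20/2020', 82, 40, 33, 33, 33 and '12/31/2019'. Both rows for bill 220219 are matched against the single payment dated 5/21/2019, so the second one (billed 6/30/2019) reports 40 days only because of the abs call.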