python - Python Pandas: sum with multiple conditions
Problem description
Here is my sample data:
      Customer  Document Date  Clearing Date  Invoice_Amount
0     A         09/13/2016     11/04/2016     2,007,324
1     A         04/18/2016     07/11/2016     631,714
2     A         09/13/2016     09/16/2016     4,000,000
3     A         07/11/2017     09/23/2017     5,000,000
4     A         05/03/2016     06/17/2016     2,000,000
...   ...       ...            ...            ...
1158  H         04/21/2017     06/28/2017     3,000,000
1159  H         04/25/2017     05/19/2017     1,000,000
1160  H         11/03/2017     12/11/2017     4,500,000
1161  H         03/15/2018     05/27/2018     3,500,000
1162  H         02/21/2018     05/03/2018     1,500,000
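I want to create a new variable No_Paid (a new column added after Invoice_Amount) that counts "the number of invoices paid before the Document Date of the customer's new invoice".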
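Stated as a formula, and including the extra six-month condition that the for loop in this question applies (an invoice only counts if its own Document Date lies within the 180 days before the new invoice), the value for invoice i is:

```latex
\text{No\_Paid}_i = \bigl|\{\, j :\ \text{Customer}_j = \text{Customer}_i,\ \text{ClearingDate}_j < \text{DocDate}_i,\ \text{DocDate}_j \ge \text{DocDate}_i - 180\ \text{days} \,\}\bigr|
```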
The expected output is as follows:
      Customer  Document Date  Clearing Date  Invoice_Amount  No_Paid
0     A         09/13/2016     11/04/2016     2,007,324       8
1     A         04/18/2016     07/11/2016     631,714         1
2     A         09/13/2016     09/16/2016     4,000,000       8
3     A         07/11/2017     09/23/2017     5,000,000       6
4     A         05/03/2016     06/17/2016     2,000,000       1
...   ...       ...            ...            ...             ...
1158  H         04/21/2017     06/28/2017     3,000,000       5
1159  H         04/25/2017     05/19/2017     1,000,000       3
1160  H         11/03/2017     12/11/2017     4,500,000       7
1161  H         03/15/2018     05/27/2018     3,500,000       37
1162  H         02/21/2018     05/03/2018     1,500,000       37
Currently, I use a for loop to get the expected output:
import pandas as pd

df = pd.read_csv(r'E:\data.csv')  # raw string so the backslash is not read as an escape
df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")
df["No_Paid"] = 0
for i in df.index:
    customer = df.loc[i, "Customer"]
    doc_date = df.loc[i, "Document Date"]
    six_month = doc_date - pd.Timedelta(days=180)
    # same customer, cleared before this invoice's document date,
    # and issued within the preceding 180 days
    mask = (df["Customer"] == customer) & (df["Clearing Date"] < doc_date) & (df["Document Date"] >= six_month)
    df.loc[i, "No_Paid"] = df.loc[mask, "Invoice_Amount"].count()
In my real data I have more than 100,000 invoices, so this loop takes a long time. I tried using df.apply, but could not get the same output.
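One way to avoid the per-row Python loop is to compare every invoice of a customer against all other invoices of that customer at once with NumPy broadcasting. This is a minimal sketch, assuming the columns are named "Customer", "Document Date" and "Clearing Date" as in the sample and that both date columns have already been converted with pd.to_datetime; the function name count_paid_before and its chunk parameter are hypothetical. It counts matching rows, which equals the loop's count of Invoice_Amount as long as no amounts are missing.

```python
import numpy as np
import pandas as pd


def count_paid_before(df: pd.DataFrame, window_days: int = 180, chunk: int = 2000) -> pd.Series:
    """Hypothetical helper: for each invoice, count invoices of the same customer
    cleared before its Document Date whose own Document Date falls within the
    preceding window_days."""
    result = pd.Series(0, index=df.index, dtype="int64")
    window = np.timedelta64(window_days, "D")
    for _, g in df.groupby("Customer", sort=False):
        doc = g["Document Date"].to_numpy()
        clear = g["Clearing Date"].to_numpy()
        counts = np.empty(len(g), dtype="int64")
        # Work in row chunks so the boolean matrix is at most chunk x len(g).
        for start in range(0, len(g), chunk):
            d = doc[start:start + chunk, None]  # "current" invoices as a column vector
            hit = (clear[None, :] < d) & (doc[None, :] >= d - window)
            counts[start:start + chunk] = hit.sum(axis=1)
        result.loc[g.index] = counts
    return result
```

Usage would be df["No_Paid"] = count_paid_before(df). Memory per chunk grows with the number of invoices a single customer has, so lowering chunk trades speed for memory.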
Solution
Taking your example:
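import pandas as pd
# read in csv (save as csv or read in using pd.read_excel)
# thousands=',' so amounts like 2,007,324 are parsed as numbers
df = pd.read_csv('file.csv', thousands=',')
# to datetime just in case
df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")
# time between issuing and clearing each invoice
df['Overdue'] = df['Clearing Date'] - df['Document Date']
# 180 days for 6 months
df['6M_Age'] = df['Document Date'] - pd.Timedelta(days=180)
# Hard to tell what the line in the middle of the data means
# you can group by two columns if you need to
# running total of Invoice_Amount per customer (select the column before cumsum)
df['Sum_of_paid'] = df.groupby('Customer')['Invoice_Amount'].cumsum()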
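The grouped cumulative sum above does not yet apply the clearing-date and 180-day conditions. A minimal pandas-only sketch that does, assuming the same column names as the sample data and already-converted datetime columns, is to merge the frame with itself on Customer, filter the pairs, and count per original row. The function name count_paid_before_merge is hypothetical. The merged frame has one row per pair of invoices from the same customer, so this suits smaller data better than 100,000 invoices spread over a few customers.

```python
import pandas as pd


def count_paid_before_merge(df: pd.DataFrame, window_days: int = 180) -> pd.Series:
    """Hypothetical helper: self-merge on Customer, keep qualifying pairs,
    and count them per original row label."""
    # 'row' remembers which original invoice each left-hand row came from
    left = df[["Customer", "Document Date"]].assign(row=df.index)
    right = df[["Customer", "Document Date", "Clearing Date"]]
    # left keeps "Document Date"; the other invoice's date becomes "Document Date_other"
    pairs = left.merge(right, on="Customer", suffixes=("", "_other"))
    window = pd.Timedelta(days=window_days)
    keep = (pairs["Clearing Date"] < pairs["Document Date"]) & (
        pairs["Document Date_other"] >= pairs["Document Date"] - window
    )
    counts = pairs.loc[keep].groupby("row").size()
    # invoices with no qualifying pair get 0
    return counts.reindex(df.index, fill_value=0)
```

Usage would be df["No_Paid"] = count_paid_before_merge(df).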