python - 合并具有多列条件和比较的数据框
问题描述
不确定这是做我想做的最好或正确的方法。
我有以下df:
df = pd.DataFrame(np.array([['1-1-2020', '123','How can I Help?', 'Delivered'], ['1-1-2020', '123','How can I Help?', 'Opened'], ['1-2-2021', '100','New Offer', 'Delivered'],['1-2-2021', '100','New Offer', 'Delivered'],['1-4-2021', '144','Last chance, buy now!', 'Delivered'],['1-4-2021', '144','Last chance, buy now!', 'Delivered'],['2-4-2021', '144','Last chance, buy now!', 'Opened']]),
columns=['Date', 'Customer_ID','Subject', 'Status'])
Date Customer_ID Subject Status
0 1-1-2020 123 How can I Help? Delivered
1 1-1-2020 123 How can I Help? Opened
2 1-2-2021 100 New Offer Delivered
3 1-2-2021 100 New Offer Delivered
4 1-4-2021 144 Last chance, buy now! Delivered
5 1-4-2021 144 Last chance, buy now! Delivered
6 2-4-2021 144 Last chance, buy now! Opened
在这个 df 客户 123 收到了一封电子邮件,然后它在第二行被打开。客户 100 发送了两次电子邮件,客户 144 发送了两次电子邮件,其中一个打开了。
我正在尝试使用最后操作日期跟踪每位客户的每封电子邮件的已发送和打开状态。
因此,我创建了两个数据框:一个用于交付,一个用于打开,并将它们合并到交付的一个上以跟踪打开的内容。
df_del = df.loc[(df['Status'] == 'Delivered')]
df_open = df.loc[(df['Status'] == 'Opened')]
d = df_del.rename(columns={'Date': 'Date Delivered'})
o = df_open.rename(columns={'Date': 'last action date', 'Status': 'Open Status'})
w = d.merge(o, on=['Customer_ID','Subject'], how='left')
w
由此可见:
Date Delivered Customer_ID Subject Status last action date Open Status
0 1-1-2020 123 How can I Help? Delivered 1-1-2020 Opened
1 1-2-2021 100 New Offer Delivered NaN NaN
2 1-2-2021 100 New Offer Delivered NaN NaN
3 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
4 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
我所期待的:
Date Delivered Customer_ID Subject Status last action date Open Status
0 1-1-2020 123 How can I Help? Delivered 1-1-2020 Opened
1 1-2-2021 100 New Offer Delivered 1-2-2021 NaN
2 1-2-2021 100 New Offer Delivered 1-2-2021 NaN
3 1-4-2021 144 Last chance, buy now! Delivered 2-4-2021 Opened
4 1-4-2021 144 Last chance, buy now! Delivered 1-4-2021 NaN
解决方案
让我们使用一个伪“订单”列:
df_del = df.loc[(df['Status'] == 'Delivered')].copy()
df_open = df.loc[(df['Status'] == 'Opened')].copy()
df_del['order'] = df_del.groupby(['Customer_ID']).cumcount()
df_open['order'] = df_open.groupby(['Customer_ID']).cumcount()
d = df_del.rename(columns={'Date': 'Date Delivered'})
o = df_open.rename(columns={'Date': 'last action date', 'Status': 'Open Status'})
w = d.merge(o, on=['Customer_ID','Subject','order'], how='left')
w['last action date'] = w['last action date'].fillna(w['Date Delivered'])
W
输出:
Date Delivered Customer_ID Subject Status order last action date Open Status
0 1-1-2020 123 How can I Help? Delivered 0 1-1-2020 Opened
1 1-2-2021 100 New Offer Delivered 0 1-2-2021 NaN
2 1-2-2021 100 New Offer Delivered 1 1-2-2021 NaN
3 1-4-2021 144 Last chance, buy now! Delivered 0 2-4-2021 Opened
4 1-4-2021 144 Last chance, buy now! Delivered 1 1-4-2021 NaN
推荐阅读
- ng-bootstrap - 带有按钮的 ngbTabTitle 导致重定向
- azure-devops - VSTS 分配以下任务
- c++ - 将指向多维数组向量中单个项目的指针传递给函数c ++
- node.js - npm update 仅更改了依赖项的依赖项
- c++ - 如何与 SDL 库链接?
- amazon-web-services - 如何使用 AWS CLI 列出附加到 AWS IoT 事物证书的 IAM 策略?
- python - 如何在 python pandas 中参考其他数据框创建新列
- docker - Docker ADD 不会复制到容器中
- excel - 将字符串值传递给函数并返回集合对象 VBA
- ios - ARKit - 如何让对象跟随用户