python - Pandas: combine two rows within a group based on complex conditions
Problem description
I have a df like the following:
import pandas as pd
df = pd.DataFrame({
"ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})
It looks like this:
           ID       Sender  Receiver        Date  Sender_type  Receiver_type  Price
0   company A           28       129  2020-04-12        house          house     32
1   company A       delete        28  2020-03-20         temp          house     50  # combine this row with the one below
2   company A  flag_source    delete  2020-03-20        house           temp     47  # combine this row with the one above
3   company B           56       172  2019-02-11        house          house     21
4   company B           28        56  2019-01-31        house          house     23
5   company B          312        28  2018-04-02        house          house     19
6   company C       delete        61  2020-06-29         temp          house     52  # combine this row with the one below
7   company C  flag_source    delete  2020-06-29        house           temp     39  # combine this row with the one above
8   company C           78        12  2019-11-29        house          house     12
9   company C          102        78  2019-10-01        house          house     22
10  company D           26        98  2020-04-03        house          house     61
11  company D          101        26  2020-01-30         temp          house     53
12  company D           96       101  2019-10-18        house           temp     19
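I want to combine/merge two rows within each "ID" group (company x) according to the following rule: the row that has 'flag_source' in 'Sender' is merged with the row directly above it into one new row. In this new row, Sender is 'flag_source', 'Receiver' is the value from the row above (so both 'delete' values disappear), Date is the date from the row above, Sender_type and Receiver_type are both 'house', and 'Price' is the price from the row above. The two original rows are then removed. For example, for company A, rows 1 and 2 are merged into the following new row: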
       ID       Sender  Receiver        Date  Sender_type  Receiver_type  Price
company A  flag_source        28  2020-03-20        house          house     50
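This new row then replaces the two original rows. The same rule applies to every other group (in this example it only affects companies A and C). In the end, I want a result like this: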
           ID       Sender  Receiver        Date  Sender_type  Receiver_type  Price
0   company A           28       129  2020-04-12        house          house     32
1   company A  flag_source        28  2020-03-20        house          house     50  # new row
2   company B           56       172  2019-02-11        house          house     21
3   company B           28        56  2019-01-31        house          house     23
4   company B          312        28  2018-04-02        house          house     19
5   company C  flag_source        61  2020-06-29        house          house     52  # new row
6   company C           78        12  2019-11-29        house          house     12
7   company C          102        78  2019-10-01        house          house     22
8   company D           26        98  2020-04-03        house          house     61
9   company D          101        26  2020-01-30         temp          house     53
10  company D           96       101  2019-10-18        house           temp     19
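To make the rule concrete, it can be written as a function of two rows: the row directly above and the 'flag_source' row. This is a minimal sketch for illustration only; `merge_pair` and `df_a` are hypothetical names, and `df_a` holds just the three company A rows from the example.

```python
import pandas as pd


def merge_pair(upper: pd.Series, flag: pd.Series) -> pd.Series:
    """Build the merged row from the row above (upper) and the 'flag_source' row (flag)."""
    return pd.Series({
        "ID": flag["ID"],
        "Sender": flag["Sender"],        # 'flag_source'
        "Receiver": upper["Receiver"],   # taken from the row above, so both 'delete' values vanish
        "Date": upper["Date"],           # date of the row above
        "Sender_type": "house",          # the rule fixes both types to 'house'
        "Receiver_type": "house",
        "Price": upper["Price"],         # price of the row above
    })


# Hypothetical subset: only the company A rows from the question
df_a = pd.DataFrame({
    "ID": ["company A"] * 3,
    "Sender": [28, "delete", "flag_source"],
    "Receiver": [129, 28, "delete"],
    "Date": ["2020-04-12", "2020-03-20", "2020-03-20"],
    "Sender_type": ["house", "temp", "house"],
    "Receiver_type": ["house", "house", "temp"],
    "Price": [32, 50, 47],
})

merged = merge_pair(df_a.loc[1], df_a.loc[2])
print(merged)
```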
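I hope my explanation of the problem is clear.
This is only a short example; the real data contains many such cases. I wrote a loop, but it is very slow and inefficient, so any ideas for an efficient approach would be appreciated. Thank you very much for your help!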
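Since the row above each 'flag_source' row already holds the Receiver, Date and Price the merged row needs, one loop-free option is to work with boolean masks: mark the row directly above each 'flag_source' row within the same ID, overwrite its Sender and type columns, then drop the 'flag_source' rows. This is a sketch assuming each 'flag_source' row immediately follows its partner row inside the same group; it uses `groupby(...).shift` instead of a Python loop and is not taken from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ['company A', 'company A', 'company A', 'company B', 'company B', 'company B', 'company C', 'company C', 'company C', 'company C', 'company D', 'company D', 'company D'],
    'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
    'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
    'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
    'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house'],
    'Receiver_type': ['house', 'house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house', 'house', 'house', 'temp'],
    'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19],
})

# Rows that carry 'flag_source' in Sender
is_flag = df["Sender"].eq("flag_source")

# Row directly above a 'flag_source' row, looking only inside the same ID group
# (shift(-1) pulls the next row's flag up; the last row of each group gets False)
is_upper = is_flag.groupby(df["ID"]).shift(-1, fill_value=False).astype(bool)

out = df.copy()
# The upper row keeps its own Receiver, Date and Price; only Sender and the types change
out.loc[is_upper, "Sender"] = "flag_source"
out.loc[is_upper, ["Sender_type", "Receiver_type"]] = "house"

# Drop the original 'flag_source' rows and renumber
out = out[~is_flag].reset_index(drop=True)
print(out)
```

All operations are column-wide, so the cost grows with the number of rows rather than with the number of Python-level iterations.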
Solution
import pandas as pd
df = pd.DataFrame({
"ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})
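# Rows whose Sender is 'flag_source'; each one is merged with the row directly above it
flaggedData = df[df["Sender"] == "flag_source"]
for i, row in flaggedData.iterrows():       # row: the row having Sender == 'flag_source'
    deleteRow = df.loc[i - 1]               # deleteRow: the row above it (Sender == 'delete')
    combined = [row["ID"],                  # ID
                row["Sender"],              # Sender ('flag_source')
                deleteRow["Receiver"],      # Receiver from the row above
                deleteRow["Date"],          # Date from the row above
                row["Sender_type"],         # Sender_type ('house')
                deleteRow["Receiver_type"], # Receiver_type ('house')
                deleteRow["Price"]]         # Price from the row above
    df.loc[i - 1] = combined                # overwrite the 'delete' row with the merged values
    df = df.drop(index=i)                   # drop the 'flag_source' row
# Reset the index once, after the loop, so the labels i - 1 used above stay valid;
# drop=True discards the old index instead of adding it as a new column
df = df.reset_index(drop=True)
print(df.loc[1])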
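This assumes every 'delete' row sits directly above its 'flag_source' row, and that df still has its default 0..n-1 index, because the loop looks up the row above by label (i - 1). If anything is still unclear, read the comments in the code and ask your questions in the comments.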
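Before running the loop on real data, the assumption can be checked cheaply. The sketch below uses a hypothetical helper, `check_pairs`, and a hypothetical `sample` frame; it relies on positional `shift`, so it tests row order rather than index labels, and it also requires both rows of a pair to share the same ID.

```python
import pandas as pd


def check_pairs(df: pd.DataFrame) -> pd.Index:
    """Return the labels of 'flag_source' rows whose row directly above is not
    a 'delete' row with the same ID, i.e. rows that break the loop's assumption."""
    prev_sender = df["Sender"].shift(1)   # Sender of the previous row (NaN for the first row)
    prev_id = df["ID"].shift(1)           # ID of the previous row
    is_flag = df["Sender"].eq("flag_source")
    ok = prev_sender.eq("delete") & prev_id.eq(df["ID"])
    return df.index[(is_flag & ~ok).to_numpy()]


# Hypothetical data: row 1 is a valid pair, row 3 follows a 'delete' row of another company
sample = pd.DataFrame({
    "ID": ["company A", "company A", "company B", "company C"],
    "Sender": ["delete", "flag_source", "delete", "flag_source"],
})

bad = check_pairs(sample)
good = check_pairs(sample.iloc[:2])
print(list(bad), list(good))
```

An empty result means every 'flag_source' row has a matching 'delete' row directly above it in the same group.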