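python - Insert a row into a DataFrame for each group, with values looked up from another DataFrame
Problem description
For each group in a DataFrame named 'df_recorded', I want to insert one entry whose values are looked up in another DataFrame named 'df_missed'.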
import pandas as pd

df_recorded = pd.DataFrame({
    'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07', '2008 12', '2008 12', '2008 12'],
    'info': ['recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded'],
    'score': [98, 68, 79, 75, 66, 62, 60, 60],
    'date': ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10', '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12']
})
df_missed = pd.DataFrame({
    'id': ['2008 11', '2008 07', '2008 12'],
    'missed_score': [62, 72, 80],
    'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23']
})
df_recorded
        id      info  score        date
0  2008 11  recorded     98  2010-12-10
1  2008 11  recorded     68  2010-10-01
2  2008 11  recorded     79  2010-09-12
3  2008 07  recorded     75  2010-12-10
4  2008 07  recorded     66  2010-11-01
5  2008 12  recorded     62  2010-12-07
6  2008 12  recorded     60  2010-11-10
7  2008 12  recorded     60  2010-09-12
df_missed
        id  missed_score missed_date
0  2008 11            62  2010-08-01
1  2008 07            72  2010-10-20
2  2008 12            80  2010-07-23
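I want to add one row to 'df_recorded' for each group. For example, for the group 'id = 2008 11' the new row keeps the same 'id', gets the new value 'missed' in the 'info' column, and takes its score and date from the matching row of df_missed. The result should look like this: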
Target result:
         id      info  score        date
0   2008 11  recorded     98  2010-12-10
1   2008 11  recorded     68  2010-10-01
2   2008 11  recorded     79  2010-09-12
3   2008 11    missed     62  2010-08-01   # new record
4   2008 07  recorded     75  2010-12-10
5   2008 07  recorded     66  2010-11-01
6   2008 07    missed     72  2010-10-20   # new record
7   2008 12  recorded     62  2010-12-07
8   2008 12  recorded     60  2010-11-10
9   2008 12  recorded     60  2010-09-12
10  2008 12    missed     80  2010-07-23   # new record
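I tried coding this with a loop, but it is very slow and inefficient. If you have any ideas for improving it, please help. Thank you very much.
Solution
IIUC, you can simply rename the columns of df_missed and concat: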
df_missed.columns = ["id", "score", "date"]  # align the column names with df_recorded
df = pd.concat([df_recorded, df_missed], ignore_index=True, sort=False).sort_values("id", ascending=False)
df.loc[df["info"].isnull(), "info"] = "missed"  # rows that came from df_missed have no 'info' yet
print(df)
         id      info  score        date
5   2008 12  recorded     62  2010-12-07
6   2008 12  recorded     60  2010-11-10
7   2008 12  recorded     60  2010-09-12
10  2008 12    missed     80  2010-07-23
0   2008 11  recorded     98  2010-12-10
1   2008 11  recorded     68  2010-10-01
2   2008 11  recorded     79  2010-09-12
8   2008 11    missed     62  2010-08-01
3   2008 07  recorded     75  2010-12-10
4   2008 07  recorded     66  2010-11-01
9   2008 07    missed     72  2010-10-20
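Note that sorting by "id" in descending order puts the groups in a different order than the target result in the question ('2008 11', '2008 07', '2008 12'), and the index is not renumbered. If that matters, the following is a minimal sketch of one possible variation, not the answerer's own code. It assumes each group should keep the position of its first appearance in df_recorded, and it uses a hypothetical helper column `_group_pos` that exists only for sorting. It also uses `rename` so that df_missed itself is left unchanged.

```python
import pandas as pd

df_recorded = pd.DataFrame({
    'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07',
           '2008 12', '2008 12', '2008 12'],
    'info': ['recorded'] * 8,
    'score': [98, 68, 79, 75, 66, 62, 60, 60],
    'date': ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10',
             '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12'],
})
df_missed = pd.DataFrame({
    'id': ['2008 11', '2008 07', '2008 12'],
    'missed_score': [62, 72, 80],
    'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23'],
})

# Build a renamed copy of df_missed and tag its rows; df_missed is not modified.
missed = (
    df_missed
    .rename(columns={'missed_score': 'score', 'missed_date': 'date'})
    .assign(info='missed')
)

# Stack the two frames; the missed rows end up after all recorded rows.
combined = pd.concat([df_recorded, missed], ignore_index=True)

# Position of each id by first appearance in df_recorded (assumed group order).
group_pos = {key: pos for pos, key in enumerate(pd.unique(df_recorded['id']))}

# A stable sort on the group position keeps the recorded rows in their
# original order and leaves each missed row at the end of its group.
result = (
    combined
    .assign(_group_pos=combined['id'].map(group_pos))  # hypothetical helper column
    .sort_values('_group_pos', kind='mergesort')
    .drop(columns='_group_pos')
    .reset_index(drop=True)
)

print(result)
```

The stable sort (`kind='mergesort'`) is what keeps rows within a group in their concatenated order. An id that appears only in df_missed would get no position from the mapping and, under pandas' default handling of missing sort keys, would be placed at the end.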