python - 合并具有相同id的csv行并对列进行处理
问题描述
编辑1
@Serge 答案适用于我的测试数据,但在我的真实数据中出现错误而失败
开始.csv
id;index;level1;level2;level3;strate;value
;1;Recruitement;;Recruitement start date;Total;2021-07-01
;1;Recruitement;;Recruitement start date;Ivory Coast;2021-07-01
;1;Recruitement;;Recruitement start date;Madagascar;2021-07-01
;1;Recruitement;;Recruitement start date;South Africa;2021-07-01
;2;Recruitement;;Sex (Woman), %;Total;47.0
;2;Recruitement;;Sex (Woman), %;Ivory Coast;40.0
;2;Recruitement;;Sex (Woman), %;Madagascar;40.0
;2;Recruitement;;Sex (Woman), %;South Africa;60.0
5-Total;3;Follow up;;D7 visits: N performed;Total;4 # -> LINE TO MERGE WITH...
5-Ivory Coast;3;Follow up;;D7 visits: N performed;Ivory Coast;1
5-Madagascar;3;Follow up;;D7 visits: N performed;Madagascar;2
5-South Africa;3;Follow up;;D7 visits: N performed;South Africa;1
5-Total;4;Follow up;;D7 visits: N expected;Total;12 #.... THIS LINE
5-Ivory Coast;4;Follow up;;D7 visits: N expected;Ivory Coast;4
5-Madagascar;4;Follow up;;D7 visits: N expected;Madagascar;4
5-South Africa;4;Follow up;;D7 visits: N expected;South Africa;4
;6;Recruitement;;Age, median;Total;35.0
;6;Recruitement;;Age, median;Ivory Coast;35.0
;6;Recruitement;;Age, median;Madagascar;31.0
;6;Recruitement;;Age, median;South Africa;36.0
;7;Recruitement;;Age, IQR;Total;(26.5-48.5)
;7;Recruitement;;Age, IQR;Ivory Coast;(26.0-48.0)
;7;Recruitement;;Age, IQR;Madagascar;(26.0-49.0)
;7;Recruitement;;Age, IQR;South Africa;(28.0-40.0)
;8;Recruitement;;Randomized all, N;Total;15
;8;Recruitement;;Randomized all, N;Ivory Coast;5
;8;Recruitement;;Randomized all, N;Madagascar;5
;8;Recruitement;;Randomized all, N;South Africa;5
15-Total;9;Recruitement;;Included severity stage 1, N;Total;1
15-South Africa;9;Recruitement;;Included severity stage 1, N;South Africa;1
16-Total;10;Recruitement;;Included severity stage 2, N;Total;8
16-Ivory Coast;10;Recruitement;;Included severity stage 2, N;Ivory Coast;4
16-Madagascar;10;Recruitement;;Included severity stage 2, N;Madagascar;1
16-South Africa;10;Recruitement;;Included severity stage 2, N;South Africa;3
17-Total;11;Recruitement;;Included severity stage 3, N;Total;6
17-Ivory Coast;11;Recruitement;;Included severity stage 3, N;Ivory Coast;1
17-Madagascar;11;Recruitement;;Included severity stage 3, N;Madagascar;4
17-South Africa;11;Recruitement;;Included severity stage 3, N;South Africa;1
15-Total;12;Recruitement;;Included severity stage 1, %;Total;7.0
15-Ivory Coast;12;Recruitement;;Included severity stage 1, %;Ivory Coast;nan
15-Madagascar;12;Recruitement;;Included severity stage 1, %;Madagascar;nan
15-South Africa;12;Recruitement;;Included severity stage 1, %;South Africa;20.0
16-Total;13;Recruitement;;Included severity stage 2, %;Total;53.0
16-Ivory Coast;13;Recruitement;;Included severity stage 2, %;Ivory Coast;80.0
16-Madagascar;13;Recruitement;;Included severity stage 2, %;Madagascar;20.0
16-South Africa;13;Recruitement;;Included severity stage 2, %;South Africa;60.0
17-Total;14;Recruitement;;Included severity stage 3, %;Total;40.0
17-Ivory Coast;14;Recruitement;;Included severity stage 3, %;Ivory Coast;20.0
17-Madagascar;14;Recruitement;;Included severity stage 3, %;Madagascar;80.0
17-South Africa;14;Recruitement;;Included severity stage 3, %;South Africa;20.0
我想合并具有相同id的csv行并对行的列进行一些处理
开始.csv
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed;Total;2;1
4;Follow up;;D7 visits: N expected;Total;6;1
3;Follow up;;D7 visits: N performed;Ivory Coast;1;2
4;Follow up;;D7 visits: N expected;Ivory Coast;4;2
3;Follow up;;D7 visits: N performed;Madagascar;1;3
4;Follow up;;D7 visits: N expected;Madagascar;2;3
我想像这样合并具有相同 id 的行:
结束.csv
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed/expected %;Total;0.33;1
4;Follow up;;D7 visits: N performed/expected %;Ivory Coast;0.75;2
4;Follow up;;D7 visits: N performed/expected %;IMadagascar;0.50;3
我可能需要使用 groupby('id') 并为列处理值应用一个函数,但是......
df.replace('D7 visits: N expected','D7 visits: N performed/expected %').groupby("id",as_index=False,sort=False).last()
给出输出
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed/expected %;Total;6;1
4;Follow up;;D7 visits: N performed/expected %;Ivory Coast;4;2
4;Follow up;;D7 visits: N performed/expected %;IMadagascar;2;3
解决方案
旋转数据框可能会有所帮助:
tmp = df.pivot(index='id', columns='level3', values='value')
tmp['value'] = tmp['D7 visits: N performed']/tmp['D7 visits: N expected']
tmp['level3'] = 'D7 visits: N performed/expected %'
resul = df[['index', 'level1', 'level2', 'strate', 'id']].drop_duplicates(
'id').merge(tmp[['level3', 'value']], left_on='id',right_index=True).reindex(
columns=['index', 'level1', 'level2', 'level3', 'strate', 'value', 'id'])
resul.to_csv('end.csv', sep=';')
应该生成预期的 csv 文件。
推荐阅读
- ember.js - 解析路径中的关键路径(或链式关系)
- python - 将从 for 循环中获取的三个值放入 DataFrame / Python
- javascript - 在位于数组对象中的对象数组中搜索
- javascript - 在回调完成之前反应 setstate 不渲染
- angular - ElementRef 不包含在源代码中
- python - 手写神经网络权重不变
- r - 查找 R 中匹配元素的数量
- c - 从 Swift 桥接头 (Linux) 导入 stdlib.h 时出错
- bash - When would BASH_SOURCE[1] match $0 when BASH_SOURCE[0] doesn't?
- solaris - 安装 Solaris Studio 失败