python - 如何在同一单元格上按条件用另一个数据框填充数据框并求和?
问题描述
如下所示,有 2 个数据帧 df1 和 df2。
df1 说明了每列名称的数量数据。所以每一列将包含从地板到下的数量。
Column name Floor Under ConcType ConcTotal RebarTotal SteelTotal
0 CB1 ACP GF C80 21.6 7954.8 2265.0
1 CB1A B4F LGF C60 1.2 318.8 0.0
...
201 CT12B GF 8F C60 4.5 757.6 840.0
202 CT13A GF 8F C80 1.4 429.3 420.0
而df2是一个填充量数据分隔层的表格。但df2在Fl.Name_h(列)上的意思是不同的,它将包含所有列的数量表格df1来承载该Floor。
H 表示承载该楼层的柱子高度。它由标高楼层 - 标高上一层楼计算。
例如 F8 楼的 H 为 48400 - 42800 = 5600
Fl.Name_Elev Elevation Fl.Name_h H C60_Conc C80_Conc Rebar Steel
0 8F 48400 8F 5600.0 NaN NaN NaN NaN
1 7F 42800 7F 6000.0 NaN NaN NaN NaN
2 6F 36800 6F 6000.0 NaN NaN NaN NaN
3 5F 30800 5F 5600.0 NaN NaN NaN NaN
4 4F 25200 4F 5600.0 NaN NaN NaN NaN
5 3F 19600 3F 5600.0 NaN NaN NaN NaN
6 2F 14000 2F 6000.0 NaN NaN NaN NaN
7 1F 8000 1F 6000.0 NaN NaN NaN NaN
8 GF 2000 GF 3300.0 NaN NaN NaN NaN
9 LGF MZ -1300 LGF MZ 3150.0 NaN NaN NaN NaN
10 LGF -4450 LGF 4650.0 NaN NaN NaN NaN
11 B1F -9100 B1F 3000.0 NaN NaN NaN NaN
12 B2F -12100 B2F 3000.0 NaN NaN NaN NaN
13 B3F -15100 B3F 4650.0 NaN NaN NaN NaN
14 B4F -19750 B4F 8250.0 NaN NaN NaN NaN
15 ACP -28000 NaN NaN NaN NaN NaN NaN
我想从 df1 填写 df2 中“C60_Conc”、“C80_Conc”、“Rebar”、“Steel”的 NaN。
结果将如下所示:
Fl.Name_Elev Elevation Fl.Name_h H C60_Conc C80_Conc Rebar Steel
0 8F 48400 8F 5600.0 4.5 1.4 1186.9 1260
1 7F 42800 7F 6000.0 4.5 1.4 1186.9 1260
2 6F 36800 6F 6000.0 4.5 1.4 1186.9 1260
3 5F 30800 5F 5600.0 4.5 1.4 1186.9 1260
4 4F 25200 4F 5600.0 4.5 1.4 1186.9 1260
5 3F 19600 3F 5600.0 4.5 1.4 1186.9 1260
6 2F 14000 2F 6000.0 4.5 1.4 1186.9 1260
7 1F 8000 1F 6000.0 4.5 1.4 1186.9 1260
8 GF 2000 GF 3300.0 0 21.6 7954.8 2265
9 LGF MZ -1300 LGF MZ 3150.0 0 21.6 7954.8 2265
10 LGF -4450 LGF 4650.0 1.2 21.6 8276.6 2265
11 B1F -9100 B1F 3000.0 1.2 21.6 8276.6 2265
12 B2F -12100 B2F 3000.0 1.2 21.6 8276.6 2265
13 B3F -15100 B3F 4650.0 1.2 21.6 8276.6 2265
14 B4F -19750 B4F 8250.0 0 21.6 7954.8 2265
15 ACP -28000 NaN NaN NaN NaN NaN NaN
PS对不起我的交流。英语不是我的母语。
解决方案
解决方案:
floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]),
columns="ConcType",
values="ConcTotal").reset_index().groupby("Floors").sum()
output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})
>>> output
Fl.Name_Elev Elevation Fl.Name_h ... SteelTotal C60_Conc C80_Conc
0 8F 48400 8F ... 1260 4.5 1.4
1 7F 42800 7F ... 1260 4.5 1.4
2 6F 36800 6F ... 1260 4.5 1.4
3 5F 30800 5F ... 1260 4.5 1.4
4 4F 25200 4F ... 1260 4.5 1.4
5 3F 19600 3F ... 1260 4.5 1.4
6 2F 14000 2F ... 1260 4.5 1.4
7 1F 8000 1F ... 1260 4.5 1.4
8 GF 2000 GF ... 3525 4.5 23.0
9 LGF MZ -1300 LGF MZ ... 2265 0.0 21.6
10 LGF -4450 LGF ... 2265 1.2 21.6
11 B1F -9100 B1F ... 2265 1.2 21.6
12 B2F -12100 B2F ... 2265 1.2 21.6
13 B3F -15100 B3F ... 2265 1.2 21.6
14 B4F -19750 B4F ... 2265 1.2 21.6
15 ACP -28000 NaN ... 2265 0.0 21.6
解释
首先,我们使用 df1 包括从 Floor 到 Under 的所有楼层pandas.explode
:
floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
>>> df1.head()
Column name Floor Under ConcType ConcTotal RebarTotal SteelTotal Floors
0 CB1 ACP GF C80 21.6 7954.8 2265 ACP
0 CB1 ACP GF C80 21.6 7954.8 2265 B4F
0 CB1 ACP GF C80 21.6 7954.8 2265 B3F
0 CB1 ACP GF C80 21.6 7954.8 2265 B2F
0 CB1 ACP GF C80 21.6 7954.8 2265 B1F
0 CB1 ACP GF C80 21.6 7954.8 2265 LGF
0 CB1 ACP GF C80 21.6 7954.8 2265 LGF MZ
0 CB1 ACP GF C80 21.6 7954.8 2265 GF
1 CB1A B4F LGF C60 1.2 318.8 0 B4F
1 CB1A B4F LGF C60 1.2 318.8 0 B3F
然后,我们创建一个映射器,通过使用pandas.pivot
来获取 C60 和 C80 的单独列,并使用pandas.groupby
withsum
来获取每个楼层的总数:
mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]),
columns="ConcType",
values="ConcTotal") \
.reset_index() \
.groupby("Floor").sum()
>>> mapper
ConcType RebarTotal SteelTotal C60 C80
Floors
1F 1186.9 1260 4.5 1.4
2F 1186.9 1260 4.5 1.4
3F 1186.9 1260 4.5 1.4
4F 1186.9 1260 4.5 1.4
5F 1186.9 1260 4.5 1.4
6F 1186.9 1260 4.5 1.4
7F 1186.9 1260 4.5 1.4
8F 1186.9 1260 4.5 1.4
ACP 7954.8 2265 0.0 21.6
B1F 8273.6 2265 1.2 21.6
B2F 8273.6 2265 1.2 21.6
B3F 8273.6 2265 1.2 21.6
B4F 8273.6 2265 1.2 21.6
GF 9141.7 3525 4.5 23.0
LGF 8273.6 2265 1.2 21.6
LGF MZ 7954.8 2265 0.0 21.6
最后,我们使用pandas.merge
df2 和 mapper 来获取输出:
output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})
推荐阅读
- python - Python C API:尝试构建字典的问题
- actions-on-google - 意图未在设备中触发但在模拟器中工作正常?
- python - 是否可以在 anaconda3 中将预先存在的 anaconda2 设置为 py2 环境?
- android - 用 Polygon 在整个地图上绘制一个矩形
- c# - 如何使用 get_range 方法 C# 使用变量?
- angularjs - 将 SPFx Web 部件添加到 SharePoint 经典页面会导致 PnP Widget Wrangler AngularJS 组件崩溃
- regex - 正则表达式:不仅是电子邮件中的数字
- r - Mac上Rstudio下的RWeka安装
- gensim - Spacy/Gensim - 创建相似的句子
- elixir - 关于调用模块内的宏