首页 > 解决方案 > 如何在同一单元格上按条件用另一个数据框填充数据框并求和?

问题描述

如下所示,有 2 个数据帧 df1 和 df2。

df1 说明了每列名称的数量数据。所以每一列将包含从地板到下的数量。

    Column name    Floor    Under  ConcType   ConcTotal     RebarTotal     SteelTotal
0   CB1            ACP      GF     C80        21.6          7954.8         2265.0
1   CB1A           B4F      LGF    C60        1.2           318.8          0.0
...   
201 CT12B          GF       8F     C60        4.5           757.6         840.0
202 CT13A          GF       8F     C80        1.4           429.3         420.0

而df2是一个填充量数据分隔层的表格。但df2在Fl.Name_h(列)上的意思是不同的,它将包含所有列的数量表格df1来承载该Floor。

H 表示承载该楼层的柱子高度。它由标高楼层 - 标高上一层楼计算。

例如 F8 楼的 H 为 48400 - 42800 = 5600

    Fl.Name_Elev    Elevation    Fl.Name_h   H         C60_Conc  C80_Conc   Rebar   Steel
0   8F              48400        8F          5600.0    NaN       NaN        NaN     NaN 
1   7F              42800        7F          6000.0    NaN       NaN        NaN     NaN
2   6F              36800        6F          6000.0    NaN       NaN        NaN     NaN
3   5F              30800        5F          5600.0    NaN       NaN        NaN     NaN
4   4F              25200        4F          5600.0    NaN       NaN        NaN     NaN
5   3F              19600        3F          5600.0    NaN       NaN        NaN     NaN
6   2F              14000        2F          6000.0    NaN       NaN        NaN     NaN
7   1F              8000         1F          6000.0    NaN       NaN        NaN     NaN
8   GF              2000         GF          3300.0    NaN       NaN        NaN     NaN
9   LGF MZ          -1300        LGF MZ      3150.0    NaN       NaN        NaN     NaN 
10  LGF             -4450        LGF         4650.0    NaN       NaN        NaN     NaN
11  B1F             -9100        B1F         3000.0    NaN       NaN        NaN     NaN
12  B2F             -12100       B2F         3000.0    NaN       NaN        NaN     NaN
13  B3F             -15100       B3F         4650.0    NaN       NaN        NaN     NaN 
14  B4F             -19750       B4F         8250.0    NaN       NaN        NaN     NaN
15  ACP             -28000       NaN         NaN       NaN       NaN        NaN     NaN

我想从 df1 填写 df2 中“C60_Conc”、“C80_Conc”、“Rebar”、“Steel”的 NaN。

结果将如下所示:

Fl.Name_Elev    Elevation    Fl.Name_h       H         C60_Conc C80_Conc    Rebar    Steel
0   8F              48400        8F          5600.0    4.5      1.4         1186.9   1260   
1   7F              42800        7F          6000.0    4.5      1.4         1186.9   1260
2   6F              36800        6F          6000.0    4.5      1.4         1186.9   1260
3   5F              30800        5F          5600.0    4.5      1.4         1186.9   1260
4   4F              25200        4F          5600.0    4.5      1.4         1186.9   1260
5   3F              19600        3F          5600.0    4.5      1.4         1186.9   1260
6   2F              14000        2F          6000.0    4.5      1.4         1186.9   1260
7   1F              8000         1F          6000.0    4.5      1.4         1186.9   1260
8   GF              2000         GF          3300.0    0        21.6        7954.8   2265
9   LGF MZ          -1300        LGF MZ      3150.0    0        21.6        7954.8   2265   
10  LGF             -4450        LGF         4650.0    1.2      21.6        8276.6   2265
11  B1F             -9100        B1F         3000.0    1.2      21.6        8276.6   2265
12  B2F             -12100       B2F         3000.0    1.2      21.6        8276.6   2265
13  B3F             -15100       B3F         4650.0    1.2      21.6        8276.6   2265   
14  B4F             -19750       B4F         8250.0    0        21.6        7954.8   2265
15  ACP             -28000       NaN         NaN       NaN      NaN         NaN      NaN

PS对不起我的交流。英语不是我的母语。

标签: pythonpandasdataframe

解决方案


解决方案:

floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]), 
                   columns="ConcType", 
                   values="ConcTotal").reset_index().groupby("Floors").sum()
output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})
>>> output
   Fl.Name_Elev  Elevation Fl.Name_h  ...  SteelTotal  C60_Conc  C80_Conc
0            8F      48400        8F  ...        1260       4.5       1.4
1            7F      42800        7F  ...        1260       4.5       1.4
2            6F      36800        6F  ...        1260       4.5       1.4
3            5F      30800        5F  ...        1260       4.5       1.4
4            4F      25200        4F  ...        1260       4.5       1.4
5            3F      19600        3F  ...        1260       4.5       1.4
6            2F      14000        2F  ...        1260       4.5       1.4
7            1F       8000        1F  ...        1260       4.5       1.4
8            GF       2000        GF  ...        3525       4.5      23.0
9        LGF MZ      -1300    LGF MZ  ...        2265       0.0      21.6
10          LGF      -4450       LGF  ...        2265       1.2      21.6
11          B1F      -9100       B1F  ...        2265       1.2      21.6
12          B2F     -12100       B2F  ...        2265       1.2      21.6
13          B3F     -15100       B3F  ...        2265       1.2      21.6
14          B4F     -19750       B4F  ...        2265       1.2      21.6
15          ACP     -28000       NaN  ...        2265       0.0      21.6

解释

首先,我们使用 df1 包括从 Floor 到 Under 的所有楼层pandas.explode

floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
>>> df1.head()
  Column name Floor Under ConcType  ConcTotal  RebarTotal  SteelTotal  Floors
0         CB1   ACP    GF      C80       21.6      7954.8        2265     ACP
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B4F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B3F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B2F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B1F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     LGF
0         CB1   ACP    GF      C80       21.6      7954.8        2265  LGF MZ
0         CB1   ACP    GF      C80       21.6      7954.8        2265      GF
1        CB1A   B4F   LGF      C60        1.2       318.8           0     B4F
1        CB1A   B4F   LGF      C60        1.2       318.8           0     B3F

然后,我们创建一个映射器,通过使用pandas.pivot来获取 C60 和 C80 的单独列,并使用pandas.groupbywithsum来获取每个楼层的总数:

mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]),
                   columns="ConcType",
                   values="ConcTotal") \
            .reset_index() \
            .groupby("Floor").sum()
>>> mapper
ConcType  RebarTotal  SteelTotal  C60   C80
Floors                                     
1F            1186.9        1260  4.5   1.4
2F            1186.9        1260  4.5   1.4
3F            1186.9        1260  4.5   1.4
4F            1186.9        1260  4.5   1.4
5F            1186.9        1260  4.5   1.4
6F            1186.9        1260  4.5   1.4
7F            1186.9        1260  4.5   1.4
8F            1186.9        1260  4.5   1.4
ACP           7954.8        2265  0.0  21.6
B1F           8273.6        2265  1.2  21.6
B2F           8273.6        2265  1.2  21.6
B3F           8273.6        2265  1.2  21.6
B4F           8273.6        2265  1.2  21.6
GF            9141.7        3525  4.5  23.0
LGF           8273.6        2265  1.2  21.6
LGF MZ        7954.8        2265  0.0  21.6

最后,我们使用pandas.mergedf2 和 mapper 来获取输出:

output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})

推荐阅读