首页 > 解决方案 > 用 pandas 的唯一引用填充数据组

问题描述

我的发票数据如下

+----------------+-----------+-------------+-----+-------+
|       ID       |   Date    | Description | QTY | Price |
+----------------+-----------+-------------+-----+-------+
| 1XpP1          | 08-Feb-19 | A           |   1 |     8 |
| Total [INV001] |           |             |   8 |     8 |
| 1XpQ1          | 08-Feb-19 | A           |   1 |    10 |
| 1XpQ1          | 08-Feb-19 | B           |   1 |    10 |
| Total [INV002] |           |             |   2 |    20 |
| 1XpP1          | 08-Feb-19 | A           |   1 |    12 |
| 1XpP1          | 08-Feb-19 | B           |   1 |    12 |
| 1XpP1          | 08-Feb-19 | C           |   1 |    12 |
| 1XpP1          | 08-Feb-19 | D           |   1 |    12 |
| Total [INV003] |           |             |   4 |    48 |
+----------------+-----------+-------------+-----+-------+

注意Total每张发票下的行。其中包含invoice No. 我想完全删除这一行并Total分别在差异列中添加引用。我想要的输出如下。

+-------+-----------+-------------+-----+-------+----------------+
|  ID   |   Date    | Description | QTY | Price |  ID Adjusted   |
+-------+-----------+-------------+-----+-------+----------------+
| 1XpP1 | 08-Feb-19 | A           |   1 |     8 | Total [INV001] |
| 1XpQ1 | 08-Feb-19 | A           |   1 |    10 | Total [INV002] |
| 1XpQ1 | 08-Feb-19 | B           |   1 |    10 | Total [INV002] |
| 1XpP1 | 08-Feb-19 | A           |   1 |    12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | B           |   1 |    12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | C           |   1 |    12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | D           |   1 |    12 | Total [INV003] |
+-------+-----------+-------------+-----+-------+----------------+

请给我一个开始解决这个问题。我不知道如何对这些数据进行分组。由于发票中的行数不同。

标签: pandaspandas-groupby

解决方案


想法是使用Series.wherewithSeries.str.endswith来处理不匹配行的缺失值并回填它们,然后通过反转掩码过滤 with boolean indexing,最后可以添加DataFrame.copy以避免SettingwithCopyWarning稍后处理数据:

m = df['ID'].str.endswith(']')
df['ID Adjusted'] = df['ID'].where(m).bfill()
df = df[~m].copy()
print (df)
      ID       Date Description  QTY  Price     ID Adjusted
0  1XpP1  08-Feb-19           A    1      8  Total [INV001]
2  1XpQ1  08-Feb-19           A    1     10  Total [INV002]
3  1XpQ1  08-Feb-19           B    1     10  Total [INV002]
5  1XpR1  08-Feb-19           A    1     12  Total [INV003]
6  1XpR1  08-Feb-19           B    1     12  Total [INV003]
7  1XpR1  08-Feb-19           C    1     12  Total [INV003]
8  1XpR1  08-Feb-19           D    1     12  Total [INV003]

推荐阅读