pandas - 用 pandas 的唯一引用填充数据组
问题描述
我的发票数据如下
+----------------+-----------+-------------+-----+-------+
| ID | Date | Description | QTY | Price |
+----------------+-----------+-------------+-----+-------+
| 1XpP1 | 08-Feb-19 | A | 1 | 8 |
| Total [INV001] | | | 8 | 8 |
| 1XpQ1 | 08-Feb-19 | A | 1 | 10 |
| 1XpQ1 | 08-Feb-19 | B | 1 | 10 |
| Total [INV002] | | | 2 | 20 |
| 1XpP1 | 08-Feb-19 | A | 1 | 12 |
| 1XpP1 | 08-Feb-19 | B | 1 | 12 |
| 1XpP1 | 08-Feb-19 | C | 1 | 12 |
| 1XpP1 | 08-Feb-19 | D | 1 | 12 |
| Total [INV003] | | | 4 | 48 |
+----------------+-----------+-------------+-----+-------+
注意Total
每张发票下的行。其中包含invoice No
. 我想完全删除这一行并Total
分别在差异列中添加引用。我想要的输出如下。
+-------+-----------+-------------+-----+-------+----------------+
| ID | Date | Description | QTY | Price | ID Adjusted |
+-------+-----------+-------------+-----+-------+----------------+
| 1XpP1 | 08-Feb-19 | A | 1 | 8 | Total [INV001] |
| 1XpQ1 | 08-Feb-19 | A | 1 | 10 | Total [INV002] |
| 1XpQ1 | 08-Feb-19 | B | 1 | 10 | Total [INV002] |
| 1XpP1 | 08-Feb-19 | A | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | B | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | C | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | D | 1 | 12 | Total [INV003] |
+-------+-----------+-------------+-----+-------+----------------+
请给我一个开始解决这个问题。我不知道如何对这些数据进行分组。由于发票中的行数不同。
解决方案
想法是使用Series.where
withSeries.str.endswith
来处理不匹配行的缺失值并回填它们,然后通过反转掩码过滤 with boolean indexing
,最后可以添加DataFrame.copy
以避免SettingwithCopyWarning
稍后处理数据:
m = df['ID'].str.endswith(']')
df['ID Adjusted'] = df['ID'].where(m).bfill()
df = df[~m].copy()
print (df)
ID Date Description QTY Price ID Adjusted
0 1XpP1 08-Feb-19 A 1 8 Total [INV001]
2 1XpQ1 08-Feb-19 A 1 10 Total [INV002]
3 1XpQ1 08-Feb-19 B 1 10 Total [INV002]
5 1XpR1 08-Feb-19 A 1 12 Total [INV003]
6 1XpR1 08-Feb-19 B 1 12 Total [INV003]
7 1XpR1 08-Feb-19 C 1 12 Total [INV003]
8 1XpR1 08-Feb-19 D 1 12 Total [INV003]