pandas - every x rows depend on the value of another row

Problem description

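I have a dataset in which parents and children appear as values in separate rows. Parent and child IDs are formatted slightly differently, so I should be able to tell them apart with a regex.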

So the structure looks like this:

Parent ID | Other data
Child ID | Other data
Child ID | Other data
Child ID | Other data
Parent ID | Other data
Child ID | Other data
Parent ID | Other data
Child ID | Other data
Child ID | Other data
Child ID | Other data

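There is no fixed number of children, but one thing is always true: a parent comes first, followed by its children, then the next parent and its children, and so on.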
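Because a parent always precedes its children, a running count of parent rows gives every row the number of the family it belongs to. A minimal sketch, assuming the IDs live in a column named `ID` and that the placeholder regex `^Parent` identifies parent rows (the sample values are made up):

```python
import pandas as pd

# Sample frame mirroring the layout above; the "ID" column name and the
# "^Parent" pattern are assumptions for illustration only
df = pd.DataFrame({
    "ID": ["Parent A", "Child a1", "Child a2", "Parent B", "Child b1",
           "Parent C", "Child c1", "Child c2"],
})

# True on parent rows, False on child rows
is_parent = df["ID"].str.contains(r"^Parent", regex=True, na=False)

# Each parent row increments the counter, so a parent and the children
# that follow it share the same family number
df["family"] = is_parent.cumsum()
print(df)
```

Any child rows appearing before the first parent would get family number 0.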

I'm not sure how to detect this. Ideally I would iterate over the rows and tag every child with its parent's ID in a separate (new) column.
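The row-by-row idea above can be written as a plain loop that remembers the most recently seen parent. This is an illustrative sketch, assuming an `ID` column and the placeholder pattern `^Parent`; `label_children` is a hypothetical helper name. It works on thousands of rows, though a vectorized pandas approach is usually faster on large frames.

```python
import re

import pandas as pd

# Placeholder pattern (assumption): replace with your real parent-ID regex
PARENT_RE = re.compile(r"^Parent")


def label_children(df: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    """Return a copy of df with a ParentID column filled in for child rows."""
    current_parent = None
    parent_ids = []
    for row_id in df[id_col]:
        if PARENT_RE.search(row_id):
            # A new parent starts a new group; the parent itself gets no label
            current_parent = row_id
            parent_ids.append(None)
        else:
            # A child inherits the most recently seen parent
            parent_ids.append(current_parent)
    out = df.copy()
    out["ParentID"] = parent_ids
    return out


df = pd.DataFrame({
    "ID": ["Parent A", "Child a1", "Child a2", "Parent B", "Child b1"],
    "data": ["Other data"] * 5,
})
print(label_children(df))
```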

It's not a great structure, but that's how it comes from the data source.

I want output like this:

Parent ID | Other data |
Child ID | Other data | Parent ID
Child ID | Other data | Parent ID
Child ID | Other data | Parent ID
Parent ID | Other data |
Child ID | Other data | Parent ID
Parent ID | Other data |
Child ID | Other data | Parent ID
Child ID | Other data | Parent ID
Child ID | Other data | Parent ID

The whole file, thousands of rows, follows this format: the parent first, then all of its children, then the next parent.

Tags: pandas

You can certainly do this with ffill and some masking:

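import pandas as pd

# df is assumed to have an 'ID' column
# identify all parent rows (replace the pattern with your own regex)
patt = 'Parent'
is_parent = df['ID'].str.contains(patt, regex=True, na=False)

# keep IDs only on parent rows, forward-fill them down to the children,
# then blank the column again on the parent rows themselves
df['ParentID'] = df['ID'].where(is_parent).ffill().mask(is_parent)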
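To see that each child really picks up its own parent, here is the same `where`/`ffill`/`mask` chain on a small self-contained frame with distinct IDs. The sample values are made up, and `"Parent"` stands in for your real regex.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["Parent 1", "Child 1a", "Child 1b", "Parent 2", "Child 2a",
           "Parent 3", "Child 3a"],
    "data": ["Other data"] * 7,
})

# True on parent rows
is_parent = df["ID"].str.contains("Parent", regex=True, na=False)

# where(): keep the ID only on parent rows (children become NaN)
# ffill(): copy each parent's ID down over the children that follow it
# mask():  set the column back to NaN on the parent rows themselves
df["ParentID"] = df["ID"].where(is_parent).ffill().mask(is_parent)
print(df)
```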

Output:

          ID        data   ParentID
0  Parent ID  Other data        NaN
1   Child ID  Other data  Parent ID
2   Child ID  Other data  Parent ID
3   Child ID  Other data  Parent ID
4  Parent ID  Other data        NaN
5   Child ID  Other data  Parent ID
6  Parent ID  Other data        NaN
7   Child ID  Other data  Parent ID
8   Child ID  Other data  Parent ID
9   Child ID  Other data  Parent ID
