python - How to split a dataframe when two consecutive cell values (strings) in a column are the same
Problem description
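I want to split a dataframe wherever two or more consecutive cells in a column hold the same (string) value. Before: the initial dataframe. After/expected: the dataframe after splitting.
Any help is appreciated. Thanks in advance.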
Data:
{'AuthorName': {0: 'Aeschylus', 1: 'Aeschylus', 2: 'Aeschylus', 3: 'Aeschylus', 4: 'Aeschylus', 5: 'Aeschylus', 6: 'Aeschylus', 7: 'Aeschylus', 8: 'Aeschylus', 9: 'Aeschylus', 10: 'Aeschylus', 11: 'Aeschylus', 12: 'Aeschylus', 13: 'Aeschylus', 14: 'Aeschylus', 15: 'Aeschylus', 16: 'Aeschylus', 17: 'Aeschylus', 18: 'Aeschylus', 19: 'Aeschylus', 20: 'Aeschylus', 21: 'Aeschylus', 22: 'Aeschylus', 23: 'Aeschylus', 24: 'Aeschylus', 25: 'Aeschylus', 26: 'Aeschylus', 27: 'Aeschylus', 28: 'Aeschylus', 29: 'Aeschylus', 30: 'Aeschylus', 31: 'Aeschylus', 32: 'Aeschylus', 33: 'Aeschylus', 34: 'Aeschylus', 35: 'Aeschylus', 36: 'Aeschylus', 37: 'Aeschylus', 38: 'Aeschylus', 39: 'Aeschylus'},
 'PlayName': {0: 'Agamemnon', 1: 'Agamemnon', 2: 'Agamemnon', 3: 'Agamemnon', 4: 'Agamemnon', 5: 'Agamemnon', 6: 'Agamemnon', 7: 'Agamemnon', 8: 'Agamemnon', 9: 'Agamemnon', 10: 'Agamemnon', 11: 'Agamemnon', 12: 'Agamemnon', 13: 'Agamemnon', 14: 'Agamemnon', 15: 'Agamemnon', 16: 'Agamemnon', 17: 'Agamemnon', 18: 'Agamemnon', 19: 'Agamemnon', 20: 'Agamemnon', 21: 'Agamemnon', 22: 'Agamemnon', 23: 'Agamemnon', 24: 'Agamemnon', 25: 'Agamemnon', 26: 'Agamemnon', 27: 'Agamemnon', 28: 'Agamemnon', 29: 'Agamemnon', 30: 'Agamemnon', 31: 'Agamemnon', 32: 'Agamemnon', 33: 'Agamemnon', 34: 'Agamemnon', 35: 'Agamemnon', 36: 'Agamemnon', 37: 'Agamemnon', 38: 'Agamemnon', 39: 'Agamemnon'},
 'ParagraphNumber': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15, 15: 16, 16: 17, 17: 18, 18: 19, 19: 20, 20: 21, 21: 22, 22: 23, 23: 24, 24: 25, 25: 26, 26: 27, 27: 28, 28: 29, 29: 30, 30: 31, 31: 32, 32: 33, 33: 34, 34: 35, 35: 36, 36: 37, 37: 38, 38: 39, 39: 40},
 'CharacterName': {0: 'Watchman', 1: 'Chorus', 2: 'Chorus', 3: 'Chorus', 4: 'Chorus', 5: 'Clytaemestra', 6: 'Chorus', 7: 'Clytaemestra', 8: 'Chorus', 9: 'Chorus', 10: 'Chorus', 11: 'Chorus', 12: 'OneElder', 13: 'Chorus_Leader', 14: 'AnotherElder', 15: 'Herald', 16: 'Chorus', 17: 'Herald', 18: 'Chorus', 19: 'Herald', 20: 'Chorus', 21: 'Herald', 22: 'Chorus', 23: 'Herald', 24: 'Chorus', 25: 'Chorus', 26: 'Chorus', 27: 'Chorus', 28: 'Chorus', 29: 'Chorus', 30: 'Chorus', 31: 'Chorus', 32: 'Chorus', 33: 'Agamemnon', 34: 'Clytaemestra', 35: 'Agamemnon', 36: 'Clytaemestra', 37: 'Agamemnon', 38: 'Clytaemestra', 39: 'Agamemnon'}}
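For anyone who wants to reproduce this: the dict above looks like the result of `df.to_dict()` (that is an assumption on my part), and a dict of that shape can be passed straight to `pd.DataFrame`. A minimal sketch using only the first five rows, with the rest cut for brevity:

```python
import pandas as pd

# First five rows only, copied from the dict above; the full dict loads the same way.
data = {
    "AuthorName": {0: "Aeschylus", 1: "Aeschylus", 2: "Aeschylus", 3: "Aeschylus", 4: "Aeschylus"},
    "PlayName": {0: "Agamemnon", 1: "Agamemnon", 2: "Agamemnon", 3: "Agamemnon", 4: "Agamemnon"},
    "ParagraphNumber": {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    "CharacterName": {0: "Watchman", 1: "Chorus", 2: "Chorus", 3: "Chorus", 4: "Chorus"},
}

# Outer keys become the columns, inner keys become the row index.
df = pd.DataFrame(data)
print(df)
```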
Solution
Try:
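# Length of the run of identical consecutive CharacterName values each row belongs to
x = df.groupby((df.CharacterName != df.CharacterName.shift()).cumsum())[
    "CharacterName"
].transform("count")
# Start a new segment wherever that run length changes
df["segment"] = "Segment" + (
    df.groupby((x != x.shift()).cumsum()).ngroup() + 1
).astype(str)
print(df)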
Prints:
    AuthorName   PlayName  ParagraphNumber  CharacterName   segment
 0   Aeschylus  Agamemnon                1       Watchman  Segment1
 1   Aeschylus  Agamemnon                2         Chorus  Segment2
 2   Aeschylus  Agamemnon                3         Chorus  Segment2
 3   Aeschylus  Agamemnon                4         Chorus  Segment2
 4   Aeschylus  Agamemnon                5         Chorus  Segment2
 5   Aeschylus  Agamemnon                6   Clytaemestra  Segment3
 6   Aeschylus  Agamemnon                7         Chorus  Segment3
 7   Aeschylus  Agamemnon                8   Clytaemestra  Segment3
 8   Aeschylus  Agamemnon                9         Chorus  Segment4
 9   Aeschylus  Agamemnon               10         Chorus  Segment4
10   Aeschylus  Agamemnon               11         Chorus  Segment4
11   Aeschylus  Agamemnon               12         Chorus  Segment4
12   Aeschylus  Agamemnon               13       OneElder  Segment5
13   Aeschylus  Agamemnon               14  Chorus_Leader  Segment5
14   Aeschylus  Agamemnon               15   AnotherElder  Segment5
15   Aeschylus  Agamemnon               16         Herald  Segment5
16   Aeschylus  Agamemnon               17         Chorus  Segment5
17   Aeschylus  Agamemnon               18         Herald  Segment5
18   Aeschylus  Agamemnon               19         Chorus  Segment5
19   Aeschylus  Agamemnon               20         Herald  Segment5
20   Aeschylus  Agamemnon               21         Chorus  Segment5
21   Aeschylus  Agamemnon               22         Herald  Segment5
22   Aeschylus  Agamemnon               23         Chorus  Segment5
23   Aeschylus  Agamemnon               24         Herald  Segment5
24   Aeschylus  Agamemnon               25         Chorus  Segment6
25   Aeschylus  Agamemnon               26         Chorus  Segment6
26   Aeschylus  Agamemnon               27         Chorus  Segment6
27   Aeschylus  Agamemnon               28         Chorus  Segment6
28   Aeschylus  Agamemnon               29         Chorus  Segment6
29   Aeschylus  Agamemnon               30         Chorus  Segment6
30   Aeschylus  Agamemnon               31         Chorus  Segment6
31   Aeschylus  Agamemnon               32         Chorus  Segment6
32   Aeschylus  Agamemnon               33         Chorus  Segment6
33   Aeschylus  Agamemnon               34      Agamemnon  Segment7
34   Aeschylus  Agamemnon               35   Clytaemestra  Segment7
35   Aeschylus  Agamemnon               36      Agamemnon  Segment7
36   Aeschylus  Agamemnon               37   Clytaemestra  Segment7
37   Aeschylus  Agamemnon               38      Agamemnon  Segment7
38   Aeschylus  Agamemnon               39   Clytaemestra  Segment7
39   Aeschylus  Agamemnon               40      Agamemnon  Segment7
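To make the one-liner easier to follow, here is the same idea broken into named steps on a toy column. The toy values and the names `run_id`, `run_len` and `segment_id` are mine, not from the question. Since the cumulative sum already numbers the segments from 1, this sketch uses it directly instead of `ngroup() + 1`.

```python
import pandas as pd

# Toy column (not the play data): runs of length 1, 3, 1, 1, 2.
df = pd.DataFrame({"CharacterName": ["A", "B", "B", "B", "C", "A", "B", "B"]})

# Step 1: a new run starts wherever the value differs from the previous row.
run_id = (df["CharacterName"] != df["CharacterName"].shift()).cumsum()

# Step 2: length of the run each row belongs to.
run_len = df.groupby(run_id)["CharacterName"].transform("count")

# Step 3: a new segment starts wherever the run length changes.
segment_id = (run_len != run_len.shift()).cumsum()

df["segment"] = "Segment" + segment_id.astype(str)
print(df.assign(run_id=run_id, run_len=run_len))
```

One caveat that follows from step 3: two neighbouring runs of the same length (for example B, B, C, C) have no change in run length between them, so they end up in the same segment.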
Edit: to segment only on the name "Chorus":
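# Length of the run of identical consecutive CharacterName values each row belongs to
x = df.groupby((df.CharacterName != df.CharacterName.shift()).cumsum())[
    "CharacterName"
].transform("count")
# Keep the run length only for repeated "Chorus" rows; every other row becomes 0
x *= df.CharacterName.eq("Chorus") & (x > 1)
df["segment"] = "Segment" + (
    df.groupby((x != x.shift()).cumsum()).ngroup() + 1
).astype(str)
print(df)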
Prints:
    AuthorName   PlayName  ParagraphNumber  CharacterName   segment
 0   Aeschylus  Agamemnon                1       Watchman  Segment1
 1   Aeschylus  Agamemnon                2         Chorus  Segment2
 2   Aeschylus  Agamemnon                3         Chorus  Segment2
 3   Aeschylus  Agamemnon                4         Chorus  Segment2
 4   Aeschylus  Agamemnon                5         Chorus  Segment2
 5   Aeschylus  Agamemnon                6   Clytaemestra  Segment3
 6   Aeschylus  Agamemnon                7         Chorus  Segment3
 7   Aeschylus  Agamemnon                8   Clytaemestra  Segment3
 8   Aeschylus  Agamemnon                9         Chorus  Segment4
 9   Aeschylus  Agamemnon               10         Chorus  Segment4
10   Aeschylus  Agamemnon               11         Chorus  Segment4
11   Aeschylus  Agamemnon               12         Chorus  Segment4
12   Aeschylus  Agamemnon               13       OneElder  Segment5
13   Aeschylus  Agamemnon               14  Chorus_Leader  Segment5
14   Aeschylus  Agamemnon               15   AnotherElder  Segment5
15   Aeschylus  Agamemnon               16         Herald  Segment5
16   Aeschylus  Agamemnon               17         Chorus  Segment5
17   Aeschylus  Agamemnon               18         Herald  Segment5
18   Aeschylus  Agamemnon               19         Chorus  Segment5
19   Aeschylus  Agamemnon               20         Herald  Segment5
20   Aeschylus  Agamemnon               21         Chorus  Segment5
21   Aeschylus  Agamemnon               22         Herald  Segment5
22   Aeschylus  Agamemnon               23         Chorus  Segment5
23   Aeschylus  Agamemnon               24         Herald  Segment5
24   Aeschylus  Agamemnon               25         Chorus  Segment6
25   Aeschylus  Agamemnon               26         Chorus  Segment6
26   Aeschylus  Agamemnon               27         Chorus  Segment6
27   Aeschylus  Agamemnon               28         Chorus  Segment6
28   Aeschylus  Agamemnon               29         Chorus  Segment6
29   Aeschylus  Agamemnon               30         Chorus  Segment6
30   Aeschylus  Agamemnon               31         Chorus  Segment6
31   Aeschylus  Agamemnon               32         Chorus  Segment6
32   Aeschylus  Agamemnon               33         Chorus  Segment6
33   Aeschylus  Agamemnon               34      Agamemnon  Segment7
34   Aeschylus  Agamemnon               35      Agamemnon  Segment7
35   Aeschylus  Agamemnon               36      Agamemnon  Segment7
36   Aeschylus  Agamemnon               37   Clytaemestra  Segment7
37   Aeschylus  Agamemnon               38      Agamemnon  Segment7
38   Aeschylus  Agamemnon               39   Clytaemestra  Segment7
39   Aeschylus  Agamemnon               40      Agamemnon  Segment7
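A small sketch of why the "Chorus"-only variant behaves differently, and of how the labelled frame could then be split into separate pieces. The toy column, the name "X" and the variables `key` and `parts` are made up for illustration. I also use `Series.where` in place of the `x *= mask` multiplication; as far as I can tell the two give the same result here.

```python
import pandas as pd

# Toy column (not the play data); "X" repeats three times but is not "Chorus".
names = ["Chorus", "Chorus", "X", "X", "X", "Chorus", "Herald", "Chorus", "Chorus", "Chorus"]
df = pd.DataFrame({"CharacterName": names})

# Run id and run length, as in the answer above.
run_id = (df["CharacterName"] != df["CharacterName"].shift()).cumsum()
run_len = df.groupby(run_id)["CharacterName"].transform("count")

# Keep the run length only for repeated "Chorus" rows; everything else becomes 0,
# so a repeated "X" no longer opens a segment of its own.
key = run_len.where(df["CharacterName"].eq("Chorus") & (run_len > 1), 0)

df["segment"] = "Segment" + (key != key.shift()).cumsum().astype(str)

# One sub-dataframe per segment, in order of appearance.
parts = {name: part for name, part in df.groupby("segment", sort=False)}
for name, part in parts.items():
    print(name, part["CharacterName"].tolist())
```

Each value in `parts` is an ordinary DataFrame, so `parts["Segment2"]` can be saved or processed on its own.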