pandas - Merge elements based on index columns in Pandas
Question
I have the following DataFrame:
index | element | relation_index
1 dog 0
2 cat 0
3 crow 1
4 snake 3
5 pig 1
6 porcupine 0
7 weasel 2
8 bear 3
I want to obtain:
index | element | relation_index
1 dog, crow, pig, snake, bear 0
2 cat, weasel 0
3 dog, crow, pig, snake, bear 1
4 dog, crow, pig, snake, bear 3
5 dog, crow, pig, snake, bear 1
6 porcupine 0
7 cat, weasel 2
8 dog, crow, pig, snake, bear 3
So the rules are:
- Merge all elements that share a common index or relation_index
- Ignore rows where relation_index is 0
How can this be done efficiently for a large DataFrame?
Edit: I forgot to mention one thing: the element data type should be a plain string, for example "dog, crow, pig, snake, bear".
Answer
I would solve this with iterrows and a for loop.
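import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7, 8],
    'element': ['dog', 'cat', 'crow', 'snake', 'pig', 'porcupine', 'weasel', 'bear'],
    'relation_index': [0, 0, 1, 3, 1, 0, 2, 3],
})
# Rename 'index' to 'id' so attribute access (row.id, parent.id) does not clash with the DataFrame's own .index
df.rename(columns={'index': 'id'}, inplace=True)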
# Create a parent group
parent = df[df.relation_index == 0].copy()
search_df = df[df.relation_index != 0].copy()
group_index = [[i] for i in parent.id.tolist()]
group_name = [[i] for i in parent.element.tolist()]
print(group_index)
print(group_name)
[[1], [2], [6]]
[['dog'], ['cat'], ['porcupine']]
# Assign group to each id
for _, row in search_df.iterrows():
    new_group = True
    for i in range(len(group_index)):
        if row.relation_index in group_index[i]:
            group_index[i].append(row.id)
            group_name[i].append(row.element)
            new_group = False
            break
    if new_group:
        group_index.append([row.id])
        group_name.append([row.element])
print(group_index)
print(group_name)
[[1, 3, 4, 5, 8], [2, 7], [6]]
[['dog', 'crow', 'snake', 'pig', 'bear'], ['cat', 'weasel'], ['porcupine']]
# Assign result back to main df
result = []
for _, row in df.iterrows():
    has_group = False
    for i in range(len(group_index)):
        if row.id in group_index[i]:
            result.append(", ".join(group_name[i]))
            has_group = True
    if not has_group:
        result.append(None)
df['result'] = result
df
id element relation_index result
0 1 dog 0 dog, crow, snake, pig, bear
1 2 cat 0 cat, weasel
2 3 crow 1 dog, crow, snake, pig, bear
3 4 snake 3 dog, crow, snake, pig, bear
4 5 pig 1 dog, crow, snake, pig, bear
5 6 porcupine 0 porcupine
6 7 weasel 2 cat, weasel
7 8 bear 3 dog, crow, snake, pig, bear
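The loop above scans every existing group for every row, and it only works when a row's relation_index points at an id that has already been placed in a group. For larger frames, one possible alternative is a union-find (disjoint-set) pass. This is a different technique from the answer's, shown here as a minimal sketch. The helper name merge_related is hypothetical, and the function takes plain sequences so it can be run without pandas.

```python
def merge_related(ids, elements, relation_index):
    """Return, for each row, the comma-joined elements of its linked group.

    Two rows are linked when one row's relation_index equals the other
    row's id. A relation_index of 0 means "no link".
    """
    ids = list(ids)
    elements = list(elements)
    relation_index = list(relation_index)

    # Every id starts out as the root of its own group.
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union each row with the row it points at (skipping 0 and unknown ids).
    for i, rel in zip(ids, relation_index):
        if rel != 0 and rel in parent:
            root_i, root_rel = find(i), find(rel)
            if root_i != root_rel:
                parent[root_i] = root_rel

    # Collect element names per root, in original row order.
    groups = {}
    for i, name in zip(ids, elements):
        groups.setdefault(find(i), []).append(name)

    return [", ".join(groups[find(i)]) for i in ids]


# Sample data from the question
ids = [1, 2, 3, 4, 5, 6, 7, 8]
elements = ["dog", "cat", "crow", "snake", "pig", "porcupine", "weasel", "bear"]
relation_index = [0, 0, 1, 3, 1, 0, 2, 3]

result = merge_related(ids, elements, relation_index)
for i, merged in zip(ids, result):
    print(i, merged)
```

With the DataFrame from the answer, the same helper could be applied as df['result'] = merge_related(df['id'].tolist(), df['element'].tolist(), df['relation_index'].tolist()). Because the unions are resolved through find, the result does not depend on the order in which rows appear.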