Filter rows where a column's value is in another column's list?

Problem description

I have a DataFrame with one column holding a single value and another column holding a list of values:

        period  node    key_players
0       0       ZF1013  [ZF1128, ZF176, ZF434, ZF469, ZF659]
1       0       ZF1014  [ZF1128, ZF176, ZF434, ZF469, ZF659]
2       0       ZF1015  [ZF1128, ZF176, ZF434, ZF469, ZF659]
3       0       ZF1020  [ZF1128, ZF176, ZF434, ZF469, ZF659]
4       0       ZF1025  [ZF1128, ZF176, ZF434, ZF469, ZF659]
... ... ... ...
1565    4       ZF898   [ZF1336, ZF1346, ZF3, ZF434, ZF481]
1566    4       ZF945   [ZF1336, ZF1346, ZF3, ZF434, ZF481]
1567    4       ZF948   [ZF1336, ZF1346, ZF3, ZF434, ZF481]
1568    4       ZF97    [ZF1336, ZF1346, ZF3, ZF434, ZF481]
1569    4       ZFM264  [ZF1336, ZF1346, ZF3, ZF434, ZF481]

I want to keep only the rows where "node" is in "key_players".

Tags: python, pandas, dataframe

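Solution

I'm using a version of the visible part of your df (for future questions, please follow "How to make good reproducible pandas examples").

I modified a couple of rows so that some nodes are contained in key_players.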

import pandas as pd
from ast import literal_eval
from io import StringIO

# Columns are separated by two or more whitespace characters;
# the list literals themselves contain at most single spaces.
df = pd.read_csv(StringIO(
"""
        period  node    key_players
0       0       ZF1013  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
1       0       ZF1014  ['ZF1014', 'ZF176', 'ZF434','ZF469','ZF659']
2       0       ZF1015  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
3       0       ZF1020  ['ZF1128', 'ZF176', 'ZF434','ZF469','ZF659']
4       0       ZF1025  ['ZF1128', 'ZF1025', 'ZF434','ZF469','ZF659']
1565    4       ZF898   ['ZF1336', 'ZF1346','ZF3', 'ZF434', 'ZF481']
1566    4       ZF945   ['ZF1336', 'ZF1346','ZF3', 'ZF434', 'ZF481']
1567    4       ZF948   ['ZF1336', 'ZF1346','ZF3', 'ZF434', 'ZF481']
1568    4       ZF97    ['ZF1336', 'ZF1346','ZF3', 'ZF434', 'ZF481']
1569    4       ZFM264  ['ZF1336', 'ZF1346','ZF3', 'ZF434', 'ZF481']
"""), sep=r'\s\s+', engine='python')
# Parse each string such as "['ZF1128', ...]" into a real Python list.
df['key_players'] = df['key_players'].apply(literal_eval)

Solution 1

We expand the key_players lists via explode and keep those rows where the exploded value matches node.

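# Copy key_players into a helper column kp and explode it: one row per list element.
df2 = df.assign(kp = df['key_players']).explode('kp')
# Keep the rows whose exploded element equals node, then drop the helper column.
df2[df2['kp'] == df2['node']].drop(columns = 'kp')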

This prints

      period  node    key_players
--  --------  ------  -----------------------------------------------
 1         0  ZF1014  ['ZF1014', 'ZF176', 'ZF434', 'ZF469', 'ZF659']
 4         0  ZF1025  ['ZF1128', 'ZF1025', 'ZF434', 'ZF469', 'ZF659']
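
As a self-contained illustration of the explode idea, here is a minimal sketch on a small hypothetical frame (the three rows below are made up for the example, not taken from the question's data). It explodes only the key_players column and then reduces the comparison back to one boolean per original row, so a node that happened to appear twice in its own list would still yield a single row. Treat it as one possible variant under these assumptions, not as the answer's exact method.

```python
import pandas as pd

# Hypothetical three-row frame, built directly instead of parsed from text.
df = pd.DataFrame({
    "period": [0, 0, 4],
    "node": ["ZF1013", "ZF1014", "ZF898"],
    "key_players": [
        ["ZF1128", "ZF176"],
        ["ZF1014", "ZF176"],
        ["ZF1336", "ZF481"],
    ],
})

# One entry per (original row, key player); the original index label is repeated.
exploded = df["key_players"].explode()

# Look up the node belonging to each exploded entry and compare element-wise.
hit = exploded.eq(df["node"].reindex(exploded.index))

# Collapse back to one boolean per original row: True if any element matched.
mask = hit.groupby(level=0).any()

result = df[mask]
print(result)  # only the row with index 1 (node ZF1014) remains
```

Building the mask on the original index keeps the lists in key_players intact and avoids having to drop a helper column afterwards.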

Solution 2

If you don't mind iterating over the rows (generally discouraged in pandas), you can do this:

df[df.apply(lambda row: row['node'] in row['key_players'], axis=1)]

This gives the same output.
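
The row-wise membership test can also be written as a plain list comprehension over the two columns, which avoids building a Series for every row as apply with axis=1 does. The sketch below uses a small hypothetical frame invented for the example; it is an alternative spelling of the same check, not something taken from the original answer.

```python
import pandas as pd

# Hypothetical three-row frame, made up for this example.
df = pd.DataFrame({
    "period": [0, 0, 4],
    "node": ["ZF1013", "ZF1014", "ZF898"],
    "key_players": [
        ["ZF1128", "ZF176"],
        ["ZF1014", "ZF176"],
        ["ZF1336", "ZF481"],
    ],
})

# Pair each node with its own list and test membership in pure Python.
mask = [node in players for node, players in zip(df["node"], df["key_players"])]

# A list of booleans with one entry per row can be used directly as a filter.
result = df[mask]
print(result)  # only the row with index 1 (node ZF1014) remains
```

This is still a Python-level loop, so it will not be vectorized, but for object columns holding lists there is usually no vectorized membership test to fall back on anyway.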
