How to get the list of ids in one pandas column that are not contained in another column

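Problem description

A pandas DataFrame has two columns, each holding a list of ids. For every row, I need the ids that appear in one column but not in the other.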

id  Column_1    Column_2
1   [1,2,7]     [1,2,5,7,9]
2   [4,8,2,7]   [4,8,2,7]
3   [5,7,2,9]   [9] 
4   [4,7,2,9]   [3]


I want a result like:
id  Column_1    Column_2    result
1   [1,2,7]     [1,2,5,7,9] [5,9]
2   [4,8,2,7]   [4,8,2,7]   []
3   [5,7,2,9]   [9]         []
4   [4,7,2,9]   [3]         [3]

Tags: python, pandas, list, dataframe, data-analysis

Solution


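Convert the values to sets and take the difference, keeping the items of Column_2 that are not in Column_1. Sets are unordered, so the order of the items in each resulting list is not guaranteed: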

# For each row, keep the items of Column_2 that do not appear in Column_1
df['Column_3'] = [list(set(y).difference(x)) for x, y in zip(df['Column_1'], df['Column_2'])]
print(df)
   id      Column_1         Column_2 Column_3
0   1     [1, 2, 7]  [1, 2, 5, 7, 9]   [9, 5]
1   2  [4, 8, 2, 7]     [4, 8, 2, 7]       []
2   3  [5, 7, 2, 9]              [9]       []
3   4  [4, 7, 2, 9]              [3]      [3]
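
The first row above comes back as [9, 5] rather than [5, 9] because the set difference does not keep the original order. If order matters, one possible variant is sketched below. The helper name `ordered_difference` and the plain-list variables `column_1` and `column_2` are illustrative assumptions, not part of the original answer; the lists mirror the DataFrame printed above.

```python
def ordered_difference(values, excluded):
    """Return the items of `values` that are not in `excluded`, in original order.

    Hypothetical helper for illustration; unlike a set difference, it also
    keeps duplicates that occur in `values`.
    """
    excluded_set = set(excluded)  # set lookups avoid rescanning the list per item
    return [v for v in values if v not in excluded_set]


# Same lists as in the printed DataFrame above, one entry per row.
column_1 = [[1, 2, 7], [4, 8, 2, 7], [5, 7, 2, 9], [4, 7, 2, 9]]
column_2 = [[1, 2, 5, 7, 9], [4, 8, 2, 7], [9], [3]]

# Items of Column_2 that are missing from Column_1, row by row.
result = [ordered_difference(y, x) for x, y in zip(column_1, column_2)]
print(result)  # [[5, 9], [], [], [3]]
```

With a DataFrame, the same comprehension should work on the columns directly, for example df['result'] = [ordered_difference(y, x) for x, y in zip(df['Column_1'], df['Column_2'])].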
