python-3.x - Compare list columns row by row and filter in pandas
Problem description
import pandas as pd

sales = [(3588, [1,2,3,4,5,6], [1,38,9,2,18,5]),
(3588, [2,5,7], [1,2,4,8,14]),
(3588, [3,10,13], [1,3,4,6,12]),
(3588, [4,5,61], [1,2,3,4,11,5]),
(3590, [3,5,6,1,21], [3,10,13]),
(3590, [8,1,2,4,6,9], [2,5,7]),
(3591, [1,2,4,5,13], [1,2,3,4,5,6])
]
labels = ['goods_id', 'properties_id_x', 'properties_id_y']
df = pd.DataFrame.from_records(sales, columns=labels)
df
Out[4]:
goods_id properties_id_x properties_id_y
0 3588 [1, 2, 3, 4, 5, 6] [1, 38, 9, 2, 18, 5]
1 3588 [2, 5, 7] [1, 2, 4, 8, 14]
2 3588 [3, 10, 13] [1, 3, 4, 6, 12]
3 3588 [4, 5, 61] [1, 2, 3, 4, 11, 5]
4 3590 [3, 5, 6, 1, 21] [3, 10, 13]
5 3590 [8, 1, 2, 4, 6, 9] [2, 5, 7]
6 3591 [1, 2, 4, 5, 13] [1, 2, 3, 4, 5, 6]
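I have a df of goods and their properties. I need to compare properties_id_x with properties_id_y row by row and return only those rows where both lists contain both "1" and "5". I can't figure out how to do this.
Expected output: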
0 3588 [1, 2, 3, 4, 5, 6] [1, 38, 9, 2, 18, 5]
6 3591 [1, 2, 4, 5, 13] [1, 2, 3, 4, 5, 6]
Solution
Option 1:
In [176]: mask = df.apply(lambda r: {1,5} <= (set(r['properties_id_x']) & set(r['properties_id_y'])), axis=1)
In [177]: mask
Out[177]:
0 True
1 False
2 False
3 False
4 False
5 False
6 True
dtype: bool
In [178]: df[mask]
Out[178]:
goods_id properties_id_x properties_id_y
0 3588 [1, 2, 3, 4, 5, 6] [1, 38, 9, 2, 18, 5]
6 3591 [1, 2, 4, 5, 13] [1, 2, 3, 4, 5, 6]
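Option 1 works because `<=` between two Python sets is a subset test: `{1,5} <= s` is True only when both 1 and 5 are in `s`, and intersecting the two columns first means a value has to appear in both lists to count. A minimal sketch of the same check on plain lists, without pandas; the helper name `has_all` is hypothetical and not part of the original answer:

```python
def has_all(required, *lists):
    """Return True if every value in `required` appears in each of `lists`."""
    required = set(required)
    # `required <= set(lst)` is the subset test used in the mask above
    return all(required <= set(lst) for lst in lists)

# Rows 0 and 1 of the example data
row0 = has_all({1, 5}, [1, 2, 3, 4, 5, 6], [1, 38, 9, 2, 18, 5])
row1 = has_all({1, 5}, [2, 5, 7], [1, 2, 4, 8, 14])
print(row0, row1)
```

Row 0 passes because both lists contain 1 and 5; row 1 fails because `[2, 5, 7]` has no 1.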
Option 2:
In [183]: mask = df.properties_id_x.map(lambda x: {1,5} <= set(x)) & df.properties_id_y.map(lambda x: {1,5} <= set(x))
In [184]: df[mask]
Out[184]:
goods_id properties_id_x properties_id_y
0 3588 [1, 2, 3, 4, 5, 6] [1, 38, 9, 2, 18, 5]
6 3591 [1, 2, 4, 5, 13] [1, 2, 3, 4, 5, 6]
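As a possible variation on the two options above (not from the original answer), the mask can also be built with a plain list comprehension over the two columns, which avoids `apply(axis=1)`. This is only a sketch; the variable name `required` is an assumption introduced here for illustration:

```python
import pandas as pd

sales = [(3588, [1, 2, 3, 4, 5, 6], [1, 38, 9, 2, 18, 5]),
         (3588, [2, 5, 7], [1, 2, 4, 8, 14]),
         (3588, [3, 10, 13], [1, 3, 4, 6, 12]),
         (3588, [4, 5, 61], [1, 2, 3, 4, 11, 5]),
         (3590, [3, 5, 6, 1, 21], [3, 10, 13]),
         (3590, [8, 1, 2, 4, 6, 9], [2, 5, 7]),
         (3591, [1, 2, 4, 5, 13], [1, 2, 3, 4, 5, 6])]
labels = ['goods_id', 'properties_id_x', 'properties_id_y']
df = pd.DataFrame.from_records(sales, columns=labels)

required = {1, 5}  # values that must be present in both lists (assumed name)

# One boolean per row: True only if both lists contain every required value
mask = [
    required <= set(x) and required <= set(y)
    for x, y in zip(df['properties_id_x'], df['properties_id_y'])
]

result = df[mask]
print(result.index.tolist())  # [0, 6]
```

Changing `required` is enough to filter on a different set of property ids.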