Python Pandas: return all values in a DataFrame that match two columns in another DataFrame

Problem description

I have a DataFrame that looks like this (~10k rows). I'll call it Maindf.

+---+---------+----------+-------+--------------+
|   | Product | Discount | Store | OtherColumns |
+---+---------+----------+-------+--------------+
| 0 | A       | 0.5      | Red   |              |
| 1 | A       | 1        | Red   |              |
| 2 | C       | 3        | Green |              |
| 3 | Z       | 1.5      | Blue  |              |
| 4 | I       | 0        | Red   |              |
| 5 | D       | 0        | Green |              |
+---+---------+----------+-------+--------------+

Through code, I generate this other DataFrame (it varies with the input data). I'll call it Filterdf.

+---+---------+----------+---------+
|   | Product | Discount | Counter |
+---+---------+----------+---------+
| 0 | A       | 0.5      |       1 |
| 1 | B       | 2.0      |       2 |
| 2 | C       | 1        |       9 |
| 3 | D       | 0        |       7 |
+---+---------+----------+---------+

I am trying to return all rows from Maindf whose Product and Discount values together match a row of Filterdf. So the expected output is this:

+---+---------+----------+-------+--------------+
|   | Product | Discount | Store | OtherColumns |
+---+---------+----------+-------+--------------+
| 0 | A       | 0.5      | Red   |              |
| 1 | D       | 0        | Green |              |
+---+---------+----------+-------+--------------+

This is my line of code, but it does not work correctly.

NewMaindf = Maindf[(Maindf['Product'].isin(Filterdf['Product']) & Maindf['Discount'].isin(Filterdf['Discount']))]

print(NewMaindf)

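The output is shown below. I am only interested in the rows of Maindf that match both columns of Filterdf. In this case, A with a discount of 1 gets through because A is in Filterdf['Product'] and 1 is in Filterdf['Discount'], but in Filterdf that discount belongs to Product C, not A.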

+---+---------+----------+-------+--------------+
|   | Product | Discount | Store | OtherColumns |
+---+---------+----------+-------+--------------+
| 0 | A       | 0.5      | Red   |              |
| 1 | A       | 1        | Red   |              |
| 2 | D       | 0        | Green |              |
+---+---------+----------+-------+--------------+
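
To see why the A / 1 row slips through, the two isin masks can be inspected separately. The following is a minimal sketch that rebuilds the sample tables above (OtherColumns is omitted, and the names product_ok, discount_ok and both_ok are introduced here only for illustration):

```python
import pandas as pd

# Sample data copied from the tables above (OtherColumns omitted).
Maindf = pd.DataFrame({
    "Product": ["A", "A", "C", "Z", "I", "D"],
    "Discount": [0.5, 1, 3, 1.5, 0, 0],
    "Store": ["Red", "Red", "Green", "Blue", "Red", "Green"],
})
Filterdf = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Discount": [0.5, 2.0, 1, 0],
    "Counter": [1, 2, 9, 7],
})

# Each mask checks one column on its own; nothing ties the two together.
product_ok = Maindf["Product"].isin(Filterdf["Product"])
discount_ok = Maindf["Discount"].isin(Filterdf["Discount"])

# Row 1 (A, 1) passes both checks: "A" is a known product and 1 is a known
# discount, even though the pair (A, 1) is not a row of Filterdf.
both_ok = product_ok & discount_ok
print(Maindf[both_ok])
```

The combined mask keeps rows 0, 1 and 5, which is the unwanted output shown above.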

How can this be achieved? Thank you, and sorry for the poor formatting; this is my first time posting here.

Tags: python, pandas, dataframe, filter, match

Solution


import pandas as pd

# Rebuild the sample data (note: Store is 'Red' for every row in this sample)
maindf = {'Product': ['A', 'A', 'C', 'Z', 'I', 'D'],
          'Discount': [0.5, 1, 3, 1.5, 0, 0],
          'Store': ['Red', 'Red', 'Red', 'Red', 'Red', 'Red']}
Maindf = pd.DataFrame(data=maindf)
print(Maindf)

filterdf = {'Product': ['A', 'B', 'C', 'D'], 'Discount': [0.5, 2.0, 1, 0]}
Filterdf = pd.DataFrame(data=filterdf)
print(Filterdf)

# Join Product and Discount into one string key per row, then keep the
# Maindf rows whose key also appears among the Filterdf keys.
NewMaindf = Maindf[Maindf[['Product', 'Discount']].astype(str).sum(axis=1).isin(
    Filterdf[['Product', 'Discount']].astype(str).sum(axis=1))]
print(NewMaindf)

Output:

  Product  Discount Store
0       A       0.5   Red
1       A       1.0   Red
2       C       3.0   Red
3       Z       1.5   Red
4       I       0.0   Red
5       D       0.0   Red
  Product  Discount
0       A       0.5
1       B       2.0
2       C       1.0
3       D       0.0
  Product  Discount Store
0       A       0.5   Red
5       D       0.0   Red
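
Concatenated string keys can collide in principle (for example, Product 'A1' with Discount 0.5 and Product 'A' with Discount 10.5 both produce 'A10.5'), so comparing the key pairs directly may be safer. The sketch below is an alternative, not the approach used above; it rebuilds the question's sample data, and the names pairs, by_merge, keys and by_mask are introduced here for illustration only. It shows an inner merge on both columns, and a MultiIndex-based mask for cases where the original row labels should be kept:

```python
import pandas as pd

# Sample data from the question (OtherColumns omitted).
Maindf = pd.DataFrame({
    "Product": ["A", "A", "C", "Z", "I", "D"],
    "Discount": [0.5, 1, 3, 1.5, 0, 0],
    "Store": ["Red", "Red", "Green", "Blue", "Red", "Green"],
})
Filterdf = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Discount": [0.5, 2.0, 1, 0],
    "Counter": [1, 2, 9, 7],
})

# Option 1: inner merge on both key columns.
# Only the key columns of Filterdf are used, de-duplicated so that a
# repeated (Product, Discount) pair cannot duplicate rows of Maindf.
pairs = Filterdf[["Product", "Discount"]].drop_duplicates()
by_merge = Maindf.merge(pairs, on=["Product", "Discount"], how="inner")
print(by_merge)

# Option 2: boolean mask built from (Product, Discount) pairs.
# Unlike merge, this keeps the original index labels of Maindf.
keys = pd.MultiIndex.from_frame(Filterdf[["Product", "Discount"]])
mask = pd.MultiIndex.from_frame(Maindf[["Product", "Discount"]]).isin(keys)
by_mask = Maindf[mask]
print(by_mask)
```

Both options match on exact value equality, so they assume the Discount values are stored identically in the two frames. The merge result gets a fresh 0..n-1 index, while the mask version keeps labels 0 and 5.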
