How to select records with a NOT EXISTS condition in a pandas DataFrame

Problem description

I have two DataFrames, shown below. I want to rewrite a data-selection SQL query that contains a NOT EXISTS condition in pandas.

SQL

SELECT ORDER_NUM, DRIVER
FROM df
WHERE 1=1
  AND NOT EXISTS (
        SELECT 1
        FROM order_addition oa
        WHERE oa.Flag_Value = 'Y'
          AND df.ORDER_NUM = oa.ORDER_NUM
  )

Sample data

order_addition.head(10)

ORDER_NUM   Flag_Value
22574536    Y
32459745    Y
15642314    Y
12478965    N
25845673    N
36789156    N

df.head(10)

ORDER_NUM   REGION  DRIVER
22574536    WEST    Ravi
32459745    WEST    David
15642314    SOUTH   Rahul
12478965    NORTH   David
25845673    SOUTH   Mani
36789156    SOUTH   Tim

How can I do this easily in pandas?

Tags: python-3.x, pandas, dataframe, subquery, not-exists

Solution


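IIUC, you can left-merge df2 (the question's df) with the rows of df1 (the question's order_addition) whose Flag_Value equals Y, and then keep the rows where Flag_Value is NaN, i.e. the rows that found no match: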

result = df2.merge(df1[df1["Flag_Value"].eq("Y")], how="left", on="ORDER_NUM")

print(result[result["Flag_Value"].isnull()])

   ORDER_NUM REGION DRIVER Flag_Value
3   12478965  NORTH  David        NaN
4   25845673  SOUTH   Mani        NaN
5   36789156  SOUTH    Tim        NaN
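
A self-contained sketch of the same anti-join idea, using the sample data from the question. It is a variation rather than the answer's exact code: it assumes merge's indicator=True option is acceptable in place of the NaN check, deduplicates the flagged keys so the merge cannot multiply rows, and keeps only the two columns the SQL selects. The names flagged, merged and anti are illustrative.

```python
import pandas as pd

# Sample data copied from the question; df1 is order_addition, df2 is df.
df1 = pd.DataFrame({
    "ORDER_NUM": [22574536, 32459745, 15642314, 12478965, 25845673, 36789156],
    "Flag_Value": ["Y", "Y", "Y", "N", "N", "N"],
})
df2 = pd.DataFrame({
    "ORDER_NUM": [22574536, 32459745, 15642314, 12478965, 25845673, 36789156],
    "REGION": ["WEST", "WEST", "SOUTH", "NORTH", "SOUTH", "SOUTH"],
    "DRIVER": ["Ravi", "David", "Rahul", "David", "Mani", "Tim"],
})

# Keys that would satisfy the EXISTS subquery, deduplicated (illustrative name).
flagged = df1.loc[df1["Flag_Value"].eq("Y"), ["ORDER_NUM"]].drop_duplicates()

# indicator=True adds a "_merge" column; "left_only" marks rows with no match.
merged = df2.merge(flagged, how="left", on="ORDER_NUM", indicator=True)

# Keep the unmatched rows and the columns named in the SQL SELECT list.
anti = merged.loc[merged["_merge"] == "left_only", ["ORDER_NUM", "DRIVER"]]
print(anti)
```

Because the indicator column records whether a match was found, this variant does not depend on Flag_Value being NaN after the merge.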

Even simpler if your ORDER_NUM values are unique:

print(df2.loc[~df2["ORDER_NUM"].isin(df1.loc[df1["Flag_Value"].eq("Y"), "ORDER_NUM"])])

   ORDER_NUM REGION DRIVER
3   12478965  NORTH  David
4   25845673  SOUTH   Mani
5   36789156  SOUTH    Tim
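
The isin filter can also be wrapped in a small helper that returns only the columns the SQL selects. This is a minimal sketch on the question's sample data. The function name select_not_flagged and its parameter names are hypothetical, not part of pandas or of the original answer.

```python
import pandas as pd

def select_not_flagged(orders, additions):
    """Return ORDER_NUM and DRIVER for orders that have no 'Y' row in additions.

    Hypothetical helper mirroring the NOT EXISTS query from the question.
    """
    # Keys for which the EXISTS subquery would return a row.
    flagged_keys = additions.loc[additions["Flag_Value"].eq("Y"), "ORDER_NUM"]
    # Negate the membership test to get the NOT EXISTS behaviour.
    mask = ~orders["ORDER_NUM"].isin(flagged_keys)
    return orders.loc[mask, ["ORDER_NUM", "DRIVER"]]

# Sample data copied from the question.
order_addition = pd.DataFrame({
    "ORDER_NUM": [22574536, 32459745, 15642314, 12478965, 25845673, 36789156],
    "Flag_Value": ["Y", "Y", "Y", "N", "N", "N"],
})
df = pd.DataFrame({
    "ORDER_NUM": [22574536, 32459745, 15642314, 12478965, 25845673, 36789156],
    "REGION": ["WEST", "WEST", "SOUTH", "NORTH", "SOUTH", "SOUTH"],
    "DRIVER": ["Ravi", "David", "Rahul", "David", "Mani", "Tim"],
})

selected = select_not_flagged(df, order_addition)
print(selected)
```

Unlike the merge, the boolean mask keeps the original index of df and never adds rows.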
