Python Pandas - Using .loc to select with AND and OR on multiple columns

Problem description

I have a situation where I am trying to select several scenarios from a DataFrame at once. This is the code I am currently using:

dfWater1 = left_merged.loc[left_merged.BVG_2M.isin(['34']) and left_merged.VHC_SC.isin(['6. Nil veg']) and left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']) | left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal'])].copy()

Or, with some extra parentheses to group the ANDs and separate the OR:

dfWater1 = left_merged.loc[(left_merged.BVG_2M.isin(['34']) and left_merged.VHC_SC.isin(['6. Nil veg']) and left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).'])) | (left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal']))].copy()

Essentially, I am asking to select the rows where:

   (
      Column BVG_2M = 34
         AND
      Column VHC_SC = '6. Nil veg'
         AND
      Column wetland is one of the following ['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']
   )
OR
   (
      Column RE is one of the following ['water', 'reef', 'ocean', 'estuary', 'canal']
   )

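The dataset is very large, so I want to keep the selection fast (hence using .loc and handling it in a vectorized way) and, if possible, avoid creating more DataFrames than necessary, to save memory.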

I think my real problem is that I am not sure how to construct the .loc statement, or even whether it can be done this way.

Error message

File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\generic.py", line 1479, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Tags: python, python-3.x, pandas, dataframe

Solution


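You should use & instead of and, and wrap each condition in parentheses. Python's and asks each Series for a single truth value, which is what raises the ValueError above, whereas & combines the boolean Series element by element. Putting each condition on its own line also helps prevent mismatched parentheses: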

dfWater1 = left_merged.loc[((left_merged.BVG_2M.isin(['34'])) &
                            (left_merged.VHC_SC.isin(['6. Nil veg'])) &
                            (left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']))) 
                          | (left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal']))].copy()
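
To make the difference between and and & easier to see, here is a minimal sketch on a tiny made-up DataFrame. The column names come from the question, but the sample values and the mask variable names (is_bvg, is_nil_veg, is_wetland, is_water_re) are assumptions for illustration only. Naming each mask is optional; it is just one way to keep the grouping readable.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
left_merged = pd.DataFrame({
    "BVG_2M": ["34", "34", "12", "34"],
    "VHC_SC": ["6. Nil veg", "6. Nil veg", "6. Nil veg", "1. Remnant"],
    "wetland": [
        "Lacustrine wetland (e.g. lake).",
        "Not a wetland",
        "Estuarine wetlands (e.g. mangroves).",
        "Lacustrine wetland (e.g. lake).",
    ],
    "RE": ["12.3.1", "water", "12.5.2", "11.1.1"],
})

# Each mask is a boolean Series aligned with left_merged's index.
is_bvg = left_merged["BVG_2M"].isin(["34"])
is_nil_veg = left_merged["VHC_SC"].isin(["6. Nil veg"])
is_wetland = left_merged["wetland"].isin([
    "Estuarine wetlands (e.g. mangroves).",
    "Lacustrine wetland (e.g. lake).",
])
is_water_re = left_merged["RE"].isin(["water", "reef", "ocean", "estuary", "canal"])

# `and` needs one True/False per operand, so pandas raises ValueError here.
try:
    is_bvg and is_nil_veg
except ValueError as exc:
    error_text = str(exc)

# `&` and `|` work element by element; parentheses make the grouping explicit.
mask = (is_bvg & is_nil_veg & is_wetland) | is_water_re
dfWater1 = left_merged.loc[mask].copy()

print(error_text)
print(dfWater1)
```

In this sample, row 0 is selected by the AND group and row 1 by the RE condition alone. One thing worth checking on the real data: the question describes BVG_2M = 34, while the code tests against the string '34'. If that column is stored as integers, isin(['34']) would likely match nothing and isin([34]) would be needed instead.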
