Python dataframe; trouble changing value of column with multiple filters

Problem description

I have a large dataframe that I pulled from an ODBC database. The dataframe has multiple columns; I'm trying to change the values of one column by filtering on two others. First, I filter my dataframe data_prem with both conditions, which gives me the correct rows:

data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))]

Then I use the replace function on the selection to change 'M' value to 'H' value:

data_prem[(data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))]['Reinsurer'].replace(to_replace='M',value='H',inplace=True,regex=True)

Python warns me I'm trying to modify a copy of the dataframe, even though I'm clearly referring to the original dataframe (I'm posting an image so you can see my results).

[image: dataframe filtering]

I also tried using .loc function in the following manner:

data_prem.loc[((data_prem['PRODUCT_NAME']=='ŽZ08') & (data_prem['BENEFIT'].str.contains('19.08.16'))),'Reinsurer'] = 'H'

which changed all rows that fit the second condition (str.contains...), but it didn't apply the first condition. I got replacements in the 'Reinsurer' column for other 'PRODUCT_NAME' values as well.

I've been scouring the web for an answer to this for some time. I've seen some mentions of a bug in the pandas library, not sure if this is what they were talking about.

I would value any opinions you might have, and would also be interested in alternative ways of solving this problem. I filled the 'Reinsurer' column using the map function with 'PRODUCT_NAME' as the input (I had a dictionary that connected all 'PRODUCT_NAME' values with 'Reinsurer' values).

Tags: python, pandas, dataframe, filter

Solution


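Given your Boolean mask, you have demonstrated two ways of applying chained indexing. This is the cause of the warning and the reason you don't see your logic applied as expected. (Below, df stands for your data_prem dataframe.)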

mask = (df['PRODUCT_NAME']=='ŽZ08') & df['BENEFIT'].str.contains('19.08.16')

Chained indexing: Example #1

df[mask]['Reinsurer'].replace(to_replace='M', value='H', inplace=True, regex=True)

Chained indexing: Example #2

df[mask].loc[mask, 'Reinsurer'] = 'H'
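To see why a chained form has no effect on the original, here is a minimal sketch on a made-up three-row frame. The data and the name subset are assumptions for illustration, not taken from the question. Boolean indexing with df[mask] returns a new object, so anything written to it never reaches df:

```python
import warnings
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "PRODUCT_NAME": ["ŽZ08", "ŽZ08", "AB01"],
    "BENEFIT": ["x 19.08.16", "other", "y 19.08.16"],
    "Reinsurer": ["M", "M", "M"],
})

mask = (df["PRODUCT_NAME"] == "ŽZ08") & df["BENEFIT"].str.contains("19.08.16", regex=False)

with warnings.catch_warnings():
    # Silence the copy-related warning that some pandas versions emit here.
    warnings.simplefilter("ignore")
    subset = df[mask]            # boolean indexing returns a copy
    subset["Reinsurer"] = "H"    # modifies the copy only

print(df["Reinsurer"].tolist())      # the original column is unchanged
print(subset["Reinsurer"].tolist())  # only the copy holds the new value
```

Writing df[mask]['Reinsurer'] on one line does the same thing, except that the intermediate copy is discarded immediately.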

Avoid chained indexing

You can keep things simple by applying the mask once and using a single loc call:

df.loc[mask, 'Reinsurer'] = 'H'
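As a self-contained sketch of the single loc call, here it is on invented data (the frame contents and the names all_rows and only_m are hypothetical). Two details go beyond the answer above and are assumptions about what you want. First, str.contains treats its pattern as a regular expression by default, so '.' would match any character; regex=False matches the dots literally, and na=False treats missing BENEFIT values as non-matches. Second, the last assignment adds a Reinsurer == 'M' condition, in case only 'M' values should change, as your replace call suggests.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "PRODUCT_NAME": ["ŽZ08", "ŽZ08", "AB01", "ŽZ08"],
    "BENEFIT": ["x 19.08.16", "other", "y 19.08.16", "z 19.08.16"],
    "Reinsurer": ["M", "M", "M", "K"],
})

# Both conditions combined into one Boolean mask.
mask = (df["PRODUCT_NAME"] == "ŽZ08") & df["BENEFIT"].str.contains(
    "19.08.16", regex=False, na=False
)

# Single loc call: every row matching both conditions becomes 'H'.
all_rows = df.copy()
all_rows.loc[mask, "Reinsurer"] = "H"
print(all_rows["Reinsurer"].tolist())

# Variant (assumption): only rows whose current value is exactly 'M'.
only_m = mask & (df["Reinsurer"] == "M")
df.loc[only_m, "Reinsurer"] = "H"
print(df["Reinsurer"].tolist())
```

The row with PRODUCT_NAME 'AB01' is left alone in both cases, even though its BENEFIT matches, because the mask requires both conditions.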
