Setting a flag and a message in DataFrame columns with Python

Problem description

I have a df like this:

      PAN        
0   ACBDV8521N  
1   
2     NaN
3    AWVFEF 

I want the df to look like this:

       PAN           PAN_Status          Invalid
0   ACBDV8521N     Valid PAN Number       False
1                  PAN is not present     True 
2     NaN          PAN is not present     True 
3    AWVFEF        Not Valid PAN          False 

I tried:

def panValidation(ele):
    if (ele.strip() =='') or pd.isna(ele):
        df['invalid'] = True
        return (True,"PAN is not present")
    elif re.match(r'^[A-Z]{5}[0-9]{4}[A-Z]$',ele):
        return "Valid PAN number"
    else:
        return "Not Valid PAN"

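But I also want the function to return a True/False flag: if the PAN number is blank or missing, the Invalid column should be set to True, otherwise False.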

Tags: python, pandas, function

Solution


The simplest option is to create the new column in a separate step, once PAN_Status exists:

df['Invalid'] = df['PAN_Status'] == 'PAN is not present'
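
The one-liner above assumes the PAN_Status column has already been filled in. A minimal end-to-end sketch of that two-step approach might look like the following; the sample frame, the helper name `pan_status`, and the `PAN_PATTERN` constant are assumptions made for illustration and do not come from the question.

```python
import re

import numpy as np
import pandas as pd

# Sample data modelled on the question: a valid PAN, a blank string,
# a missing value, and a malformed PAN (illustrative values).
df = pd.DataFrame({"PAN": ["ACBDV8521N", "  ", np.nan, "AWVFEF"]})

# Same regular expression as in the question, compiled once.
PAN_PATTERN = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")


def pan_status(ele):
    """Return only the status message for a single PAN value."""
    # Check for NaN first: NaN is a float, so .strip() would raise AttributeError.
    if pd.isna(ele) or ele.strip() == "":
        return "PAN is not present"
    if PAN_PATTERN.match(ele):
        return "Valid PAN number"
    return "Not Valid PAN"


# Step 1: compute the message column.
df["PAN_Status"] = df["PAN"].apply(pan_status)
# Step 2: derive the flag by comparing against the "not present" message.
df["Invalid"] = df["PAN_Status"] == "PAN is not present"
```

Keeping the function focused on the message and deriving the flag afterwards avoids writing to `df` from inside the function, which is what the original attempt did with `df['invalid'] = True`.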

If the flag has to come from the function itself, return a tuple:

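import re
import pandas as pd

def panValidation(ele):
    # Test pd.isna first: NaN is a float and has no .strip() method
    if pd.isna(ele) or ele.strip() == '':
        return ("PAN is not present", True)
    elif re.match(r'^[A-Z]{5}[0-9]{4}[A-Z]$', ele):
        return ("Valid PAN number", False)
    else:
        return ("Not Valid PAN", False)

# Each call returns a (status, flag) tuple; tolist() supplies one row per tuple
df[['PAN_Status', 'Invalid']] = df['PAN'].apply(panValidation).tolist()
print(df)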
          PAN          PAN_Status  Invalid
0  ACBDV8521N    Valid PAN number    False
1         NaN  PAN is not present     True
2         NaN  PAN is not present     True
3      AWVFEF       Not Valid PAN    False
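
For larger frames, the same result can probably be reached without a row-wise `apply`. The sketch below swaps in a different technique from the answer above: pandas string methods plus `numpy.select`. The sample frame and the variable names `cleaned`, `missing`, and `valid` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question (illustrative values).
df = pd.DataFrame({"PAN": ["ACBDV8521N", "  ", np.nan, "AWVFEF"]})

# Replace NaN with an empty string so blank and missing values are handled alike.
filled = df["PAN"].fillna("")
cleaned = filled.str.strip()

# True where the PAN is blank or missing.
missing = cleaned.eq("")
# True where the unstripped value matches the PAN pattern from the question.
valid = filled.str.match(r"^[A-Z]{5}[0-9]{4}[A-Z]$")

# np.select picks the first choice whose condition is True, else the default.
df["PAN_Status"] = np.select(
    [missing, valid],
    ["PAN is not present", "Valid PAN number"],
    default="Not Valid PAN",
)
df["Invalid"] = missing
```

Here the flag is simply the `missing` mask, so the message and the flag cannot drift apart.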
