Check whether a DataFrame column is filled and search by string

Problem description

I have the following DataFrame:

      import pandas as pd
      import re

      df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''], 
                         'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})

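First I want to check whether 'Column_01' is filled. If 'Column_01' is filled, or 'Column_02' contains one of the words 'one', 'two', or 'three', the new column ('Classifier') should receive 'SENSOR'.

To identify the strings in 'Column_02', I implemented the following code: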

     df['Classifier'] = df.apply(lambda x: 'SENSOR'
                        if re.search(r'one|two|three', x['Column_02'])
                        else 'Nan', axis = 1)

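This code works. It correctly finds the strings in the DataFrame rows. However, I also need to check whether 'Column_01' is filled. I could not solve this with the notnull() function.

I would like the output to be: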

       Column_01  Column_02  Classifier
           Press       five      SENSOR    # Column_01 is filled
            Temp        two      SENSOR    # Column_01 is filled; string 'two'
                       five         Nan
    Strain gauge       five      SENSOR    # Column_01 is filled
     Ultrassonic      three      SENSOR    # Column_01 is filled; string 'three'
                      three      SENSOR    # string 'three'

Tags: python, dataframe, notnull

Solution


In general you should avoid .apply() (see https://stackoverflow.com/a/54432584/11610186).
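As a rough illustration of that advice, the sketch below compares the row-wise check from the question with a vectorized equivalent, for 'Column_02' only. The variable names `row_wise` and `vectorized` are illustrative and do not come from the original post.

```python
import re

import pandas as pd

df = pd.DataFrame({'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})

# Row-wise version, as in the question: one Python-level call per row.
row_wise = df.apply(
    lambda x: bool(re.search(r'one|two|three', x['Column_02'])), axis=1
)

# Vectorized version: a single string-method call over the whole column.
vectorized = df['Column_02'].str.contains(r'one|two|three')

# Both approaches flag the same rows.
print(row_wise.tolist() == vectorized.tolist())
```

The two give the same mask on this data; the vectorized form simply avoids building a row object for every record.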

This should solve the problem:

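    import numpy as np

    # The fallback is the string "nan": np.nan mixed with a string is coerced to text, or may be rejected by newer NumPy.
    df["Classifier"] = np.where(
        df["Column_01"].fillna('').ne('') | df["Column_02"].str.contains("one|two|three"),
        "SENSOR", "nan")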

Output:

          Column_01 Column_02 Classifier
    0         Press      five     SENSOR
    1          Temp       two     SENSOR
    2                    five        nan
    3  Strain gauge      five     SENSOR
    4   Ultrassonic     three     SENSOR
    5                   three     SENSOR
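
The likely reason notnull() did not help in the question is that the blank cells hold empty strings rather than missing values, so notnull() reports them as filled. Here is a minimal sketch using the same sample data. The names `filled` and `has_word` are illustrative, and the fallback label 'Nan' is taken from the question's desired output rather than from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
                   'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})

# An empty string is still a value, so notnull() is True for every row here.
print(df['Column_01'].notnull().tolist())

# Map any real missing values to '' and compare against '' to find filled cells.
filled = df['Column_01'].fillna('').ne('')

# Pattern without capture groups, so str.contains has no match groups to warn about.
has_word = df['Column_02'].str.contains(r'one|two|three')

# Both branches are strings, so the result is a plain string column.
df['Classifier'] = np.where(filled | has_word, 'SENSOR', 'Nan')
print(df)
```

Splitting the condition into two named masks makes each half easy to inspect on its own before they are combined with `|`.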
