首页 > 解决方案 > Pandas 在数据框中将不匹配的列表项标记为缺失

问题描述

search_list即使在 dataFrame 列中找不到,我也想打印 List 中的所有服务器名称df['Server Name']只需将不匹配的值标记为缺失。

这只是POST Here的扩展

Server-Inventory.csv部分中的服务器名称Server Name如下。

kpc2021
kpc8291
kpc8471 
kpc8474 
kpc1178
kpc1179
kpc1168
kpc8812 
kpc8813
kpc8814

代码

import pandas as pd
search_list =  ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]
cols = [ 'Server', 'Server Name', 'iLO FW', 'Firmware', 'Appliance Name']

df = pd.read_csv("Server-Inventory.csv", usecols=cols)
df = df[df['Server Name'].astype(str).str.contains('|'.join(search_list))]
print(df)

数据框:

           Server                    Server Name            iLO FW                Firmware         Appliance Name
0  ENC2002, bay 10                      kpc2021   2.50 Sep 23 2016  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
1  ENC8023, bay 7                kpc8291.db.com   2.40 Dec 02 2015  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
2  enc8009, bay 12                kpc8471.db.com  2.61 Jul 27 2018  I42 v1.42 (06/20/2020)  OV C7000 enclosures 1
3  enc1011, bay 1                        kpc8479  2.55 Aug 16 2017  I36 v2.74 (10/21/2019)  OV C7000 enclosures 1
4  enc1014, bay 1                        kpc1168  2.70 May 07 2019  I36 v2.74 (11/13/2019)  OV C7000 enclosures 1

结果:

               Server Server Name            iLO FW                Firmware         Appliance Name
  0   ENC2002, bay 10     kpc2021  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
  1    enc8023, bay 7     kpc8291  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 2
  2  enc8009, bay 12      kpc8471  1.30 May 31 2018  I42 v1.42 (06/20/2020)                 ov7003

期望:

               Server Server Name            iLO FW                Firmware         Appliance Name
  0   ENC2002, bay 10     kpc2021  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 1
  1    enc8023, bay 7     kpc8291  2.55 Aug 16 2017  I36 v2.52 (10/25/2020)  OV C7000 enclosures 2
  2  enc8009, bay 12      kpc8471  1.30 May 31 2018  I42 v1.42 (06/20/2020)                 ov7003
  4  missing              kpc8472  missing           missing                  missing
  5  missing              kpc1165  missing           missing                  missing

笔记 :

如果匹配或不匹配,则仅打印列表中的值,而不是所有不在的 CVS 值search_list

标签: pythonpython-3.xpandas

解决方案


用于Series.str.extract帮助列匹配的名称,然后通过以下方式获取缺失的名称numpy.setdiff1d并添加到过滤的 DataFrame 中DataFrame.append

search_list =  ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]

pat = '(' + '|'.join(search_list) + ')' 
df['Server1'] = df['Server Name'].str.extract(pat)

df1 = pd.DataFrame({'Server1': np.setdiff1d(search_list, df['Server1'].dropna())})
df = df[df['Server1'].notna()].append(df1, ignore_index=True)

print (df)
            Server     Server Name            iLO FW                Firmware  \
0  ENC2002, bay 10         kpc2021  2.50 Sep 23 2016  I36 v2.52 (10/25/2020)   
1   ENC8023, bay 7  kpc8291.db.com  2.40 Dec 02 2015  I36 v2.52 (10/25/2020)   
2  enc8009, bay 12  kpc8471.db.com  2.61 Jul 27 2018  I42 v1.42 (06/20/2020)   
3              NaN             NaN               NaN                     NaN   
4              NaN             NaN               NaN                     NaN   

          Appliance Name  Server1  
0  OV C7000 enclosures 1  kpc2021  
1  OV C7000 enclosures 1  kpc8291  
2  OV C7000 enclosures 1  kpc8471  
3                    NaN  kpc1165  
4                    NaN  kpc8472  

如果需要用某个值替换缺失值:

search_list =  ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]

pat = '(' + '|'.join(search_list) + ')' 
df['Server1'] = df['Server Name'].str.extract(pat)

df1 = pd.DataFrame({'Server1': np.setdiff1d(search_list, df['Server1'].dropna())})
df1 = df1.reindex(df.columns, fill_value='missing',axis=1)
df = df[df['Server1'].notna()].append(df1, ignore_index=True)

print (df)
            Server     Server Name            iLO FW                Firmware  \
0  ENC2002, bay 10         kpc2021  2.50 Sep 23 2016  I36 v2.52 (10/25/2020)   
1   ENC8023, bay 7  kpc8291.db.com  2.40 Dec 02 2015  I36 v2.52 (10/25/2020)   
2  enc8009, bay 12  kpc8471.db.com  2.61 Jul 27 2018  I42 v1.42 (06/20/2020)   
3          missing         missing           missing                 missing   
4          missing         missing           missing                 missing   

          Appliance Name  Server1  
0  OV C7000 enclosures 1  kpc2021  
1  OV C7000 enclosures 1  kpc8291  
2  OV C7000 enclosures 1  kpc8471  
3                missing  kpc1165  
4                missing  kpc8472 

推荐阅读