python - Pandas 在数据框中将不匹配的列表项标记为缺失
问题描述
search_list
即使在 dataFrame 列中找不到,我也想打印 List 中的所有服务器名称df['Server Name']
只需将不匹配的值标记为缺失。
这只是POST Here的扩展
Server-Inventory.csv
部分中的服务器名称Server Name
如下。
kpc2021
kpc8291
kpc8471
kpc8474
kpc1178
kpc1179
kpc1168
kpc8812
kpc8813
kpc8814
代码
import pandas as pd
search_list = ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]
cols = [ 'Server', 'Server Name', 'iLO FW', 'Firmware', 'Appliance Name']
df = pd.read_csv("Server-Inventory.csv", usecols=cols)
df = df[df['Server Name'].astype(str).str.contains('|'.join(search_list))]
print(df)
数据框:
Server Server Name iLO FW Firmware Appliance Name
0 ENC2002, bay 10 kpc2021 2.50 Sep 23 2016 I36 v2.52 (10/25/2020) OV C7000 enclosures 1
1 ENC8023, bay 7 kpc8291.db.com 2.40 Dec 02 2015 I36 v2.52 (10/25/2020) OV C7000 enclosures 1
2 enc8009, bay 12 kpc8471.db.com 2.61 Jul 27 2018 I42 v1.42 (06/20/2020) OV C7000 enclosures 1
3 enc1011, bay 1 kpc8479 2.55 Aug 16 2017 I36 v2.74 (10/21/2019) OV C7000 enclosures 1
4 enc1014, bay 1 kpc1168 2.70 May 07 2019 I36 v2.74 (11/13/2019) OV C7000 enclosures 1
结果:
Server Server Name iLO FW Firmware Appliance Name
0 ENC2002, bay 10 kpc2021 2.55 Aug 16 2017 I36 v2.52 (10/25/2020) OV C7000 enclosures 1
1 enc8023, bay 7 kpc8291 2.55 Aug 16 2017 I36 v2.52 (10/25/2020) OV C7000 enclosures 2
2 enc8009, bay 12 kpc8471 1.30 May 31 2018 I42 v1.42 (06/20/2020) ov7003
期望:
Server Server Name iLO FW Firmware Appliance Name
0 ENC2002, bay 10 kpc2021 2.55 Aug 16 2017 I36 v2.52 (10/25/2020) OV C7000 enclosures 1
1 enc8023, bay 7 kpc8291 2.55 Aug 16 2017 I36 v2.52 (10/25/2020) OV C7000 enclosures 2
2 enc8009, bay 12 kpc8471 1.30 May 31 2018 I42 v1.42 (06/20/2020) ov7003
4 missing kpc8472 missing missing missing
5 missing kpc1165 missing missing missing
笔记 :
如果匹配或不匹配,则仅打印列表中的值,而不是所有不在的 CVS 值search_list
。
解决方案
用于Series.str.extract
帮助列匹配的名称,然后通过以下方式获取缺失的名称numpy.setdiff1d
并添加到过滤的 DataFrame 中DataFrame.append
:
search_list = ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]
pat = '(' + '|'.join(search_list) + ')'
df['Server1'] = df['Server Name'].str.extract(pat)
df1 = pd.DataFrame({'Server1': np.setdiff1d(search_list, df['Server1'].dropna())})
df = df[df['Server1'].notna()].append(df1, ignore_index=True)
print (df)
Server Server Name iLO FW Firmware \
0 ENC2002, bay 10 kpc2021 2.50 Sep 23 2016 I36 v2.52 (10/25/2020)
1 ENC8023, bay 7 kpc8291.db.com 2.40 Dec 02 2015 I36 v2.52 (10/25/2020)
2 enc8009, bay 12 kpc8471.db.com 2.61 Jul 27 2018 I42 v1.42 (06/20/2020)
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
Appliance Name Server1
0 OV C7000 enclosures 1 kpc2021
1 OV C7000 enclosures 1 kpc8291
2 OV C7000 enclosures 1 kpc8471
3 NaN kpc1165
4 NaN kpc8472
如果需要用某个值替换缺失值:
search_list = ["kpc2021","kpc8291","kpc8471", "kpc8472", "kpc1165"]
pat = '(' + '|'.join(search_list) + ')'
df['Server1'] = df['Server Name'].str.extract(pat)
df1 = pd.DataFrame({'Server1': np.setdiff1d(search_list, df['Server1'].dropna())})
df1 = df1.reindex(df.columns, fill_value='missing',axis=1)
df = df[df['Server1'].notna()].append(df1, ignore_index=True)
print (df)
Server Server Name iLO FW Firmware \
0 ENC2002, bay 10 kpc2021 2.50 Sep 23 2016 I36 v2.52 (10/25/2020)
1 ENC8023, bay 7 kpc8291.db.com 2.40 Dec 02 2015 I36 v2.52 (10/25/2020)
2 enc8009, bay 12 kpc8471.db.com 2.61 Jul 27 2018 I42 v1.42 (06/20/2020)
3 missing missing missing missing
4 missing missing missing missing
Appliance Name Server1
0 OV C7000 enclosures 1 kpc2021
1 OV C7000 enclosures 1 kpc8291
2 OV C7000 enclosures 1 kpc8471
3 missing kpc1165
4 missing kpc8472