An elegant and efficient way to replace multiple terms in a pandas column

Problem description

I want to replace multiple values in a DataFrame column. The column contains the following labels:

df['label'] = ['Sodium', 'Bicarbonate', 'White Blood Cells', 'Hemoglobin',
       'Glucose', 'Lactate', 'pH', 'Potassium, Whole Blood',
       'Sodium, Whole Blood', 'Lactate Dehydrogenase (LD)',
       'Bilirubin, Direct', 'Alkaline Phosphatase',
       'Alanine Aminotransferase (ALT)',
       'Asparate Aminotransferase (AST)', 'Potassium', 'Phosphate',
       'Creatinine', 'C-Reactive Protein', 'pCO2',
       'Calculated Bicarbonate, Whole Blood', 'Bilirubin, Total',
       'Albumin', 'Bilirubin, Indirect', 'Urine Volume', 'WBC Count',
       'Urine Volume, Total', 'Phosphate, Body Fluid']

In the code below, I am trying to replace both 'Sodium' and 'Sodium, Whole Blood' with 'Sodium'.

I do the same for the rest of the measurements:

df['label'] = df['label'].replace(dict.fromkeys(['Sodium','Sodium, Whole Blood'], 'Sodium'))
df['label'] = df['label'].replace(dict.fromkeys(['Bicarbonate','Calculated Bicarbonate, Whole Blood'], 'Bicarbonate'))
df['label'] = df['label'].replace(dict.fromkeys(['Bilirubin, Direct','Bilirubin, Total','Bilirubin, Indirect'], 'Bilirubin'))
df['label'] = df['label'].replace(dict.fromkeys(['Urine Volume, Total','Urine Volume'], 'Urine Volume'))
df['label'] = df['label'].replace(dict.fromkeys(['White Blood Cells','WBC Count'], 'WBC'))
df['label'] = df['label'].replace(dict.fromkeys(['Potassium, Whole Blood','Potassium'], 'Potassium'))
df['label'] = df['label'].replace(dict.fromkeys(['Phosphate','Phosphate, Body Fluid'], 'Phosphate'))

The code above works, but is there a more efficient way to do these replacements without repeating the same line of code so many times?

Tags: python, pandas, dataframe, dictionary, series

Solution


One approach is to build a single large dictionary and call replace once:

# add more of your stuff here
lst = [(['Sodium','Sodium, Whole Blood'], 'Sodium'),
       (['Bicarbonate','Calculated Bicarbonate, Whole Blood'], 'Bicarbonate')
      ]

repl_dict = {}
for x,y in lst:
    repl_dict.update(dict.fromkeys(x,y))

df['label'] = df['label'].replace(repl_dict)
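
The same idea can also be written with the groups keyed by their target name, which some may find easier to maintain as the list grows. The following is only a sketch under assumptions: the sample DataFrame, the `groups` name, and the choice of groups are illustrative and not taken from the answer above.

```python
import pandas as pd

# Hypothetical sample data: a few of the labels from the question.
df = pd.DataFrame({'label': ['Sodium', 'Sodium, Whole Blood', 'WBC Count',
                             'White Blood Cells', 'Bilirubin, Total', 'pH']})

# Target name -> list of variants to collapse into it (illustrative groups).
groups = {
    'Sodium': ['Sodium, Whole Blood'],
    'WBC': ['White Blood Cells', 'WBC Count'],
    'Bilirubin': ['Bilirubin, Direct', 'Bilirubin, Total', 'Bilirubin, Indirect'],
}

# Invert the groups into a flat variant -> target lookup.
repl_dict = {variant: target
             for target, variants in groups.items()
             for variant in variants}

# One replace call; labels missing from repl_dict (e.g. 'pH') stay unchanged.
df['label'] = df['label'].replace(repl_dict)
print(df['label'].tolist())
```

A label that already equals its target name (such as 'Sodium') does not need an entry, because replace leaves unmatched values as they are.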
