df.loc to replace comma-separated numbers in a DataFrame

Problem description

I downloaded the data frames from here: https://ods.od.nih.gov/HealthInformation/Dietary_Reference_Intakes.aspx

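I scraped them with BeautifulSoup, but some values contain thousands separators and asterisks, and I want to remove both. I have a regex that strips the asterisks, but for the commas I tried str.replace(",", "") and then used .loc to write the new string back. My code: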

#iterate each df field and if comma sep, replace
for name,df in df_dict.items():
    print(name, df.dtypes)
    cols = list(df.columns)
    #print(cols)
    for idx, row in df.iterrows():
        # skip lifestage group col
        for i in range(1,len(cols)):
            curr_val = str(row[cols[i]])
            print(f'curr_val: {type(curr_val),curr_val}')
            print(f'row[0]:{row[cols[0]]}')
            if "," in curr_val:
                clean_val = curr_val.replace(",", "")
                print(f'comma: {df.loc[row[cols[0]], cols[i]]}')
                df.loc[row[cols[0]],cols[i]] = clean_val
                print(f'no comma: {df.loc[row[cols[0]], cols[i]]}\n')
            

df.dtypes shows:

Life-Stage Group     object
Calcium (mg/d)       object
Chromium (μg/d)      object
Copper (μg/d)        object
Fluoride (mg/d)      object
Iodine (μg/d)        object
Iron (mg/d)          object
Magnesium (mg/d)     object
Manganese (mg/d)     object
Molybdenum (μg/d)    object
Phosphorus (mg/d)    object
Selenium (μg/d)      object
Zinc (mg/d)          object
Potassium (mg/d)     object
Sodium (mg/d)        object
Chloride (g/d)       object
dtype: object

So I think it should work, but nothing actually changes.

Ideally, I would like to strip both the commas and the "*" and keep only int or float values.

Tags: python, pandas

Solution


@piterbarg's answer was correct. I edited my code accordingly, and this version works:

# iterate over each DataFrame and strip commas and asterisks from the value columns
for name, df in df_dict.items():
    str_df = df.copy().astype(str)
    cols = list(df.columns)
    print(f'cols[0]: {cols[0]}')

    # skip the life-stage group column
    for i in range(1, len(cols)):
        # regex=False treats ',' and '*' as literal text; a bare '*' is not a valid regex
        str_df[cols[i]] = str_df[cols[i]].str.replace(',', '', regex=False).str.replace('*', '', regex=False)

    df_dict[name] = str_df
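
The loop above leaves every cleaned column as strings, while the question asks for int or float values. As a rough sketch of one way to get there, the example below strips both characters in a single pass and then converts with pd.to_numeric. The sample frame and the helper name clean_numeric_columns are hypothetical and only for illustration; the values are made up rather than taken from the NIH tables.

```python
import pandas as pd

# Hypothetical sample standing in for one scraped table; the values are made up.
sample = pd.DataFrame({
    "Life-Stage Group": ["Group A", "Group B"],
    "Calcium (mg/d)": ["1,000*", "1,300"],
    "Iron (mg/d)": ["0.27*", "11"],
})


def clean_numeric_columns(df, skip=1):
    """Return a copy with commas and asterisks removed and columns cast to numbers.

    The first `skip` columns (here the life-stage label) are left untouched.
    """
    out = df.copy()
    for col in out.columns[skip:]:
        # The character class [,*] matches a literal comma or a literal asterisk.
        stripped = out[col].astype(str).str.replace(r"[,*]", "", regex=True)
        # errors="coerce" turns anything that still is not a number into NaN.
        out[col] = pd.to_numeric(stripped, errors="coerce")
    return out


cleaned = clean_numeric_columns(sample)
print(cleaned.dtypes)
```

To apply this to every table, something like df_dict = {name: clean_numeric_columns(df) for name, df in df_dict.items()} should work, assuming the first column of each table is the life-stage label. Note that errors="coerce" silently replaces non-numeric cells with NaN, so it may be worth checking for unexpected NaN values afterwards.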
