Why doesn't na_values produce the correct DataFrame in excel_df?

Problem description

When I run this code in Jupyter, the na_values are fully applied in csv_df, but in excel_df the -1 in the revenue column is not converted to NaN. Why?

https://github.com/codebasics/py/blob/master/pandas/4_read_write_to_excel/stock_data.csv
https://github.com/codebasics/py/blob/master/pandas/4_read_write_to_excel/stock_data.xlsx

import pandas as pd

dict = {'eps':['not available', 'n.a.'],
        'people':['not available','n.a.'],
        'revenue':['not available', 'n.a.',-1],
        'price':['not available', 'n.a.']
       }
csv_df = pd.read_csv('stock_data.csv', na_values = dict)
csv_df

excel_df = pd.read_excel('stock_data.xlsx', na_values = dict)
excel_df

Tags: python, pandas, data-science

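Solution

I'm not sure exactly why this happens, but as a workaround you can specify the dtype of the revenue column as object (string) and then convert it back to numeric. Also, in this example I renamed your dict variable to the_dict. Naming a variable dict or list is not recommended, because it shadows the Python built-in of the same name.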

import pandas as pd

# Read revenue as object so its values are not parsed as numbers up front
dtype = {'revenue': 'object'}

# Per-column values that should be treated as missing
the_dict = {'eps': ['not available', 'n.a.'],
            'people': ['not available', 'n.a.'],
            'revenue': ['not available', 'n.a.', -1],
            'price': ['not available', 'n.a.']
           }

excel_df = pd.read_excel('stock_data.xlsx', dtype=dtype, na_values=the_dict)

# Convert revenue back to a numeric column
excel_df['revenue'] = pd.to_numeric(excel_df['revenue'])

print(excel_df)

    tickers eps     revenue price   people
0   GOOGL   27.82   87.0    845.0   larry page
1   WMT     4.61    484.0   65.0    NaN
2   MSFT    -1.00   85.0    64.0    bill gates
3   RIL     NaN     50.0    1023.0  mukesh ambani
4   TATA    5.60    NaN     NaN     ratan tata
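
If changing the dtype is inconvenient, another option (not part of the answer above) is to mark the sentinel values as missing after the file has been loaded. The following is only a minimal sketch: the small DataFrame is a made-up stand-in for what read_excel might return when the -1 is left untouched, and the helper name mark_sentinels_as_nan is hypothetical.

```python
import pandas as pd

# Made-up stand-in for a DataFrame loaded from Excel where -1 survived
excel_like_df = pd.DataFrame({
    "tickers": ["GOOGL", "WMT", "MSFT", "RIL", "TATA"],
    "revenue": [87, 484, 85, 50, -1],
})


def mark_sentinels_as_nan(df, sentinels_by_column):
    """Return a copy of df where the listed per-column values become NaN."""
    cleaned = df.copy()
    for column, sentinels in sentinels_by_column.items():
        # mask() replaces entries where the condition is True with NaN
        cleaned[column] = cleaned[column].mask(cleaned[column].isin(sentinels))
    return cleaned


cleaned_df = mark_sentinels_as_nan(excel_like_df, {"revenue": [-1]})
print(cleaned_df)
```

Because the replacement is applied per column, a legitimate -1 in another column (such as the eps value for MSFT) would be left alone.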
