When normalizing data, I keep getting ValueError: cannot convert float NaN to integer

Problem description

I am trying to normalize my CSV data with decimal scaling, using this code:

def decimal_scaling(data):
    data = np.array(data, dtype=np.float32)
    max_row = data.max(axis=0)
    c = np.array([len(str(int(number))) for number in np.abs(max_row)])
    return data/(10**c)

X = decimal_scaling(
            glcm_df[['dissimilarity_0', 'dissimilarity_45', 'dissimilarity_90', 'dissimilarity_135', 
                     'correlation_0', 'correlation_45', 'correlation_90', 'correlation_135', 
                     'homogeneity_0', 'homogeneity_45', 'homogeneity_90', 'homogeneity_135', 
                     'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135', 
                     'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
                     'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)

However, every time I run it, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-5b1233475b8c> in <module>
     22                      'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135',
     23                      'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
---> 24                      'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)

<ipython-input-21-5b1233475b8c> in decimal_scaling(data)
     13     data = np.array(data, dtype=np.float32)
     14     max_row = data.max(axis=0)
---> 15     c = np.array([len(str(int(number))) for number in np.abs(max_row)])
     16     return data/(10**c)
     17 

<ipython-input-21-5b1233475b8c> in <listcomp>(.0)
     13     data = np.array(data, dtype=np.float32)
     14     max_row = data.max(axis=0)
---> 15     c = np.array([len(str(int(number))) for number in np.abs(max_row)])
     16     return data/(10**c)
     17 

ValueError: cannot convert float NaN to integer

I am not sure what is going wrong.

Tags: python, numpy

Solution


NumPy floats can hold NaN, but integers cannot. A NaN therefore propagates silently through your float calculations until it reaches the int conversion.

That is, reading the data produces some NaN values. data.max(axis=0) then returns NaN for every column that contains one, np.abs leaves that NaN unchanged, and finally int() raises the ValueError.
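The propagation can be reproduced on a tiny array. The sketch below is only an illustration: the 3x2 array is made up to stand in for the CSV data, and the variable names are not from the question.

```python
import numpy as np

# Hypothetical 3x2 array standing in for the CSV data; one entry is missing.
data = np.array([[1.5, 20.0],
                 [np.nan, 35.0],
                 [3.0, 7.0]], dtype=np.float32)

max_per_column = data.max(axis=0)   # column 0 contains a NaN, so its max is NaN
abs_max = np.abs(max_per_column)    # abs does not remove the NaN

try:
    int(abs_max[0])                 # the same conversion the list comprehension performs
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(max_per_column)
print(error_message)
```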

Try:

data = np.array(data, dtype=np.float32) # from your code
print(np.argwhere(np.isnan(data)))

to find where your NaN values are.
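Once the NaN positions are known, one possible way forward is to make the scaling itself tolerate them. The following is a minimal sketch, not the asker's original method: decimal_scaling_nan_safe is a hypothetical name, it uses np.nanmax to skip NaN entries, and it takes the maximum of the absolute values rather than the absolute value of the maximum, which is an assumption about the intended behavior.

```python
import numpy as np

def decimal_scaling_nan_safe(data):
    """Decimal scaling that ignores NaN entries when finding column maxima.

    Hypothetical variant of the question's decimal_scaling function.
    """
    data = np.asarray(data, dtype=np.float64)
    # nanmax skips NaN, so one missing value no longer turns the column max into NaN.
    # Assumption: scale by the largest absolute value in each column.
    max_abs = np.nanmax(np.abs(data), axis=0)
    # Same digit-count rule as in the question: digits of the integer part.
    digits = np.array([len(str(int(m))) for m in max_abs])
    return data / (10.0 ** digits)

# Made-up example data with one missing value.
sample = np.array([[1.5, 20.0],
                   [np.nan, 35.0],
                   [3.0, 7.0]])
scaled = decimal_scaling_nan_safe(sample)
print(scaled)
```

The NaN entries themselves stay NaN in the result, and a column that is entirely NaN would still fail at the int conversion, so it may be simpler to drop or fill the missing rows before scaling (for example with pandas DataFrame.dropna or DataFrame.fillna).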

