Correlation heatmap turned values into nan in Python

Problem description

I want to plot a correlation heatmap of my table df, which looks normal at first:

    Total   Paid   Post   Engaged   Negative   like
1   2178    0      0      66        0          1207
2   1042    0      0      60        0          921
3   2096    0      0      112       0          1744
4   1832    0      0      109       0          1718
5   1341    0      0      38        0          889
6   1933    0      0      123       0          1501
    ...

but after I applied:

df= full_Data.iloc[1:,4:10]
df= pd.DataFrame(df,columns=['A','B','C', 'D', 'E', 'F'])

corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()

it returned an empty graph:

C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:204: RuntimeWarning: All-NaN slice encountered
  vmin = np.nanmin(calc_data)
C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:209: RuntimeWarning: All-NaN slice encountered
  vmax = np.nanmax(calc_data)

and df returned:

    A   B   C   D   E   F
1   nan nan nan nan nan nan
2   nan nan nan nan nan nan
3   nan nan nan nan nan nan
4   nan nan nan nan nan nan
5   nan nan nan nan nan nan
    ...

Why are all the values turned into NaN?


Update:

I tried renaming the columns without passing the DataFrame and the new names to the constructor together:

df.columns = ['A','B','C', 'D', 'E', 'F']

and

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

and both raised an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-3a27f095066b> in <module>
     12 
     13 corrMatrix = df.corr()
---> 14 sn.heatmap(corrMatrix, annot=True)
     15 plt.show()
     16 

~\Anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~\Anaconda3\lib\site-packages\seaborn\matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
    545     plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
    546                           annot_kws, cbar, cbar_kws, xticklabels,
--> 547                           yticklabels, mask)
    548 
    549     # Add the pcolormesh kwargs here

~\Anaconda3\lib\site-packages\seaborn\matrix.py in __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
    164         # Determine good default values for the colormapping
    165         self._determine_cmap_params(plot_data, vmin, vmax,
--> 166                                     cmap, center, robust)
    167 
    168         # Sort out the annotations

~\Anaconda3\lib\site-packages\seaborn\matrix.py in _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
    202                 vmin = np.nanpercentile(calc_data, 2)
    203             else:
--> 204                 vmin = np.nanmin(calc_data)
    205         if vmax is None:
    206             if robust:

<__array_function__ internals> in nanmin(*args, **kwargs)

~\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py in nanmin(a, axis, out, keepdims)
    317         # Fast, but not safe for subclasses of ndarray, or object arrays,
    318         # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
--> 319         res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
    320         if np.isnan(res).any():
    321             warnings.warn("All-NaN slice encountered", RuntimeWarning,

ValueError: zero-size array to reduction operation fmin which has no identity

Tags: python, pandas, dataframe, seaborn, nan

Solution


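I think the problem is that a DataFrame object is passed to the pd.DataFrame constructor. Given a DataFrame together with columns=, the constructor selects existing columns by those labels instead of renaming them. The original column names differ from the new names in the list, so nothing matches and only NaNs are created.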
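
To illustrate the point, here is a minimal sketch. The small table is a hypothetical stand-in for the sliced data (only the column names are taken from the question), and it contrasts the constructor call with the NumPy route:

```python
import pandas as pd

# Hypothetical stand-in for the sliced table; column names come from the question.
original = pd.DataFrame(
    {"Total": [2178, 1042, 2096], "Engaged": [66, 60, 112], "like": [1207, 921, 1744]}
)

# A DataFrame plus columns= selects columns by label. None of 'A', 'B', 'C'
# exist in `original`, so every cell of the result is NaN.
reindexed = pd.DataFrame(original, columns=["A", "B", "C"])
print(reindexed)

# A NumPy array carries no labels, so the new names are simply attached
# to the existing values.
renamed = pd.DataFrame(original.to_numpy(), columns=["A", "B", "C"])
print(renamed)
```

Note that the NumPy route also discards the original row index, which may or may not matter for your data.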

One solution is to convert it to a NumPy array first:

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

Or set the new column names in a separate step, without the DataFrame constructor:

df = full_Data.iloc[1:,4:10]
df.columns = ['A','B','C', 'D', 'E', 'F']

Another solution is to rename with a dict built from the existing columns only:

old = df.columns
new = ['A','B','C', 'D', 'E', 'F']

df = df.rename(columns=dict(zip(old, new)))
print (df)
      A  B  C    D  E     F
1  2178  0  0   66  0  1207
2  1042  0  0   60  0   921
3  2096  0  0  112  0  1744
4  1832  0  0  109  0  1718
5  1341  0  0   38  0   889
6  1933  0  0  123  0  1501

print (df.corr())
          A   B   C         D   E         F
A  1.000000 NaN NaN  0.606808 NaN  0.727034
B       NaN NaN NaN       NaN NaN       NaN
C       NaN NaN NaN       NaN NaN       NaN
D  0.606808 NaN NaN  1.000000 NaN  0.916325
E       NaN NaN NaN       NaN NaN       NaN
F  0.727034 NaN NaN  0.916325 NaN  1.000000
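
The remaining NaN entries for B, C and E are a separate matter: in the six sample rows those columns are all zeros, so their variance is zero and the correlation is undefined. A rough sketch, assuming you would rather leave such constant columns out of the heatmap (this filtering step is a suggestion, not part of the original answer; B stands in for all three constant columns):

```python
import pandas as pd

# The six sample rows from the question, with one constant column kept.
df = pd.DataFrame(
    {
        "A": [2178, 1042, 2096, 1832, 1341, 1933],
        "B": [0, 0, 0, 0, 0, 0],
        "D": [66, 60, 112, 109, 38, 123],
        "F": [1207, 921, 1744, 1718, 889, 1501],
    }
)

# Keep only columns with more than one distinct value; a constant column
# has zero variance, so its correlation with anything is NaN.
varying = df.loc[:, df.nunique() > 1]
corr = varying.corr()
print(corr)
```

The resulting matrix can then be passed to sn.heatmap as before.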

EDIT:

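The problem was that the columns were not numeric, so df.corr() presumably had no numeric columns to work with and returned an empty matrix, which is consistent with the zero-size array error in the traceback. Convert the columns to a numeric dtype first: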

df = df.astype(int)

Or:

df = df.apply(pd.to_numeric, errors='coerce')
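
The two conversions differ when the data contains values that cannot be parsed. The sketch below uses hypothetical string data (the 'n/a' entry is invented for illustration) to show the errors='coerce' variant, which replaces unparseable entries with NaN, whereas astype(int) would raise on them:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, with one unparseable entry.
raw = pd.DataFrame(
    {"A": ["2178", "1042", "2096", "1832"], "D": ["66", "60", "112", "n/a"]}
)
print(raw.dtypes)  # the columns hold strings, not numbers

# errors='coerce' converts what it can and turns the rest into NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)

# corr() now has numeric columns to work with; rows containing NaN are
# skipped pairwise.
print(numeric.corr())
```

Checking df.dtypes before calling corr() is a quick way to spot this kind of problem.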
