Pandas dropna raises ValueError: "Cannot convert non-finite values (NA or inf) to integer"

Problem description

Pandas: 0.25.3
Python: 3.7.4

I have a DataFrame from which I want to drop the columns that contain only NaN values. This should be easy, because pandas has a DataFrame method for exactly that, dropna (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html). Here is my code:

long_summary = long_summary.dropna(axis='columns', how='all') 

But that one simple line raises an exception:

ValueError: Cannot convert non-finite values (NA or inf) to integer

I don't see how calling dropna could cause this exception. What is going on, and how do I fix it?

I'm including the full traceback in case it makes the problem clearer:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-b4926abd4d81> in <module>
----> 1 long_summary = long_summary.dropna(axis='columns', how='all')

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in dropna(self, axis, how, thresh, subset, inplace)
4860                 agg_obj = self.take(indices, axis=agg_axis)
4861 
-> 4862             count = agg_obj.count(axis=agg_axis)
4863 
4864             if thresh is not None:

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in count(self, axis, level, numeric_only)
7848                 result = Series(counts, index=frame._get_agg_axis(axis))
7849 
-> 7850         return result.astype("int64")
7851 
7852     def _count_level(self, level, axis=0, numeric_only=False):

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
5880             # else, only a single dtype is given
5881             new_data = self._data.astype(
-> 5882                 dtype=dtype, copy=copy, errors=errors, **kwargs
5883             )
5884             return self._constructor(new_data).__finalize__(self)

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
    579 
    580     def astype(self, dtype, **kwargs):
--> 581         return self.apply("astype", dtype=dtype, **kwargs)
    582 
    583     def convert(self, **kwargs):

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    436                     kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
    437 
--> 438             applied = getattr(b, f)(**kwargs)
    439             result_blocks = _extend_blocks(applied, result_blocks)
    440 

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    557 
    558     def astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):
--> 559         return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
    560 
    561     def _astype(self, dtype, copy=False, errors="raise", values=None, **kwargs):

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    641                     # _astype_nansafe works fine with 1-d only
    642                     vals1d = values.ravel()
--> 643                     values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
    644 
    645                 # TODO(extension)

c:\users\timregan\appdata\local\programs\python\python37\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    698         if not np.isfinite(arr).all():
    699             raise ValueError(
--> 700                 "Cannot convert non-finite values (NA or inf) to " "integer"
    701             )
    702 

ValueError: Cannot convert non-finite values (NA or inf) to integer

(Note that my columns have the dtypes int64, Int32 and float64.)
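The last frames of the traceback show where the message comes from: dropna calls DataFrame.count, count finishes with result.astype("int64"), and astype_nansafe raises as soon as any value it is asked to cast is non-finite. The sketch below, which assumes only pandas and NumPy and uses made-up data, illustrates that chain in isolation: a tiny frame with the same three dtypes goes through dropna without trouble, while casting a float Series that contains NaN to int64 raises the same ValueError. It does not explain why count produced a non-finite value on the asker's 60-million-row frame; the question does not establish that.

```python
import numpy as np
import pandas as pd

# Made-up data with the three dtypes from the question:
# int64, the nullable extension type Int32, and float64.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": pd.array([1, None, 3], dtype="Int32"),
    "c": np.array([np.nan, np.nan, np.nan], dtype="float64"),  # the only all-NaN column
})

# On a frame this small, dropna behaves as documented and removes only "c".
kept = df.dropna(axis="columns", how="all")
print(list(kept.columns))  # ['a', 'b']

# The failing step in isolation: casting a float Series that contains NaN to int64,
# which is what count() does with its result in the traceback above.
message = None
try:
    pd.Series([3.0, np.nan]).astype("int64")
except ValueError as exc:
    message = str(exc)
print(message)
```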

In the comments, Scott asked for data to reproduce the problem. An edited CSV is available on Dropbox:

import pandas as pd

df = pd.read_csv('E:\\Temp\\dropna.csv')
df.dropna(axis='columns', how='all')

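Note, though, that the CSV is 3.3 GB and the resulting DataFrame has more than 60 million rows. I tried removing rows, but it seems to need this many rows to trigger the error.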
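Because the file is 3.3 GB, one way to find the all-NaN columns without running dropna on the full frame is to scan the CSV in chunks. The helper all_nan_columns below is hypothetical (it is not part of pandas); it assumes every chunk shares the same header and relies only on pd.read_csv(..., chunksize=...), DataFrame.notna and Series.any. Whether it sidesteps the error on the asker's data is untested: it simply never goes through DataFrame.count, which is where the traceback shows the failing astype call.

```python
import pandas as pd


def all_nan_columns(path_or_buffer, chunksize=1_000_000):
    """Hypothetical helper: names of CSV columns that hold no non-NA value at all.

    The file is read chunk by chunk, so only `chunksize` rows are in memory at once.
    """
    has_value = None  # per-column flag: seen at least one non-NA value so far
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize):
        chunk_has_value = chunk.notna().any()  # bool Series indexed by column name
        if has_value is None:
            has_value = chunk_has_value
        else:
            has_value = has_value | chunk_has_value
    if has_value is None:  # no data rows were read
        return []
    return has_value[~has_value].index.tolist()
```

With the frame already loaded by pd.read_csv, the columns could then be removed with df.drop(columns=all_nan_columns('E:\\Temp\\dropna.csv')). The trade-off is a second pass over the file in exchange for never holding more than one chunk's worth of the NaN mask in memory.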

Tags: pandas

Solution

