ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types - trying to downcast integer columns

Problem description

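I am trying to optimize my DataFrame by downcasting the columns of dtype 'int64' to 'unsigned', except for the one column that actually contains negative values. For that column I just want to convert it separately from 'int64' to 'int8'. However, when I do this with the following code, I get the error: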

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The code I am using is:

def mem_usage(pandas_obj):
    if isinstance(pandas_obj,pd.DataFrame):
        usage_b = pandas_obj.memory_usage(deep=True).sum()
    else: # we assume if not a df it's a series
        usage_b = pandas_obj.memory_usage(deep=True)
    usage_mb = usage_b / 1024 ** 2 # convert bytes to megabytes
    return "{:03.2f} MB".format(usage_mb)
dtypes = data_pitch_2017_1.drop('launch_angle',axis=1).dtypes
data_pitch_2017_1_int = data_pitch_2017_1.select_dtypes(include=['int'])
converted_int = data_pitch_2017_1_int.apply(pd.to_numeric,downcast='unsigned', errors ='ignore')
print(mem_usage(data_pitch_2017_1_int))
print(mem_usage(converted_int))
compare_ints = pd.concat([data_pitch_2017_1_int.dtypes,converted_int.dtypes],axis=1)
compare_ints.columns = ['before','after']
compare_ints.apply(pd.Series.value_counts)
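Two details in the code above seem worth checking. First, `dtypes` is computed with `launch_angle` dropped but is never used afterwards, so `launch_angle` is still part of `data_pitch_2017_1_int` and also gets the `'unsigned'` downcast. Second, the `info()` output further down lists 40 columns as pandas' nullable `Int64` (capital I) and only one as NumPy `int64`. The sketch below uses a small hypothetical frame (made-up values) to suggest that `select_dtypes(include=['int'])` picks up both kinds, so the nullable columns, missing values included, are what get passed to `pd.to_numeric`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_pitch_2017_1 (made-up values): one NumPy
# int64 column, one nullable Int64 column with a missing value, one float column.
df = pd.DataFrame({
    "pitcher": np.array([434378, 605400, 543037], dtype="int64"),
    "launch_angle": pd.array([-12, None, 31], dtype="Int64"),
    "release_speed": [90.1, 88.4, 95.0],
})

dtypes = df.drop("launch_angle", axis=1).dtypes  # computed, but unused below

ints = df.select_dtypes(include=["int"])
print(ints.dtypes)  # both int64 and the nullable Int64 column are selected
```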

The full error I get is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-8486ea5172f1> in <module>
      8 dtypes = data_pitch_2017_1.drop('launch_angle',axis=1).dtypes
      9 data_pitch_2017_1_int = data_pitch_2017_1.select_dtypes(include=['int'])
---> 10 converted_int = data_pitch_2017_1_int.apply(pd.to_numeric,downcast='unsigned', errors ='ignore')
     11 print(mem_usage(data_pitch_2017_1_int))
     12 print(mem_usage(converted_int))

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in f(x)
    108 
    109             def f(x):
--> 110                 return func(x, *args, **kwds)
    111 
    112         else:

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    182             for dtype in typecodes:
    183                 if np.dtype(dtype).itemsize <= values.dtype.itemsize:
--> 184                     values = maybe_downcast_to_dtype(values, dtype)
    185 
    186                     # successful conversion

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in maybe_downcast_to_dtype(result, dtype)
    203             return PeriodArray(result, freq=dtype.freq)
    204 
--> 205     converted = maybe_downcast_numeric(result, dtype, do_round)
    206     if converted is not result:
    207         return converted

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in maybe_downcast_numeric(result, dtype, do_round)
    284                     return new_result
    285             else:
--> 286                 if np.allclose(new_result, result, rtol=0):
    287                     return new_result
    288 

<__array_function__ internals> in allclose(*args, **kwargs)

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in allclose(a, b, rtol, atol, equal_nan)
   2187 
   2188     """
-> 2189     res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
   2190     return bool(res)
   2191 

<__array_function__ internals> in isclose(*args, **kwargs)

/Volumes/DATASTORE_L/opt/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in isclose(a, b, rtol, atol, equal_nan)
   2286 
   2287     xfin = isfinite(x)
-> 2288     yfin = isfinite(y)
   2289     if all(xfin) and all(yfin):
   2290         return within_tol(x, y, atol, rtol)

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
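Reading the traceback from the bottom: `isfinite(x)` on the already-downcast values succeeds, and `isfinite(y)` on the original values fails. That is consistent with the original values reaching NumPy as an object-dtype array, which is what a nullable `Int64` column containing `pd.NA` can turn into, and `np.isfinite` has no object-dtype implementation. The snippet below only reproduces that last step with a hand-built object array; it illustrates the mechanism and is not the pandas internals, which differ between versions.

```python
import numpy as np
import pandas as pd

# What NumPy may see for a nullable Int64 column with a missing value:
# an object array holding Python ints and pd.NA (hand-built here).
y = np.array([10, pd.NA, 30], dtype=object)

try:
    np.isfinite(y)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# The same data as float64 (missing value as NaN) is fine for isfinite.
y_float = np.array([10, np.nan, 30], dtype="float64")
print(np.isfinite(y_float))  # [ True False  True]
```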

I did try adding errors='ignore' to the code to ignore any NaN or non-numeric values that might be causing the error.

But it still did not work.
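For what it's worth, `errors` in `pd.to_numeric` only controls what happens to values that cannot be parsed as numbers (`'raise'`, `'coerce'`, or `'ignore'`, which I believe recent pandas releases deprecate). Missing values in a column that is already numeric are not parse errors, so `errors` has no say in the downcast step where the traceback fails. A small sketch of the difference:

```python
import pandas as pd

# errors= only concerns values that cannot be parsed as numbers.
raw = pd.Series(["1", "two", "3"])
parsed = pd.to_numeric(raw, errors="coerce")  # "two" becomes NaN
print(parsed.tolist())

# A nullable Int64 column is already numeric: its <NA> is not a parse error,
# so errors= has nothing to act on and the missing value passes through.
s = pd.Series([1, None, 3], dtype="Int64")
same = pd.to_numeric(s, errors="coerce")
print(same.dtype)  # Int64
```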

The DataFrame info is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 721244 entries, 0 to 721243
Data columns (total 85 columns):
 #   Column                           Non-Null Count   Dtype         
---  ------                           --------------   -----         
 0   pitcher                          721244 non-null  int64         
 1   key_retro                        721244 non-null  object        
 2   pitch_type                       718006 non-null  object        
 3   Game_Date                        721244 non-null  datetime64[ns]
 4   release_speed                    717883 non-null  Float64       
 5   release_pos_x                    717862 non-null  Float64       
 6   release_pos_z                    717862 non-null  Float64       
 7   player_name                      721244 non-null  object        
 8   batter                           721244 non-null  Int64         
 9   events                           184565 non-null  object        
 10  description                      721244 non-null  object        
 11  zone                             717862 non-null  Int64         
 12  des                              721244 non-null  object        
 13  game_type                        721244 non-null  object        
 14  stand                            721244 non-null  object        
 15  p_throws                         721244 non-null  object        
 16  home_team                        721244 non-null  object        
 17  away_team                        721244 non-null  object        
 18  type                             721244 non-null  object        
 19  hit_location                     161307 non-null  Int64         
 20  bb_type                          127555 non-null  object        
 21  balls                            721244 non-null  Int64         
 22  strikes                          721244 non-null  Int64         
 23  game_year                        721244 non-null  Int64         
 24  pfx_x                            717862 non-null  Float64       
 25  pfx_z                            717862 non-null  Float64       
 26  plate_x                          717862 non-null  Float64       
 27  plate_z                          717862 non-null  Float64       
 28  on_3b                            67052 non-null   Int64         
 29  on_2b                            132754 non-null  Int64         
 30  on_1b                            220388 non-null  Int64         
 31  outs_when_up                     721244 non-null  Int64         
 32  inning                           721244 non-null  Int64         
 33  inning_topbot                    721244 non-null  object        
 34  hc_x                             124661 non-null  Float64       
 35  hc_y                             124661 non-null  Float64       
 36  fielder_2                        719473 non-null  Int64         
 37  vx0                              717862 non-null  Float64       
 38  vy0                              717862 non-null  Float64       
 39  vz0                              717862 non-null  Float64       
 40  ax                               717862 non-null  Float64       
 41  ay                               717862 non-null  Float64       
 42  az                               717862 non-null  Float64       
 43  sz_top                           717862 non-null  Float64       
 44  sz_bot                           717862 non-null  Float64       
 45  hit_distance_sc                  188244 non-null  Int64         
 46  launch_speed                     199437 non-null  Float64       
 47  launch_angle                     199441 non-null  Int64         
 48  effective_speed                  716330 non-null  Float64       
 49  release_spin_rate                703603 non-null  Int64         
 50  release_extension                717862 non-null  Float64       
 51  game_pk                          721244 non-null  Int64         
 52  pitcher.1                        721244 non-null  Int64         
 53  fielder_2.1                      719473 non-null  Int64         
 54  fielder_3                        719473 non-null  Int64         
 55  fielder_4                        719473 non-null  Int64         
 56  fielder_5                        719473 non-null  Int64         
 57  fielder_6                        719473 non-null  Int64         
 58  fielder_7                        719473 non-null  Int64         
 59  fielder_8                        719473 non-null  Int64         
 60  fielder_9                        719473 non-null  Int64         
 61  release_pos_y                    717862 non-null  Float64       
 62  estimated_ba_using_speedangle    125403 non-null  Float64       
 63  estimated_woba_using_speedangle  125403 non-null  Float64       
 64  woba_value                       184565 non-null  Float64       
 65  woba_denom                       184565 non-null  Int64         
 66  babip_value                      184565 non-null  Int64         
 67  iso_value                        184565 non-null  Int64         
 68  launch_speed_angle               125403 non-null  Int64         
 69  at_bat_number                    721244 non-null  Int64         
 70  pitch_number                     721244 non-null  Int64         
 71  pitch_name                       718006 non-null  object        
 72  home_score                       721244 non-null  Int64         
 73  away_score                       721244 non-null  Int64         
 74  bat_score                        721244 non-null  Int64         
 75  fld_score                        721244 non-null  Int64         
 76  post_away_score                  721244 non-null  Int64         
 77  post_home_score                  721244 non-null  Int64         
 78  post_bat_score                   721244 non-null  Int64         
 79  post_fld_score                   721244 non-null  Int64         
 80  if_fielding_alignment            717029 non-null  object        
 81  of_fielding_alignment            717029 non-null  object        
 82  spin_axis                        717862 non-null  Int64         
 83  delta_home_win_exp               721243 non-null  Float64       
 84  delta_run_exp                    721121 non-null  Float64       
dtypes: Float64(26), Int64(40), datetime64[ns](1), int64(1), object(17)
memory usage: 1.1 GB
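One version-independent workaround, sketched under the assumption that every integer column except `launch_angle` is non-negative: skip `to_numeric(downcast=...)` for these columns and choose the target dtype from each column's min and max, using pandas' nullable dtypes (`UInt8`, `Int8`, ...) so missing values survive. According to the output above, `launch_angle` has only 199441 non-null entries out of 721244, so a plain NumPy `int8` could not hold it, while the nullable `Int8` can. `smallest_nullable_int` and `downcast_ints` are hypothetical helper names, and the sample data is made up.

```python
import numpy as np
import pandas as pd

# (nullable pandas dtype name, matching NumPy type used for the range check)
UNSIGNED = [("UInt8", np.uint8), ("UInt16", np.uint16), ("UInt32", np.uint32), ("UInt64", np.uint64)]
SIGNED = [("Int8", np.int8), ("Int16", np.int16), ("Int32", np.int32), ("Int64", np.int64)]


def smallest_nullable_int(s: pd.Series, unsigned: bool) -> str:
    """Hypothetical helper: smallest nullable integer dtype that fits s."""
    lo, hi = s.min(), s.max()  # skipna=True by default, so <NA> is ignored
    if pd.isna(lo):  # column is entirely missing: keep its current dtype
        return str(s.dtype)
    lo, hi = int(lo), int(hi)  # plain Python ints avoid NumPy comparison quirks
    if unsigned and lo < 0:
        raise ValueError(f"column {s.name!r} has negative values")
    for name, np_type in (UNSIGNED if unsigned else SIGNED):
        info = np.iinfo(np_type)
        if info.min <= lo and hi <= info.max:
            return name
    return str(s.dtype)  # nothing smaller fits


def downcast_ints(df: pd.DataFrame, signed_cols=("launch_angle",)) -> pd.DataFrame:
    """Hypothetical helper: unsigned for most int columns, signed for signed_cols."""
    out = df.copy()
    for col in df.select_dtypes(include=["int"]).columns:
        target = smallest_nullable_int(df[col], unsigned=col not in signed_cols)
        out[col] = df[col].astype(target)
    return out


# Made-up sample covering the three cases from the question.
df = pd.DataFrame({
    "pitcher": np.array([434378, 605400, 1], dtype="int64"),
    "balls": pd.array([0, 3, None], dtype="Int64"),
    "launch_angle": pd.array([-20, None, 45], dtype="Int64"),
})
converted = downcast_ints(df)
print(converted.dtypes)
```

The helper picks the smallest type that fits rather than forcing `launch_angle` to `Int8`, so if its values fell outside -128..127 it would return `Int16` instead of wrapping around. Converting a NumPy `int64` column such as `pitcher` to a nullable type adds a one-byte mask per row; for columns with no missing values, a NumPy unsigned type could be used instead.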

Can someone help explain the problem and how to fix it in the code?

Thank you.

Tags: python pandas dataframe optimization integer
