How do I get a 1D array back from RegularGridInterpolator when using an xarray DataArray as input?

Problem description

I am using a chunked xarray object, loaded from a netCDF4 file, as the input to scipy's RegularGridInterpolator, as shown below.

import numpy as np
import xarray
from scipy.interpolate import RegularGridInterpolator

cdf_data = xarray.open_dataset(filename, chunks={'time': 100, 'x': 10, 'y': 10, 'z': 10})  # open lazily, using dask for chunking
test_da = cdf_data['rr']  # sample DataArray for testing, containing one variable
test_da = test_da.assign_coords(time=cdf_data._time, x=cdf_data._x, y=cdf_data._y, z=cdf_data._z)  # add arrays for coordinate values
rgi1 = RegularGridInterpolator((test_da.time, test_da.x, test_da.y, test_da.z),
                               test_da, bounds_error=False, fill_value=np.nan)  # define interpolator function

#The test_da object looks like this after the coordinate arrays are assigned:
#<xarray.DataArray 'rr' (time: 60, x: 656, y: 190, z: 190)>
#dask.array<open_dataset-52c52bdbb592b7919b52dece2a11a4farr, shape=(60, 656, 190, 190), dtype=float32, chunksize=(60, 10, 10, 10), chunktype=numpy.ndarray>
#Coordinates:
#  * time     (time) float32 12.0 12.02 12.03 12.05 ... 12.93 12.95 12.97 12.98
#  * x        (x) float32 -349.6 -348.1 -346.7 -345.2 ... 32.77 32.77 32.91 33.05
#  * y        (y) float32 -95.97 -93.03 -90.04 -87.27 ... 87.27 90.04 93.03 95.97
#  * z        (z) float32 -95.97 -93.03 -90.04 -87.27 ... 87.27 90.04 93.03 95.97
#Attributes:
#    units:    1/cm**3

# Given that time, x, y, z are 1D arrays of the same length, the query points are:
track = np.column_stack((time, x, y, z))  # one row per point, shape (59, 4)
result1 = rgi1(track)  # calling rgi1 with track raises:
# ValueError: cannot reshape array of size 12117361 into shape (59,)
# Note: 59**4 = 12117361
# It is attempting to return an N x N x N x N array instead of the usual 1D array of length N

# Creating the interpolator from NumPy arrays does not produce this error
rgi2 = RegularGridInterpolator((test_da.time.values, test_da.x.values,
                                test_da.y.values, test_da.z.values),
                               test_da.values, bounds_error=False, fill_value=np.nan)
result2 = rgi2(track)
# This returns a 1D array of length N (59 in my case) with beautiful speed.

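I am working with "medium data": it fits on my disk but not in memory, so I need to avoid converting the object to a NumPy array, which takes about 5 GB on my machine. I have to do this 15 times, so the second option described above is not practical. My current (annoying) workaround is: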

result = np.squeeze(np.array([rgi1(track0) for track0 in track]))

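This is clunky and takes several seconds longer than the simple result2 call. Is there a better way? Why does the behaviour differ between the two cases? I get the same behaviour when using interpn with xarray.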
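
On why the two cases differ, one plausible explanation (an assumption about scipy's internals, not something confirmed in this post) is that RegularGridInterpolator keeps the DataArray as passed and indexes it internally with one integer array per dimension. NumPy pairs such index arrays element by element, whereas xarray treats plain integer arrays as orthogonal (outer) indexers and returns every combination, which would give N**4 values for N points in 4D and match the 59**4 in the error. The sketch below reproduces the two indexing styles with plain NumPy on a small made-up array; np.ix_ is used only to imitate the outer behaviour.

```python
import numpy as np

# Small made-up stand-in grid (the real array has shape (60, 656, 190, 190)).
values = np.arange(4 * 5 * 6).reshape(4, 5, 6)

# One integer index array per axis, each of length N = 3, similar to what
# an interpolator might build for N query points (assumption).
i = np.array([0, 1, 3])
j = np.array([2, 2, 4])
k = np.array([5, 0, 1])

# NumPy fancy indexing pairs the arrays element by element: one value per point.
pointwise = values[i, j, k]

# Outer (orthogonal) indexing takes every combination of the indices.
# np.ix_ imitates that behaviour here; it is not what xarray calls internally.
outer = values[np.ix_(i, j, k)]

print(pointwise.shape)  # (3,)
print(outer.shape)      # (3, 3, 3)
```

The pointwise result is the "diagonal" of the outer result, which is why looping over one point at a time still returns the right values.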

Tags: scipy, interpolation, python-xarray

Solution
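
A minimal sketch of one possible workaround, assuming the track covers only a small part of the grid and that linear interpolation (the RegularGridInterpolator default) is wanted: find the index range that encloses the track along each axis, load only that block into memory, and build a NumPy-backed interpolator on it. The helper name interp_on_subgrid and the synthetic demo grid are hypothetical. With the real data, `values` would be the dask-backed DataArray (test_da), on the assumption that slicing it stays lazy and that np.asarray loads just the selected block.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def interp_on_subgrid(axes, values, track):
    """Linearly interpolate `track` using only the grid block that encloses it.

    axes   : sequence of 1D ascending coordinate arrays, one per dimension
    values : object supporting slice indexing and np.asarray (hypothetically a
             dask-backed DataArray; a plain NumPy array in the demo below)
    track  : (N, ndim) array of query points
    """
    track = np.asarray(track, dtype=float)
    slices, sub_axes = [], []
    for dim, axis in enumerate(axes):
        axis = np.asarray(axis, dtype=float)
        # Index of a grid point below the smallest query value (clipped to 0).
        lo = max(int(np.searchsorted(axis, track[:, dim].min(), side="left")) - 1, 0)
        # Exclusive stop just past the first grid point above the largest value.
        hi = min(int(np.searchsorted(axis, track[:, dim].max(), side="right")) + 1, axis.size)
        # Keep at least two grid points per axis so a cell always exists.
        if hi - lo < 2:
            lo = max(hi - 2, 0)
            hi = min(lo + 2, axis.size)
        slices.append(slice(lo, hi))
        sub_axes.append(axis[lo:hi])
    # Only this block is converted to an in-memory NumPy array.
    block = np.asarray(values[tuple(slices)])
    rgi = RegularGridInterpolator(tuple(sub_axes), block,
                                  bounds_error=False, fill_value=np.nan)
    return rgi(track)


# Synthetic demo grid (made-up sizes, much smaller than the real file).
rng = np.random.default_rng(0)
t = np.linspace(12.0, 13.0, 20)
x = np.linspace(-5.0, 5.0, 30)
y = np.linspace(-3.0, 3.0, 25)
z = np.linspace(-3.0, 3.0, 25)
data = rng.random((t.size, x.size, y.size, z.size))

n = 59
track = np.column_stack([
    np.linspace(12.1, 12.4, n),
    np.linspace(-1.0, 1.0, n),
    np.linspace(0.0, 0.5, n),
    np.linspace(-0.5, 0.0, n),
])

result = interp_on_subgrid((t, x, y, z), data, track)
print(result.shape)  # (59,)
```

With the question's data the call would presumably look like interp_on_subgrid((test_da.time.values, test_da.x.values, test_da.y.values, test_da.z.values), test_da, track). Whether the block fits in memory depends on how much of the grid the track spans, so this is only worth trying when the track is localized.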

