首页 > 解决方案 > 从包含在 shapefile 边界内的 netcdf 文件中提取数据

问题描述

我有以下shapefilenetcdf 文件

我想从包含在 shapefile 边界内的 netcdf 文件中提取数据。

您对我如何实现这一目标有什么建议吗?

shapefile 对应于 SREX 区域 11 North Europe (NEU),netcdf 文件是 CMIP6 气候模型数据输出的示例(ua 变量)。我想要的输出必须是 netcdf 格式。


更新

到目前为止,我尝试使用 NCL 和 CDO 创建一个 netcdf 掩码,并将此掩码应用于原始 netcdf 数据集。下面是步骤(和NCL 脚本):

#################
## remove plev dimension from netcdf file
cdo --reduce_dim -copy nc_file.nc nc_file2.nc

## convert longitude to -180, 180
cdo sellonlatbox,-180,180,-90,90 nc_file2.nc nc_file3.nc

## create mask 
ncl create_nc_mask.ncl

## apply mask
cdo div nc_file3.nc shape1_mask.nc nc_file4.nc 
#################

输出几乎是正确的。见下图。但是 shapefile 的南部边界(SREX 11,NEU)没有被正确捕获。所以我想生成 netcdf 掩码的 NCL 脚本有问题。

在此处输入图像描述

标签: pythonmasknetcdfcdo-climatencl

解决方案


重新使用一些旧的脚本/代码,我很快想出了一个 Python 解决方案。它基本上只是遍历所有网格点,并检查每个网格点是在形状文件中的多边形内部还是外部。结果是变量mask(带有 的数组True/False),可用于屏蔽您的 NetCDF 变量。

注意:这使用Numba(所有@jit行)来加速代码,尽管在这种情况下这并不是必需的。如果您没有 Numba,您可以将它们注释掉。

import matplotlib.pyplot as pl
import netCDF4 as nc4
import numpy as np
import fiona
from numba import jit

@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate distance from (x1,y1) to (x2,y2)
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point (x,y) is on line (x1,y1) to (x2,y2)
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside polygon.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    if wn == 0:
        return False
    else:
        return True

@jit(nopython=True, nogil=True)
def calc_mask(mask, lon, lat, shp_lon, shp_lat):
    """
    Calculate mask of grid points which are inside `shp_lon, shp_lat`
    """

    for j in range(lat.size):    
        for i in range(lon.size):
            if is_inside(lon[i], lat[j], shp_lon, shp_lat, shp_lon.size):
                mask[j,i] = True


if __name__ == '__main__':

    # Selection of time and level:
    time = 0
    plev = 0

    # Read NetCDF variables, shifting the longitudes
    # from 0-360 to -180,180, like the shape file:
    nc = nc4.Dataset('nc_file.nc')
    nc_lon = nc.variables['lon'][:]-180.
    nc_lat = nc.variables['lat'][:]
    nc_ua  = nc.variables['ua'][time,plev,:,:]

    # Read shapefile and first feature
    fc = fiona.open("shape1.shp")
    feature = next(iter(fc))

    # Extract array of lat/lon coordinates:
    coords = feature['geometry']['coordinates'][0]
    shp_lon = np.array(coords)[:,0]
    shp_lat = np.array(coords)[:,1]

    # Calculate mask
    mask = np.zeros_like(nc_ua, dtype=bool)
    calc_mask(mask, nc_lon, nc_lat, shp_lon, shp_lat)

    # Mask the data array
    nc_ua_masked = np.ma.masked_where(~mask, nc_ua)

    # Plot!
    pl.figure(figsize=(8,4))
    pl.subplot(121)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.subplot(122)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua_masked, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.tight_layout()

在此处输入图像描述

编辑

要将掩码写入 NetCDF,可以使用以下内容:

nc_out = nc4.Dataset('mask.nc', 'w')
nc_out.createDimension('lon', nc_lon.size)
nc_out.createDimension('lat', nc_lat.size)

nc_mask_out = nc_out.createVariable('mask', 'i2', ('lat','lon'))
nc_lon_out = nc_out.createVariable('lon', 'f8', ('lon'))
nc_lat_out = nc_out.createVariable('lat', 'f8', ('lat'))

nc_mask_out[:,:] = mask[:,:]  # Or ~mask to reverse it
nc_lon_out[:] = nc_lon[:]     # With +180 if needed
nc_lat_out[:] = nc_lat[:]

nc_out.close()

推荐阅读