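python - Python: NetCDF: Is there a way to get the overall mean of one variable over the cells where another variable takes each unique value?
Problem description
I have a netCDF file containing a 3D int32 variable named tag (shape: time, lat, lon) and a 3D float64 variable named p (shape: time, lat, lon). Both variables have the same shape. The integer values of tag start at 0 and increase monotonically up to an unknown maximum. The 0 values are not needed, so I want to compute an overall (space-time) mean of p for each tag value, from 1 up to the maximum tag value n.
Example (array indices are (time, lat, lon)): the first integer tag value is 1. It occurs at (0,45,45) and (1,45,46). The p values at these tag = 1 cells are 2 and 4, so the mean should be 3. The next integer tag value is 2. It occurs at, say, (2,100,99), (2,101,99), and (3,101,98), and the p values at these cells are 3, 8, and 1, so the mean should be 4. The last integer value is n. It occurs at (360,200,100), (361,200,100), (361,201,100), and (361,202,100), and the p values at these cells are 1, 1, 5, and 9, so the mean should be 4. When written to a text file, the results should look like this: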
3
4
.
.
4
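To make the expected result concrete, here is a minimal sketch on toy arrays. The shape (4, 3, 3), the cell positions, and the background value 7.0 are made up for illustration; only the tag/p pairs come from the example above, with 3 standing in for n. It computes the means by brute force, one boolean mask per tag, which states what is wanted rather than how to do it efficiently on the full arrays.

```python
import numpy as np

# Toy stand-ins for the real variables (hypothetical shape and positions).
tag = np.zeros((4, 3, 3), dtype=np.int32)
p = np.full((4, 3, 3), 7.0)  # background p values sit under tag 0 and must be ignored

# (index, tag value, p value) triples taken from the worked example.
cells = [
    ((0, 1, 1), 1, 2.0), ((1, 1, 2), 1, 4.0),
    ((1, 2, 0), 2, 3.0), ((1, 2, 1), 2, 8.0), ((2, 2, 0), 2, 1.0),
    ((2, 0, 0), 3, 1.0), ((3, 0, 0), 3, 1.0), ((3, 1, 0), 3, 5.0), ((3, 2, 0), 3, 9.0),
]
for idx, t, value in cells:
    tag[idx] = t
    p[idx] = value

# One mean per tag value 1..n, each taken over every cell carrying that tag.
expected = [float(p[tag == t].mean()) for t in range(1, int(tag.max()) + 1)]
print(expected)  # [3.0, 4.0, 4.0]
```

Looping over masks like this rescans the whole array once per tag, so it is only meant as a reference result for small inputs.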
The Python code below reads the netCDF files and variables:
from __future__ import print_function  # lets the print() calls below run on Python 2 and 3

import datetime as dt  # Python standard library datetime module
import numpy as np
from netCDF4 import Dataset  # http://code.google.com/p/netcdf4-python/


def ncdump(nc_fid, verb=True):
    '''
    ncdump outputs dimensions, variables and their attribute information.
    The information is similar to that of NCAR's ncdump utility.
    ncdump requires a valid instance of Dataset.

    Parameters
    ----------
    nc_fid : netCDF4.Dataset
        A netCDF4 dataset object
    verb : Boolean
        whether or not nc_attrs, nc_dims, and nc_vars are printed

    Returns
    -------
    nc_attrs : list
        A Python list of the NetCDF file global attributes
    nc_dims : list
        A Python list of the NetCDF file dimensions
    nc_vars : list
        A Python list of the NetCDF file variables
    '''
    def print_ncattr(key):
        """
        Prints the NetCDF file attributes for a given key

        Parameters
        ----------
        key : str
            a valid netCDF4.Dataset.variables key
        """
        try:
            print("\t\ttype:", repr(nc_fid.variables[key].dtype))
            for ncattr in nc_fid.variables[key].ncattrs():
                print('\t\t%s:' % ncattr,
                      repr(nc_fid.variables[key].getncattr(ncattr)))
        except KeyError:
            print("\t\tWARNING: %s does not contain variable attributes" % key)

    # NetCDF global attributes
    nc_attrs = nc_fid.ncattrs()
    if verb:
        print("NetCDF Global Attributes:")
        for nc_attr in nc_attrs:
            print('\t%s:' % nc_attr, repr(nc_fid.getncattr(nc_attr)))
    nc_dims = [dim for dim in nc_fid.dimensions]  # list of nc dimensions
    # Dimension shape information.
    if verb:
        print("NetCDF dimension information:")
        for dim in nc_dims:
            print("\tName:", dim)
            print("\t\tsize:", len(nc_fid.dimensions[dim]))
            print_ncattr(dim)
    # Variable information.
    nc_vars = [var for var in nc_fid.variables]  # list of nc variables
    if verb:
        print("NetCDF variable information:")
        for var in nc_vars:
            if var not in nc_dims:
                print('\tName:', var)
                print("\t\tdimensions:", nc_fid.variables[var].dimensions)
                print("\t\tsize:", nc_fid.variables[var].size)
                print_ncattr(var)
    return nc_attrs, nc_dims, nc_vars
nc_f = './tag.nc'  # Your filename
nc_fid = Dataset(nc_f, 'r')  # open the file read-only as a netCDF4.Dataset instance
nc_attrs, nc_dims, nc_vars = ncdump(nc_fid)
# Extract data from NetCDF file
lats = nc_fid.variables['lat'][:]  # extract/copy the data
lons = nc_fid.variables['lon'][:]
time = nc_fid.variables['time'][:]
tag = nc_fid.variables['tag'][:]  # shape is time, lat, lon as shown above
nc_p = '../p/p.nc'  # Your filename
nc_fid = Dataset(nc_p, 'r')  # open the second file read-only as a netCDF4.Dataset instance
nc_attrs, nc_dims, nc_vars = ncdump(nc_fid)
p = nc_fid.variables['p'][:]  # shape is time, lat, lon as shown above
This code returns:
NetCDF Global Attributes:
NetCDF dimension information:
    Name: time
        size: 365
        type: dtype('float64')
        axis: u'T'
        calendar: u'standard'
        standard_name: u'time'
        units: u'hours since 1800-01-01 00:00'
    Name: lat
        size: 287
        type: dtype('float64')
        long_name: u'latitude'
        units: u'degrees_north'
        standard_name: u'latitude'
        axis: u'Y'
    Name: lon
        size: 612
        type: dtype('float64')
        long_name: u'longitude'
        units: u'degrees_east'
        standard_name: u'longitude'
        axis: u'X'
NetCDF variable information:
    Name: tag
        dimensions: (u'time', u'lat', u'lon')
        size: 64110060
        type: dtype('int32')
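I have been experimenting with the pandas groupby function, but I have not yet found anything that fits my case.
Solution
I found a fast and efficient solution. I checked the results and they are correct.
I open the data with xarray and convert it to a data frame. After that, I can use pandas groupby for the calculation.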
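import numpy as np
import pandas as pd
import xarray as xr
import netCDF4

# Open both files with xarray and combine them into one dataset by their coordinates
dt = xr.open_mfdataset(['../tag.nc', '../p/p.nc'], combine='by_coords')
# Convert to a data frame with one row per (time, lat, lon) cell
dtdf = dt.to_dataframe()
# Group the rows by tag value and take the mean of p within each group
dm = {'p': ['mean']}
mean = dtdf.groupby('tag').agg(dm)
# Flatten the two-level column labels, e.g. ('p', 'mean') -> 'p_mean'
mean.columns = ['_'.join(col) for col in mean.columns.values]
# Skip tag 0 and keep the means for tags 1..n
p_mean = mean.loc[1:, 'p_mean']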
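As a possible alternative that skips the data frame conversion, the per-tag means can also be sketched with np.bincount on the flattened arrays. This is not the approach used above, only an illustration. The helper name mean_by_tag and the toy arrays are hypothetical, and the sketch assumes that tag holds non-negative integers and that neither array contains missing (masked) values.

```python
import numpy as np


def mean_by_tag(tag, p):
    """Return the mean of p for each tag value 1..n (tag 0 is skipped).

    Assumes tag holds non-negative integers and there are no masked values.
    """
    tag_flat = np.asarray(tag).ravel()
    p_flat = np.asarray(p, dtype=np.float64).ravel()
    # Sum of p per tag value, and number of cells per tag value.
    sums = np.bincount(tag_flat, weights=p_flat)
    counts = np.bincount(tag_flat)
    # A tag value that never occurs has count 0 and yields nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means[1:]  # drop the entry for tag 0


# Toy arrays of shape (time, lat, lon) = (2, 2, 2), made up for illustration.
tag = np.array([[[0, 1], [1, 2]],
                [[2, 2], [0, 3]]], dtype=np.int32)
p = np.array([[[7.0, 2.0], [4.0, 3.0]],
              [[8.0, 1.0], [9.0, 5.0]]])

means = mean_by_tag(tag, p)
print(means.tolist())  # [3.0, 4.0, 5.0]
```

To get the text file asked for in the question, with one mean per line, the result could then be written with np.savetxt('p_mean.txt', means, fmt='%g'), where the file name is only a placeholder.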