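I have downloaded several climate models from the EURO-CORDEX ensemble for daily precipitation flux. While some models use a standard calendar that is compatible with pandas datetime, others, particularly MOHC HadGem2 ES, use a 360-day CFTimeIndex.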


The precipitation flux data (2011-2015 excerpt) look like this. You can download it here.

<xarray.Dataset>
Dimensions:       (bnds: 2, rlat: 412, rlon: 424, time: 1800)
Coordinates:
    lat           (rlat, rlon) float64 ...
    lon           (rlat, rlon) float64 ...
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * time          (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Dimensions without coordinates: bnds
Data variables:
    pr            (time, rlat, rlon) float32 ...
    rotated_pole  |S1 ...
    time_bnds     (time, bnds) object ...
As you can see, the dataset's time dimension is cftime.Datetime360Day. Every month has 30 days, which is sometimes convenient for climate projections but not for pandas.

<xarray.DataArray 'time' (time: 1800)>
array([cftime.Datetime360Day(2011-01-01 12:00:00),
       cftime.Datetime360Day(2011-01-02 12:00:00),
       cftime.Datetime360Day(2011-01-03 12:00:00), ...,
       cftime.Datetime360Day(2015-12-28 12:00:00),
       cftime.Datetime360Day(2015-12-29 12:00:00),
       cftime.Datetime360Day(2015-12-30 12:00:00)], dtype=object)
Coordinates:
  * time     (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Attributes:
    standard_name:  time
    long_name:      time
    bounds:         time_bnds


I worked around this by converting the CFTimeIndex to strings, putting them into a pandas.DataFrame, and converting the time with pd.to_datetime and errors='coerce'.

import pandas as pd
import xarray

ds = xarray.open_dataset('data/mohc_hadgem2_es.nc')

def cft_to_string(cfttime_obj):
    """Format a cftime date as a 'YYYY-MM-DD' string."""
    month = str(cfttime_obj.month)
    day = str(cfttime_obj.day)

    # This is awful but there were no two-digit months/days by default
    month = '0'+month if len(month)==1 else month
    day = '0'+day if len(day)==1 else day

    return f'{cfttime_obj.year}-{month}-{day}'

# Apply above function to the raw cftime objects
# (.values avoids iterating over 0-d DataArrays)
ds_time_strings = list(map(cft_to_string, ds['time'].values))

# Get precipitation values only (to use in pandas dataframe)
# Suppose the data are from multiple pixels (for whole of Europe)
# - that's why the mean(axis=(1,2))

precipitation = ds['pr'].values.mean(axis=(1,2))

# To dataframe
df = pd.DataFrame(index=ds_time_strings, data={'precipitation': precipitation})

# Coerce erroneous dates
df.index = pd.to_datetime(df.index, errors='coerce') # Now, dates such as 2011-02-30 become NaT

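This gives a DataFrame in which the non-standard dates are NaT and some dates (the 31st of a month) are missing. I don't mind, because I am building projections over a 90-year span.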

2011-01-01  0.000049
2011-01-02  0.000042
2011-01-03  0.000031
2011-01-04  0.000030
2011-01-05  0.000038
... ...
2011-02-28  0.000041
NaT         0.000055
NaT         0.000046
2011-03-01  0.000031
... ...
2015-12-26  0.000028
2015-12-27  0.000034
2015-12-28  0.000028
2015-12-29  0.000025
2015-12-30  0.000024
1800 rows × 1 columns
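
As a rough cross-check of the NaT rows above, the number of 360-day calendar dates that do not exist in the standard calendar can be counted in plain Python. This is only a sketch: the helper name `invalid_gregorian_dates` is made up for illustration, and it assumes the 2011-2015 span of the excerpt.

```python
from datetime import date

def invalid_gregorian_dates(start_year, end_year):
    """Return the (year, month, day) triples of a 360-day calendar
    that do not exist in the standard (Gregorian) calendar."""
    invalid = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            for day in range(1, 31):  # every month has exactly 30 days
                try:
                    date(year, month, day)
                except ValueError:
                    invalid.append((year, month, day))
    return invalid

bad = invalid_gregorian_dates(2011, 2015)
print(len(bad))   # 9 of the 1800 time steps
print(bad[:3])    # [(2011, 2, 29), (2011, 2, 30), (2012, 2, 30)]
```

Only February produces invalid dates (the 30th every year, plus the 29th in non-leap years), so those are the rows that errors='coerce' turns into NaT.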


While this seems to work, is there a cleaner way to do it with xarray/pandas only? Possibly one that is not string-based?


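Thanks for the detailed example! If a time series of monthly means is acceptable for your analysis, I think the cleanest approach is to resample to "month start" frequency and then harmonize the date types, e.g. for a dataset indexed by a CFTimeIndex, something like: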

resampled = ds.resample(time="MS").mean()
resampled["time"] = resampled.indexes["time"].to_datetimeindex()

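This is basically your second bullet point, with a subtle change. Resampling to month-start frequency sidesteps the problem that a 360-day calendar contains month ends that do not exist in a standard calendar, e.g. February 30th.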
