How to subtract a dataframe with a month index from a dataframe with a datetime index

Problem description

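I have two dataframes, one called Clim and one called O3_mda8_3135. Clim is a dataframe containing the monthly average meteorological parameters for one year of data; here is a sample of the dataframe: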

Clim.head(12)
Out[7]: 
       avgT_2551  avgT_5330  ...  avgNOx_3135(ppb)  avgCO_3135(ppm)
Month                        ...                                   
1      14.924181  13.545691  ...         48.216128         0.778939
2      16.352172  15.415385  ...         36.110385         0.605629
3      20.530879  19.684720  ...         20.974544         0.460571
4      23.738576  22.919158  ...         14.270995         0.432855
5      26.961927  25.779007  ...         11.087005         0.334505
6      32.208322  31.225072  ...         12.801409         0.384325
7      35.280124  34.265880  ...         10.732970         0.321284
8      35.428857  34.433351  ...         11.916420         0.326389
9      32.008317  30.856782  ...         15.236616         0.343405
10     25.691444  24.139874  ...         24.829518         0.467317
11     19.310550  17.827946  ...         36.339847         0.621938
12     14.186050  12.860077  ...         49.173287         0.720708

[12 rows x 20 columns]

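I also have the dataframe O3_mda8_3135, which was created by first calculating the rolling 8-hour average of each component and then finding the maximum daily value of ozone, which is why the timestamps and index values are irregular. There is one value of each meteorological parameter for each day of the year. Here is a sample of this dataframe: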

 O3_mda8_3135
Out[9]: 
                    date  Temp_C_2551  ...  CO_3135(ppm)  O3_mda8_3135
12   2018-01-01 12:00:00         24.1  ...         0.294       10.4000
36   2018-01-02 12:00:00         26.3  ...         0.202        9.4375
60   2018-01-03 12:00:00         22.8  ...         0.184        7.1625
84   2018-01-04 12:00:00         25.6  ...         0.078        8.2500
109  2018-01-05 13:00:00         27.3  ...           NaN        9.4500
                 ...          ...  ...           ...           ...
8653 2018-12-27 13:00:00         19.6  ...         0.115       35.1125
8676 2018-12-28 12:00:00         14.9  ...         0.097       39.4500
8700 2018-12-29 12:00:00         13.9  ...         0.092       38.1250
8724 2018-12-30 12:00:00         17.4  ...         0.186       35.1375
8753 2018-12-31 17:00:00          8.3  ...         0.110       30.8875

[365 rows x 24 columns]

I am wondering how to subtract the average values in Clim from the corresponding columns and rows in O3_mda8_3135. For example, I would like to subtract the average value for temperature at site 2551 in January (avgT_2551 Month 1 in the Clim dataframe) from every day in January in the other dataframe O3_mda8_3135, column name Temp_C_2551.

avgT_2551 corresponds to Temp_C_2551 in the other dataframe

Is there a simple way to do this? Should I extract the month from the datetime and put it into another column for the O3_mda8_3135 dataframe? I am still a beginner and would appreciate any advice or tips.

I saw the post "How to subtract the mean of a month from each day in that month?" but it did not give enough information for me to understand what actions were being performed.

Tags: python, pandas, dataframe, datetime, subtraction

Solution


I figured it out on my own, thanks to Stack Overflow posts :)

I created a Month column in both dataframes. I had originally set the index in Clim to the month using Clim = Clim.set_index('Month'), so I removed that line to keep Month as a regular column. Then I created a Month column in the O3_mda8_3135 dataframe. After that, I merged the two dataframes on the 'Month' column and used the Series.sub method to subtract the columns I wanted.
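Removing the set_index line is one way to make the merge work, but it may not be the only one. The sketch below uses small made-up frames (the names clim and daily and all values are hypothetical stand-ins for Clim and O3_mda8_3135) to show two options when the monthly frame is already indexed by Month: calling reset_index() before merging, or merging the 'Month' column against the index with right_index=True.

```python
import pandas as pd

# Toy monthly means indexed by Month, like the Clim printout (values are made up).
clim = pd.DataFrame(
    {"Month": [1, 2], "avgT_2551": [15.0, 16.0]}
).set_index("Month")

# Toy daily data with a datetime column (values are made up).
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2018-01-01 12:00", "2018-01-02 12:00", "2018-02-01 13:00"]
    ),
    "Temp_C_2551": [24.0, 26.0, 20.0],
})
daily["Month"] = daily["date"].dt.month

# Option 1: turn the index back into a column, then merge on the shared name.
merged_a = daily.merge(clim.reset_index(), how="left", on="Month")

# Option 2: leave the index alone and match the 'Month' column against it.
merged_b = daily.merge(clim, how="left", left_on="Month", right_index=True)

# Daily value minus the monthly mean for the same month.
anom_a = merged_a["Temp_C_2551"].sub(merged_a["avgT_2551"])
anom_b = merged_b["Temp_C_2551"].sub(merged_b["avgT_2551"])
```

Both options should give the same anomalies here; which one reads better probably depends on whether the Month index is needed elsewhere.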

Here's some example code. Sorry the variable names are so long, but this dataframe is huge.

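import pandas as pd
# Add a month key to the daily data, then left-merge the monthly means onto it.
O3_mda8_3135['Month'] = O3_mda8_3135['date'].dt.month
O3_mda8_3135_anom = pd.merge(O3_mda8_3135, Clim, how='left', on='Month')
# Anomaly = daily value minus the monthly mean for that month.
O3_mda8_3135_anom['O3_mda8_3135_anom'] = O3_mda8_3135_anom['O3_mda8_3135'].sub(O3_mda8_3135_anom['MDA8_3135'])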
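For several columns at once, a merge is not strictly required. As a rough sketch on made-up data (clim, daily, and the pairs dictionary are hypothetical; only the avgT_2551 / Temp_C_2551 pairing is stated in the question, the CO pairing is an assumption), Series.map can look up each row's month in a Month-indexed column of means:

```python
import pandas as pd

# Toy monthly means indexed by Month (values are made up).
clim = pd.DataFrame(
    {"avgT_2551": [15.0, 16.0], "avgCO_3135(ppm)": [0.8, 0.6]},
    index=pd.Index([1, 2], name="Month"),
)

# Toy daily data (values are made up).
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2018-01-01 12:00", "2018-01-02 12:00", "2018-02-01 13:00"]
    ),
    "Temp_C_2551": [24.0, 26.0, 20.0],
    "CO_3135(ppm)": [0.3, 0.2, 0.1],
})

# Assumed pairing of daily columns with their monthly-mean columns.
pairs = {"Temp_C_2551": "avgT_2551", "CO_3135(ppm)": "avgCO_3135(ppm)"}

month = daily["date"].dt.month
anom = pd.DataFrame({"date": daily["date"]})
for daily_col, clim_col in pairs.items():
    # map() replaces each month number with that month's mean from clim.
    anom[daily_col + "_anom"] = daily[daily_col] - month.map(clim[clim_col])
```

This keeps the daily frame's original rows and index untouched, at the cost of maintaining the column pairing by hand.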


