How many days does it take to accumulate x inches, starting from each day?

Question

Here is a snapshot of my DataFrame. It continues for another 60+ years. The only thing I have done is set the DATE column as the index.

            PRCP
DATE            
1950-01-01  0.00
1950-01-02  0.00
1950-01-03  0.08
1950-01-04  0.00
1950-01-05  0.00
1950-01-06  0.00
1950-01-07  0.21
1950-01-08  0.00
1950-01-09  0.00
1950-01-10  0.55
1950-01-11  0.00
1950-01-12  0.00
1950-01-13  0.15
1950-01-14  0.00
1950-01-15  0.00
1950-01-16  0.00
1950-01-17  0.00
1950-01-18  0.20
1950-01-19  0.00

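What I want to do is accumulate the PRCP column, starting from a given date, until the running total is greater than or equal to 1.0. Once it reaches that point, I want to repeat the same calculation starting from the next date.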

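For example, the result would look like this: the dates in one column and the number of days needed to reach 1.0 in the second. The numbers below are not accurate (except for the first date), but the pattern would be along these lines: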

            Days to Reach 1.0
DATE
1950-01-01  18
1950-01-02  6
1950-01-03  2
1950-01-04  20
1950-01-05  5
1950-01-06  1
1950-01-07  14

Once I have that, I would do a simple...

groupby(df.index.dayofyear).mean()

So the final product would be...

DayOfYear   Days to Reach 1.0
01          9
02          20
03          12
04          14
...
365         14
366         12
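The day-of-year averaging step can be sketched on its own as follows. The values here are made up purely for illustration (they are not real precipitation results), and the names `days`, `by_doy`, and `by_month_day` are hypothetical. The date range deliberately includes a leap year (1952) so that a day 366 group appears.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one made-up "days to reach 1.0" value per calendar day.
idx = pd.date_range("1951-01-01", "1952-12-31", freq="D")
days = pd.Series(np.arange(len(idx)) % 30 + 1, index=idx, name="Days to Reach 1.0")

# Mean per day of year (1..366). Day 366 only exists in leap years.
by_doy = days.groupby(days.index.dayofyear).mean()
print(by_doy.head())

# Alternative grouping by calendar date (month, day) instead of day of year.
by_month_day = days.groupby([days.index.month, days.index.day]).mean()
print(by_month_day.head())
```

One thing to keep in mind: in a leap year every date from March 1 onward has a day-of-year number one higher than in a common year, so a `dayofyear` group can mix two adjacent calendar dates. If that matters, grouping by month and day, as in the second call, may be the safer choice.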

Tags: python, pandas

Solution


For anyone curious, I tried a different, more basic approach. It is probably not efficient, but here it is.

    import numpy as np
    import pandas as pd

    # Assumes df has 'DATE' (datetime64) and 'PRCP' as ordinary columns, one row per day.
    depth_list = [1.0, 4.0, 10.0, 20.0]     # threshold depths to reach
    window = 366                            # look ahead at most one year from each start date
    prcp = df['PRCP'].to_numpy()
    df_j = pd.DataFrame({'DATE': df['DATE']})   # result frame; DATE is still a column here
    for d in depth_list:
        j_list = []                         # days needed for each start date at this threshold
        for start in range(len(prcp)):
            j_sum = 0.0                     # reset the running total for each start date
            for j_count, j in enumerate(prcp[start:start + window], start=1):
                j_sum += j
                if j_sum >= d:              # threshold reached
                    j_list.append(j_count)
                    break                   # stop scanning once the threshold is reached
            else:
                j_list.append(np.nan)       # never reached in the window; keeps rows aligned
        df_j[d] = j_list                    # same length as df, so rows line up with DATE
    df_join = df_j.set_index('DATE')
    # Group by day of the year and take the mean (NaN entries are skipped).
    df_daymean = df_join.groupby(df_join.index.dayofyear).mean()
    print(df_daymean)

This gives a DataFrame that looks like this (one column per threshold depth, indexed by day of year)...

           1.0        4.0        10.0        20.0
DATE                                             
1     11.928571  39.385714  91.371429  168.985714
2     11.971429  39.400000  91.185714  168.800000
3     11.728571  39.314286  91.100000  168.300000
4     11.871429  38.771429  90.528571  168.442857
5     11.900000  39.014286  90.528571  168.371429
        ...        ...        ...         ...
362   11.485714  40.115942  90.956522  170.159420
363   11.314286  40.101449  91.217391  169.927536
364   12.257143  40.318841  91.956522  169.637681
365   12.681159  39.913043  92.202899  169.260870
366   14.941176  44.176471  97.352941  174.529412
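If the nested loops above turn out to be too slow on 60+ years of data, one possible alternative is a cumulative sum combined with a binary search. This is only a sketch, not the approach used above: the function name `days_to_reach` is hypothetical, it assumes PRCP has no missing or negative values (so the cumulative sum never decreases), and it has no one-year look-ahead limit. Because it compares against a single cumulative sum instead of re-adding values from each start date, floating-point rounding could in principle differ from the loop when a total lands exactly on the threshold.

```python
import numpy as np
import pandas as pd


def days_to_reach(prcp: pd.Series, threshold: float) -> pd.Series:
    """For each start date, count days (inclusive) until the running total >= threshold."""
    values = prcp.to_numpy(dtype=float)
    n = len(values)
    csum = np.cumsum(values)
    # Total accumulated before each start date (0 for the first row).
    before = np.concatenate(([0.0], csum[:-1]))
    # First position where the cumulative sum reaches before + threshold.
    # Valid because csum is non-decreasing (assumes no negative or NaN values).
    end = np.searchsorted(csum, before + threshold, side="left")
    days = (end - np.arange(n) + 1).astype(float)
    days[end >= n] = np.nan  # threshold never reached before the data ends
    return pd.Series(days, index=prcp.index, name=f"Days to Reach {threshold}")


# The 19 rows shown in the question, with DATE as the index.
sample = pd.Series(
    [0.00, 0.00, 0.08, 0.00, 0.00, 0.00, 0.21, 0.00, 0.00, 0.55,
     0.00, 0.00, 0.15, 0.00, 0.00, 0.00, 0.00, 0.20, 0.00],
    index=pd.date_range("1950-01-01", periods=19, freq="D", name="DATE"),
    name="PRCP",
)
result = days_to_reach(sample, 1.0)
print(result.iloc[0])  # 18.0, matching the first row of the example in the question
```

On the full data, the per-threshold results could then be grouped with `groupby(result.index.dayofyear).mean()` as before. Start dates near the end of the record come out as NaN, and `mean()` skips those.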
