首页 > 解决方案 > Groupby 熊猫和体重时间序列

问题描述

我正在尝试通过“plant_name”和“month”的 groupby 计算每个时间序列或数据值的权重,但我看不到解决方案。我有看起来像这样的数据 -

   plant_name  month  adjusted_wspd
0   ARIZONA I      1           7.62
1   ARIZONA I      2           7.37
2   ARIZONA I      3           6.72
3   ARIZONA I      4           6.67
4   ARIZONA I      5           7.07
5   ARIZONA I      6           7.53
6   ARIZONA I      7           8.09
7   ARIZONA I      8           8.83
8   ARIZONA I      9           9.41
9   ARIZONA I     10           9.44
10  ARIZONA I     11           8.92
11  ARIZONA I     12           8.37
12  CAETITE I      1           7.10
13  CAETITE I      2           7.27
14  CAETITE I      3           7.26

我曾尝试编写一个函数并对定义的函数使用“应用”,但我得到了一个不正确的解决方案,看起来像这样,因为该函数可能不正确地进行数学运算:

ARIZONA  I  1        1.0
            2        1.0
            3        1.0
            4        1.0
            5        1.0
            6        1.0
            7        1.0
            8        1.0
            9        1.0
            10       1.0
            11       1.0
            12       1.0
CAETITE I   1        1.0
            2        1.0
            3        1.0

我的代码如下所示:

def my_agg(x):
    names = {'Weigths_WSPD': (x['adjusted_wspd'] )/x['adjusted_wspd'].sum()}
    return pd.Series(names, index=['Weights_WSPD'])
dfmm.groupby(["plant_name", "month"]).apply(my_agg)

我的答案应该是这样的百分比(%):

ARIZONA  I  1        7.93
            2        7.67
            3        7.00
            4        6.95
            5        7.36
            6        7.84
            7        8.42
            8        9.19
            9        9.80
            10       9.83
            11       9.29
            12       8.72
CAETITE I   1        33.82
            2        33.61
            3        33.56

谢谢你!

标签: pandasgroup-byapplyweighted

解决方案


让我们试试groupby transformsum on just plant_namenot plant_nameand month

df['Weigths_WSPD'] = (
        df['adjusted_wspd'] /
        df.groupby('plant_name')['adjusted_wspd'].transform('sum') *
        100
).round(2)
   plant_name  month  adjusted_wspd  Weigths_WSPD
0   ARIZONA I      1           7.62          7.93
1   ARIZONA I      2           7.37          7.67
2   ARIZONA I      3           6.72          7.00
3   ARIZONA I      4           6.67          6.95
4   ARIZONA I      5           7.07          7.36
5   ARIZONA I      6           7.53          7.84
6   ARIZONA I      7           8.09          8.42
7   ARIZONA I      8           8.83          9.19
8   ARIZONA I      9           9.41          9.80
9   ARIZONA I     10           9.44          9.83
10  ARIZONA I     11           8.92          9.29
11  ARIZONA I     12           8.37          8.72
12  CAETITE I      1           7.10         32.82
13  CAETITE I      2           7.27         33.61
14  CAETITE I      3           7.26         33.56

如果Series需要:

s = df.set_index(['plant_name', 'month'])
s = (
        s['adjusted_wspd'] /
        s.groupby(level=0)['adjusted_wspd'].transform('sum') *
        100
).round(2).rename('Weigths_WSPD')
plant_name  month
ARIZONA I   1         7.93
            2         7.67
            3         7.00
            4         6.95
            5         7.36
            6         7.84
            7         8.42
            8         9.19
            9         9.80
            10        9.83
            11        9.29
            12        8.72
CAETITE I   1        32.82
            2        33.61
            3        33.56
Name: Weigths_WSPD, dtype: float64

数据帧构造函数:

df = pd.DataFrame({
    'plant_name': {0: 'ARIZONA I', 1: 'ARIZONA I', 2: 'ARIZONA I',
                   3: 'ARIZONA I', 4: 'ARIZONA I', 5: 'ARIZONA I',
                   6: 'ARIZONA I', 7: 'ARIZONA I', 8: 'ARIZONA I',
                   9: 'ARIZONA I', 10: 'ARIZONA I', 11: 'ARIZONA I',
                   12: 'CAETITE I', 13: 'CAETITE I', 14: 'CAETITE I'},
    'month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
              10: 11, 11: 12, 12: 1, 13: 2, 14: 3},
    'adjusted_wspd': {0: 7.62, 1: 7.37, 2: 6.72, 3: 6.67, 4: 7.07, 5: 7.53,
                      6: 8.09, 7: 8.83, 8: 9.41, 9: 9.44, 10: 8.92, 11: 8.37,
                      12: 7.1, 13: 7.27, 14: 7.26}
})

推荐阅读