How to write an iterative formula in pandas?

Problem description

I am processing a DataFrame to generate values iteratively.

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,1970,np.nan,np.nan],[1,1971,np.nan,np.nan],[1,1972,np.nan,0.081],[1,1973,np.nan,0.222],[1,1974,np.nan,0],
[1,1975,np.nan,0],[1,1976,np.nan,0],[1,1977,np.nan,0],[2,1970,np.nan,np.nan],[2,1971,np.nan,np.nan],[2,1972,np.nan,0.081],[2,1973,np.nan,0.222],[2,1974,np.nan,0],
[2,1975,np.nan,0],[2,1976,np.nan,0],[2,1977,np.nan,0]],columns=['id','t','y','x']) 

The iterative formula is:

y_t = (1 - 0.5) * y_{t-1} + x_t

where y_0 is the first non-missing x observation within the group, multiplied by (1 / 0.6):

y_0 = (first non-missing x) / 0.6

For group 1, the first non-missing x value is 0.081, so y_0 = 0.081 / 0.6 = 0.135.
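As a quick check of the recurrence, here is a minimal pure-Python sketch that seeds with the first value divided by 0.6 and then applies y_t = (1 - 0.5) * y_{t-1} + x_t. The function name `iterate` and its parameters `decay` and `scale` are illustrative names, not part of the original question, and the input is assumed to start at the first non-missing x.

```python
def iterate(xs, decay=0.5, scale=0.6):
    """Apply y_t = (1 - decay) * y_{t-1} + x_t, seeded with y_0 = xs[0] / scale.

    Assumes xs starts at the first non-missing observation.
    """
    ys = []
    for x in xs:
        if not ys:
            ys.append(x / scale)                  # seed: y_0 = x_0 / 0.6
        else:
            ys.append((1 - decay) * ys[-1] + x)   # recurrence step
    return ys

# x values of one group, starting from the first non-missing observation (1972)
xs = [0.081, 0.222, 0, 0, 0, 0]
ys = iterate(xs)
print(ys)
```

The first two results should be approximately 0.135 and 0.2895, and each later value halves the previous one because x is 0 from 1974 onward.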

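I have a further question: what if the original DataFrame is an unbalanced panel? Suppose that for group 1 the year 1973 is not in the DataFrame. For a missing year, all variables of that year should be treated as missing.

For example: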

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,1970,np.nan,np.nan],[1,1971,np.nan,np.nan],[1,1972,np.nan,0.081],[1,1974,np.nan,0],
[1,1975,np.nan,0],[1,1976,np.nan,0],[1,1977,np.nan,0],[2,1970,np.nan,np.nan],[2,1971,np.nan,np.nan],[2,1972,np.nan,0.081],[2,1973,np.nan,0.222],[2,1974,np.nan,0],
[2,1975,np.nan,0],[2,1976,np.nan,0],[2,1977,np.nan,0]],columns=['id','t','y','x']) 

The desired output is:

id  t     y           x
1   1970  nan         nan
1   1971  nan         nan
1   1972  0.135       0.081
1   1974  nan         0
1   1975  nan         0
1   1976  nan         0
1   1977  nan         0
2   1970  nan         nan
2   1971  nan         nan
2   1972  0.135       0.081
2   1973  0.2895      0.222
2   1974  0.14475     0
2   1975  0.072375    0
2   1976  0.0361875   0
2   1977  0.01809375  0
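The NaN values for group 1 after 1972 follow from the gap: once 1973 is treated as a year with missing x, y for 1973 is undefined, and every later y_t depends on it. A small sketch of making that gap explicit for a single group, assuming years are unique within the group (the names `g1`, `full_years`, and `x_full` are illustrative):

```python
import numpy as np
import pandas as pd

# Group 1 of the unbalanced panel: the year 1973 is absent.
g1 = pd.DataFrame({
    "t": [1970, 1971, 1972, 1974, 1975, 1976, 1977],
    "x": [np.nan, np.nan, 0.081, 0, 0, 0, 0],
})

# Reindex onto every year between the first and last observed year.
# Years absent from the panel (here 1973) show up as rows with x = NaN.
full_years = np.arange(g1["t"].min(), g1["t"].max() + 1)
x_full = g1.set_index("t")["x"].reindex(full_years)
print(x_full)
```

Running the recurrence over `x_full` instead of the raw rows would hit the NaN at 1973 and carry it forward, which is what the desired output shows.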

I adapted the apply function from blutab's answer, but it does not work:

def rolling_apply(group):
    y = []
    first_index = group.index[0]
    idx=pd.date_range(start=group.index[0], end=group.last_valid_index(), freq='Y')
    group=group.reindex(idx)    
    first_valid_index = group.x.first_valid_index().year - first_index.year

    for index, x in enumerate(group.x):
        if index < first_valid_index:
            y.append(np.nan)
        elif index == first_valid_index:
            y.append( x/0.6)
        else:
            temp = (1-0.5)*y[-1] + x
            y.append(temp)
    group.y = y
    #group=group.reset_index()
    group=group[group['id'].notnull()]
    return group


df = pd.DataFrame([[1,1970,np.nan,np.nan],[1,1971,np.nan,np.nan],[1,1972,np.nan,0.081],[1,1974,np.nan,0],
[1,1975,np.nan,0],[1,1976,np.nan,0],[1,1977,np.nan,0],[2,1970,np.nan,np.nan],[2,1971,np.nan,np.nan],[2,1972,np.nan,0.081],[2,1973,np.nan,0.222],[2,1974,np.nan,0],
[2,1975,np.nan,0],[2,1976,np.nan,0],[2,1977,np.nan,0]],columns=['id','t','y','x']) 


df['year']=df['t']
df['month']=12
df['day']=31

df['date']=pd.to_datetime(df[['year','month','day']])
df=df.set_index(df['date'])
df['y'] = df.groupby(df.id).apply(rolling_apply).set_index(df['date']).y

Thank you very much.

Tags: python, python-3.x, pandas, numpy, dataframe

Solution

To do a rolling apply, you can use pandas groupby().apply(). Inside the applied function, you can use a loop to do the calculation for each group:

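def rolling_apply(group):
    # Work on a copy so the caller's frame is not modified in place.
    group = group.copy()
    y = []
    # Position of the first non-missing x in this group (assumes a unique index).
    # If x is entirely missing, every y stays NaN.
    first_valid = group.x.first_valid_index()
    first_valid_pos = len(group) if first_valid is None else group.index.get_loc(first_valid)

    for pos, x in enumerate(group.x):
        if pos < first_valid_pos:
            y.append(np.nan)                  # before the first observation
        elif pos == first_valid_pos:
            y.append(x / 0.6)                 # seed: y_0 = x / 0.6
        else:
            y.append((1 - 0.5) * y[-1] + x)   # y_t = (1 - 0.5) * y_{t-1} + x_t
    group['y'] = y
    return group

# group_keys=False keeps the original row index so the result aligns with df.
df['y'] = df.groupby('id', group_keys=False).apply(rolling_apply)['y']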
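The loop above treats consecutive rows as consecutive years, so it does not by itself produce the desired output for the unbalanced panel. One possible sketch for that case reindexes each group onto its full year range, runs the recurrence there, and then keeps only the years that were originally present. The helper names `iterate_y` and `fill_group` and the parameters `decay` and `scale` are illustrative, not from the original answer, and the sketch assumes `t` is an integer year that is unique within each `id`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1970, np.nan, np.nan], [1, 1971, np.nan, np.nan], [1, 1972, np.nan, 0.081],
     [1, 1974, np.nan, 0], [1, 1975, np.nan, 0], [1, 1976, np.nan, 0], [1, 1977, np.nan, 0],
     [2, 1970, np.nan, np.nan], [2, 1971, np.nan, np.nan], [2, 1972, np.nan, 0.081],
     [2, 1973, np.nan, 0.222], [2, 1974, np.nan, 0], [2, 1975, np.nan, 0],
     [2, 1976, np.nan, 0], [2, 1977, np.nan, 0]],
    columns=["id", "t", "y", "x"],
)

def iterate_y(x, decay=0.5, scale=0.6):
    """Run y_t = (1 - decay) * y_{t-1} + x_t, seeded at the first non-missing x."""
    y = np.full(len(x), np.nan)
    prev = np.nan
    started = False
    for i, xi in enumerate(x):
        if not started:
            if not np.isnan(xi):
                prev = xi / scale              # seed: y_0 = x / 0.6
                started = True
                y[i] = prev
        else:
            prev = (1 - decay) * prev + xi     # a NaN x makes y NaN from here on
            y[i] = prev
    return y

def fill_group(group):
    """Compute y for one id on the full year range, then keep the observed years."""
    years = group["t"].to_numpy()
    full = np.arange(years.min(), years.max() + 1)
    x_full = group.set_index("t")["x"].reindex(full)   # missing years become NaN
    y_full = pd.Series(iterate_y(x_full.to_numpy()), index=full)
    return pd.Series(y_full.loc[years].to_numpy(), index=group.index)

# Concatenate the per-group results; assignment aligns on the original row index.
df["y"] = pd.concat([fill_group(g) for _, g in df.groupby("id")])
print(df)
```

Looping over the groups and concatenating avoids relying on how groupby().apply() shapes its result index, which has differed between pandas versions. For group 1, y should be NaN from 1974 onward because the implied 1973 row has no x.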
