Replacing a value with NaN based on the previous and following values in a time series

Problem description

I am using Python pandas with a huge DataFrame that holds many time series, similar to the following DataFrame, which consists of three time series:

import pandas as pd

df = pd.DataFrame({
'Year': [2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013],
'Week': [48, 49, 50, 51, 52, 1, 2, 3, 4, 5, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5],
'Location': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
'Amount': [None, None, None, None, None, 46, None, None, None, 55, None, None, None, None, None,29, 24, 65, 34, 34, 34, 23, 87, 56, 89, 23, 45, 63, 87, 89]})
    Year  Week  Location  Amount
0   2012    48         1     NaN
1   2012    49         1     NaN
2   2012    50         1     NaN
3   2012    51         1     NaN
4   2012    52         1     NaN
5   2013     1         1    46.0
6   2013     2         1     NaN
7   2013     3         1     NaN
8   2013     4         1     NaN
9   2013     5         1    55.0
10  2012    48         2     NaN
11  2012    49         2     NaN
12  2012    50         2     NaN
13  2012    51         2     NaN
14  2012    52         2     NaN
15  2013     1         2    29.0
16  2013     2         2    24.0
17  2013     3         2    65.0
18  2013     4         2    34.0
19  2013     5         2    34.0
20  2012    48         3    34.0
21  2012    49         3    23.0
22  2012    50         3    87.0
23  2012    51         3    56.0
24  2012    52         3    89.0
25  2013     1         3    23.0
26  2013     2         3    45.0
27  2013     3         3    63.0
28  2013     4         3    87.0
29  2013     5         3    89.0

For each time series, I want to change the Amount for week 1 of 2013 to NaN if the three preceding weeks and the three following weeks are all NaN.

The result should look like this (Amount is now NaN for Location 1 in week 1 of 2013):

    Year  Week  Location  Amount
0   2012    48         1     NaN
1   2012    49         1     NaN
2   2012    50         1     NaN
3   2012    51         1     NaN
4   2012    52         1     NaN
5   2013     1         1     NaN
6   2013     2         1     NaN
7   2013     3         1     NaN
8   2013     4         1     NaN
9   2013     5         1    55.0
10  2012    48         2     NaN
11  2012    49         2     NaN
12  2012    50         2     NaN
13  2012    51         2     NaN
14  2012    52         2     NaN
15  2013     1         2    29.0
16  2013     2         2    24.0
17  2013     3         2    65.0
18  2013     4         2    34.0
19  2013     5         2    34.0
20  2012    48         3    34.0
21  2012    49         3    23.0
22  2012    50         3    87.0
23  2012    51         3    56.0
24  2012    52         3    89.0
25  2013     1         3    23.0
26  2013     2         3    45.0
27  2013     3         3    63.0
28  2013     4         3    87.0
29  2013     5         3    89.0

What I tried, which does not work:

df.loc[((df['Year'] == 2012) & (df['Week'] == 50) & (df['Amount'] == None)) &
       ((df['Year'] == 2012) & (df['Week'] == 51) & (df['Amount'] == None)) &
       ((df['Year'] == 2012) & (df['Week'] == 52) & (df['Amount'] == None)) &
       ((df['Year'] == 2013) & (df['Week'] == 1) & (df['Amount'] >= 0)) &
       ((df['Year'] == 2013) & (df['Week'] == 2) & (df['Amount'] == None)) &
       ((df['Year'] == 2013) & (df['Week'] == 3) & (df['Amount'] == None)) &
       ((df['Year'] == 2013) & (df['Week'] == 4) & (df['Amount'] == None)), 'Amount'] = None

Any ideas on how to solve this?

Tags: python, pandas, replace, time-series

Solution


Build a boolean mask with Series.notna, Series.groupby and rolling.sum, then apply it with Series.mask:

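# True where the centered 7-row window (3 before, the row, 3 after) within a
# Location holds at most one non-NaN Amount
m = (df['Amount'].notna()
                 .groupby(df['Location'])
                 .rolling(7, center=True).sum().le(1)
                 .reset_index(level='Location', drop=True))
# Blank out Amount only for week 1 of 2013 where the mask holds
df['Amount'] = df['Amount'].mask(m & df['Year'].eq(2013) & df['Week'].eq(1))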
print(df)

    Year  Week  Location  Amount
0   2012    48         1     NaN
1   2012    49         1     NaN
2   2012    50         1     NaN
3   2012    51         1     NaN
4   2012    52         1     NaN
5   2013     1         1     NaN
6   2013     2         1     NaN
7   2013     3         1     NaN
8   2013     4         1     NaN
9   2013     5         1    55.0
10  2012    48         2     NaN
11  2012    49         2     NaN
12  2012    50         2     NaN
13  2012    51         2     NaN
14  2012    52         2     NaN
15  2013     1         2    29.0
16  2013     2         2    24.0
17  2013     3         2    65.0
18  2013     4         2    34.0
19  2013     5         2    34.0
20  2012    48         3    34.0
21  2012    49         3    23.0
22  2012    50         3    87.0
23  2012    51         3    56.0
24  2012    52         3    89.0
25  2013     1         3    23.0
26  2013     2         3    45.0
27  2013     3         3    63.0
28  2013     4         3    87.0
29  2013     5         3    89.0
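
To make the mask easier to follow, here is a minimal sketch that rebuilds the example data and exposes the intermediate rolling count. The names present, window_count, isolated and target are illustrative and not part of the original answer. Note that the window counts rows rather than calendar weeks, so the sketch assumes one row per week, sorted by time within each Location.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "Year": ([2012] * 5 + [2013] * 5) * 3,
    "Week": [48, 49, 50, 51, 52, 1, 2, 3, 4, 5] * 3,
    "Location": [1] * 10 + [2] * 10 + [3] * 10,
    "Amount": (
        [nan] * 5 + [46, nan, nan, nan, 55]            # Location 1
        + [nan] * 5 + [29, 24, 65, 34, 34]             # Location 2
        + [34, 23, 87, 56, 89, 23, 45, 63, 87, 89]     # Location 3
    ),
})

# 1 where Amount is present, 0 where it is NaN
present = df["Amount"].notna().astype(int)

# Per Location, count the present values in a centered 7-row window
# (3 rows before, the row itself, 3 rows after). Rows fewer than 3 rows
# from the edge of a group get NaN because their window is incomplete.
window_count = (
    present.groupby(df["Location"])
           .rolling(7, center=True)
           .sum()
           .reset_index(level=0, drop=True)
)

# A count of at most 1 means the row has no non-NaN neighbour in the window
isolated = window_count.le(1)
target = df["Year"].eq(2013) & df["Week"].eq(1)
result = df["Amount"].mask(isolated & target)

# Inspect week 1 of 2013 for each Location
print(df.assign(WindowCount=window_count, Result=result).loc[[5, 15, 25]])
```

For week 1 of 2013 the window holds 1, 4 and 7 present values for Locations 1, 2 and 3, so only the Location 1 value is replaced.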

To get a new DataFrame instead of modifying df in place:

df.assign(Amount = df['Amount'].mask(m & df['Year'].eq(2013) & df['Week'].eq(1)))
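
As for why the attempt in the question selects nothing, two things seem to be at play: comparing a column to None with == is never True for NaN entries, and the conditions for different weeks are combined with &, although no single row can belong to week 50 and week 51 at once. The small Series below are made up purely for illustration.

```python
import pandas as pd

amount = pd.Series([float("nan"), 46.0])

# Elementwise comparison with None does not detect missing values
eq_none = (amount == None).tolist()   # noqa: E711 (deliberate, to show the pitfall)
# isna() is the supported way to test for NaN
is_missing = amount.isna().tolist()
print(eq_none, is_missing)

week = pd.Series([50, 51, 52, 1])
# Each row holds exactly one week, so AND-ing conditions on two
# different weeks can never be True for any row
both_weeks = (week.eq(50) & week.eq(51)).any()
print(both_weeks)
```

This is why the solution compares each row with its neighbours through a rolling window instead of combining per-week conditions on the same row.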
