python - Replace a value with NaN based on the previous and following values in a time series
Question
I am using Python pandas with a huge DataFrame that holds many time series, similar to the following DataFrame made up of three time series:
import pandas as pd

df = pd.DataFrame({
'Year': [2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013],
'Week': [48, 49, 50, 51, 52, 1, 2, 3, 4, 5, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5],
'Location': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
'Amount': [None, None, None, None, None, 46, None, None, None, 55, None, None, None, None, None, 29, 24, 65, 34, 34, 34, 23, 87, 56, 89, 23, 45, 63, 87, 89]})
Year Week Location Amount
0 2012 48 1 NaN
1 2012 49 1 NaN
2 2012 50 1 NaN
3 2012 51 1 NaN
4 2012 52 1 NaN
5 2013 1 1 46.0
6 2013 2 1 NaN
7 2013 3 1 NaN
8 2013 4 1 NaN
9 2013 5 1 55.0
10 2012 48 2 NaN
11 2012 49 2 NaN
12 2012 50 2 NaN
13 2012 51 2 NaN
14 2012 52 2 NaN
15 2013 1 2 29.0
16 2013 2 2 24.0
17 2013 3 2 65.0
18 2013 4 2 34.0
19 2013 5 2 34.0
20 2012 48 3 34.0
21 2012 49 3 23.0
22 2012 50 3 87.0
23 2012 51 3 56.0
24 2012 52 3 89.0
25 2013 1 3 23.0
26 2013 2 3 45.0
27 2013 3 3 63.0
28 2013 4 3 87.0
29 2013 5 3 89.0
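For each time series, I want to change the Amount for week 1 of 2013 to NaN if the three weeks before it and the three weeks after it are all NaN.
The result should look like this (Amount is now NaN for week 1 of 2013 at Location 1):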
Year Week Location Amount
0 2012 48 1 NaN
1 2012 49 1 NaN
2 2012 50 1 NaN
3 2012 51 1 NaN
4 2012 52 1 NaN
5 2013 1 1 NaN
6 2013 2 1 NaN
7 2013 3 1 NaN
8 2013 4 1 NaN
9 2013 5 1 55.0
10 2012 48 2 NaN
11 2012 49 2 NaN
12 2012 50 2 NaN
13 2012 51 2 NaN
14 2012 52 2 NaN
15 2013 1 2 29.0
16 2013 2 2 24.0
17 2013 3 2 65.0
18 2013 4 2 34.0
19 2013 5 2 34.0
20 2012 48 3 34.0
21 2012 49 3 23.0
22 2012 50 3 87.0
23 2012 51 3 56.0
24 2012 52 3 89.0
25 2013 1 3 23.0
26 2013 2 3 45.0
27 2013 3 3 63.0
28 2013 4 3 87.0
29 2013 5 3 89.0
What I tried, which does not work:
df.loc[((df['Year'] == 2012) & (df['Week'] == 50) & (df['Amount'] == None)) &
((df['Year'] == 2012) & (df['Week'] == 51) & (df['Amount'] == None)) &
((df['Year'] == 2012) & (df['Week'] == 52) & (df['Amount'] == None)) &
((df['Year'] == 2013) & (df['Week'] == 1) & (df['Amount'] >= 0)) &
((df['Year'] == 2013) & (df['Week'] == 2) & (df['Amount'] == None)) &
((df['Year'] == 2013) & (df['Week'] == 3) & (df['Amount'] == None)) &
((df['Year'] == 2013) & (df['Week'] == 4) & (df['Amount'] == None)), 'Amount'] = None
Any ideas on how to solve this?
Solution
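Before the fix, it may help to see why the attempt in the question selects no rows. The sketch below uses two small made-up Series (not the asker's data) to illustrate two likely causes: an element-wise comparison with None is False everywhere, and conditions on different weeks are combined with & although no single row can satisfy all of them.

```python
import pandas as pd

s = pd.Series([None, 46, None], dtype="float64")

# Element-wise comparison with None is False everywhere, even for missing values.
eq_none = (s == None)  # noqa: E711
# isna() is the supported way to test for missing values.
missing = s.isna()

week = pd.Series([52, 1, 2])
# A single row cannot have two different week numbers, so AND-ing
# conditions on different weeks is always False.
both_weeks = week.eq(52) & week.eq(1)

print(eq_none.tolist())     # [False, False, False]
print(missing.tolist())     # [True, False, True]
print(both_weeks.tolist())  # [False, False, False]
```

Because each boolean condition is evaluated row by row, a rule that looks at neighbouring rows needs a windowed operation rather than a chain of row filters.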
Build a mask with Series.notna, Series.groupby and rolling.sum, then apply it with Series.mask:
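# Count the non-missing values in a centered 7-week window within each Location;
# a count of at most 1 means the surrounding six weeks are all NaN.
m = (df['Amount'].notna()
     .groupby(df['Location'])
     .rolling(7, center=True).sum().le(1)
     .reset_index(level='Location', drop=True))
# Set Amount to NaN only for week 1 of 2013 where the mask holds.
df['Amount'] = df['Amount'].mask(m & df['Year'].eq(2013) & df['Week'].eq(1))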
print(df)
Year Week Location Amount
0 2012 48 1 NaN
1 2012 49 1 NaN
2 2012 50 1 NaN
3 2012 51 1 NaN
4 2012 52 1 NaN
5 2013 1 1 NaN
6 2013 2 1 NaN
7 2013 3 1 NaN
8 2013 4 1 NaN
9 2013 5 1 55.0
10 2012 48 2 NaN
11 2012 49 2 NaN
12 2012 50 2 NaN
13 2012 51 2 NaN
14 2012 52 2 NaN
15 2013 1 2 29.0
16 2013 2 2 24.0
17 2013 3 2 65.0
18 2013 4 2 34.0
19 2013 5 2 34.0
20 2012 48 3 34.0
21 2012 49 3 23.0
22 2012 50 3 87.0
23 2012 51 3 56.0
24 2012 52 3 89.0
25 2013 1 3 23.0
26 2013 2 3 45.0
27 2013 3 3 63.0
28 2013 4 3 87.0
29 2013 5 3 89.0
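To see how the mask comes about, here is a minimal sketch that runs the same steps on the Location 1 values alone, without the groupby. The variable names present, count and isolated are only illustrative.

```python
import pandas as pd

# Amount values for Location 1, weeks 48..52 of 2012 and 1..5 of 2013.
amount = pd.Series([None, None, None, None, None, 46, None, None, None, 55],
                   dtype="float64")

present = amount.notna()
# Number of non-missing values in the window covering 3 rows before,
# the row itself, and 3 rows after. Rows without a full window give NaN.
count = present.rolling(7, center=True).sum()
# A count of at most 1 means no other value in the window is present.
isolated = count.le(1)

print(pd.DataFrame({"amount": amount, "count": count, "isolated": isolated}))
```

The count is 1 at row 5 (the value 46) and 2 at row 6, whose window also reaches the value 55. Rows 3 and 4 are flagged too, although they are already NaN, which is harmless when masking. Rows within three positions of either end get a NaN count and are never flagged, because an integer window defaults to requiring a full window.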
To get a new DataFrame instead of modifying df in place:
df.assign(Amount = df['Amount'].mask(m & df['Year'].eq(2013) & df['Week'].eq(1)))
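If the same rule should apply to every week rather than only week 1 of 2013, the year and week conditions can be dropped. The following is a sketch under assumptions, not part of the original answer: mask_isolated and its window parameter are hypothetical names, and it assumes that sorting by Year and then Week gives chronological order within each Location.

```python
import pandas as pd


def mask_isolated(df, window=7):
    """Return a copy of df with isolated Amount values set to NaN.

    Hypothetical helper: a value counts as isolated when it is the only
    non-missing Amount in the centered window of its Location.
    """
    # Assumption: chronological order is given by Year, then Week.
    ordered = df.sort_values(["Location", "Year", "Week"])
    count = (ordered["Amount"].notna()
             .groupby(ordered["Location"])
             .rolling(window, center=True).sum()
             .reset_index(level="Location", drop=True))
    # mask aligns on the original index, so the row order of df is preserved.
    return df.assign(Amount=df["Amount"].mask(count.le(1)))


demo = pd.DataFrame({
    "Year": [2012] * 3 + [2013] * 4,
    "Week": [50, 51, 52, 1, 2, 3, 4],
    "Location": [1] * 7,
    "Amount": [None, None, None, 46, None, None, None],
})
print(mask_isolated(demo)["Amount"].isna().all())
```

As with the original mask, values closer than three rows to the start or end of a series are left untouched, since their window is incomplete.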