python - Rolling status for a Pandas DataFrame
Problem description
+------+--------+------------+------------+---+---+---+
| area | locale | date | end date | i | t | o |
+------+--------+------------+------------+---+---+---+
| abc | abc25 | 2001-03-01 | 2001-04-01 | 1 | | |
| abc | abc25 | 2001-04-01 | 2001-05-01 | 1 | | |
| abc | abc25 | 2001-05-01 | 2001-06-01 | 1 | | |
| abc | abc25 | 2001-06-01 | 2001-07-01 | | 1 | |
| abc | abc25 | 2001-07-01 | 2001-08-01 | | | 1 |
| abc | abc25 | 2001-08-01 | 2001-09-01 | | 1 | |
| abc | abc25 | 2001-09-01 | 2001-05-01 | | 1 | |
| abc | abc25 | 2001-10-01 | 2001-11-01 | | 1 | |
| abc | abc25 | 2001-11-01 | 2001-12-01 | | | 1 |
| abc | abc25 | 2001-12-01 | | | | 1 |
| def | def25 | 2001-03-01 | 2001-04-01 | | | 1 |
| def | def25 | 2001-04-01 | 2001-05-01 | | | 1 |
| def | def25 | 2001-05-01 | 2001-06-01 | | | 1 |
| def | def25 | 2001-06-01 | 2001-07-01 | | 1 | |
| def | def25 | 2001-07-01 | 2001-08-01 | | 1 | |
| def | def25 | 2001-08-01 | 2001-09-01 | 1 | | |
| def | def25 | 2001-09-01 | 2001-05-01 | 1 | | |
| def | def25 | 2001-10-01 | 2001-11-01 | | 1 | |
| def | def25 | 2001-11-01 | 2001-12-01 | | | 1 |
| def | def25 | 2001-12-01 | | | | 1 |
+------+--------+------------+------------+---+---+---+
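This is a sample of the data table I'm working with. What I'm trying to do is add a status column. The status column is a bit tricky, but the criteria are as follows:
- If any 2 time periods have the same i/t/o, they get the associated status (say R/Y/G)
- If you have two different statuses, pick the "best" one
Sample output: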
+------+--------+------------+------------+---+---+---+--------+
| area | locale | date | end date | i | t | o | Status |
+------+--------+------------+------------+---+---+---+--------+
| abc | abc25 | 2001-03-01 | 2001-04-01 | 1 | | | NONE |
| abc | abc25 | 2001-04-01 | 2001-05-01 | 1 | | | R |
| abc | abc25 | 2001-05-01 | 2001-06-01 | 1 | | | R |
| abc | abc25 | 2001-06-01 | 2001-07-01 | | 1 | | Y |
| abc | abc25 | 2001-07-01 | 2001-08-01 | | | 1 | G |
| abc | abc25 | 2001-08-01 | 2001-09-01 | | 1 | | G |
| abc | abc25 | 2001-09-01 | 2001-05-01 | | 1 | | Y |
| abc | abc25 | 2001-10-01 | 2001-11-01 | | 1 | | Y |
| abc | abc25 | 2001-11-01 | 2001-12-01 | | | 1 | G |
| abc | abc25 | 2001-12-01 | | | | 1 | G |
| def | def25 | 2001-03-01 | 2001-04-01 | | | 1 | NONE |
| def | def25 | 2001-04-01 | 2001-05-01 | | | 1 | G |
| def | def25 | 2001-05-01 | 2001-06-01 | | | 1 | G |
| def | def25 | 2001-06-01 | 2001-07-01 | | 1 | | G |
| def | def25 | 2001-07-01 | 2001-08-01 | | 1 | | Y |
| def | def25 | 2001-08-01 | 2001-09-01 | 1 | | | Y |
| def | def25 | 2001-09-01 | 2001-05-01 | 1 | | | R |
| def | def25 | 2001-10-01 | 2001-11-01 | | 1 | | Y |
| def | def25 | 2001-11-01 | 2001-12-01 | | | 1 | G |
| def | def25 | 2001-12-01 | | | | 1 | G |
+------+--------+------------+------------+---+---+---+--------+
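I have looked at pandas rolling, though it may not be the best approach. I tried the following:
df.groupby('locale')['o'].rolling(2).sum()
This works on its own, but I can't seem to create a column from it so that I can say "if the sum == 2, then assign that status". I also tried using it in an if statement:
if df.groupby('locale')['o'].rolling(2).sum() == 2.0:
    df['locale_status'] = 'Green'
This raises an error about the truth value of a Series.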
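The truth-value error arises because `if` needs a single boolean, while the comparison yields one boolean per row. A minimal sketch of turning the rolling sum into a column and applying the condition element-wise, assuming blanks have already been replaced by 0 (the toy values and the `o_roll` column name are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with a single flag column; blanks are assumed to be 0 already.
df = pd.DataFrame({
    "locale": ["abc25"] * 4 + ["def25"] * 3,
    "o": [1, 1, 0, 1, 0, 1, 1],
})

# Rolling sum per locale; the result is indexed by (locale, original index).
rolled = df.groupby("locale")["o"].rolling(2).sum()

# Drop the group level so the values line up with df's own index again.
df["o_roll"] = rolled.reset_index(level=0, drop=True)

# Row-wise condition instead of a Python `if` on the whole Series.
df["locale_status"] = np.where(df["o_roll"] == 2, "Green", "")

print(df)
```

The first row of each locale has no complete window, so its rolling sum is NaN and it receives an empty status here.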
I also tried:
if df.groupby('locale')['o'] == df.groupby('locale')['o'].shift(): df['test'] = 'Green'
This results in an invalid type comparison.
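The comparison fails because the left-hand side is a GroupBy object rather than a Series. A sketch of the same idea that compares the plain column with its per-group shifted values, again on made-up toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "locale": ["abc25"] * 3 + ["def25"] * 3,
    "o": [1, 1, 0, 1, 1, 1],
})

# shift() on the grouped column returns a Series aligned with df,
# with NaN on the first row of every locale.
prev_o = df.groupby("locale")["o"].shift()

# Flag rows where both this period and the previous one have o == 1.
two_in_a_row = (df["o"] == 1) & (prev_o == 1)
df["test"] = np.where(two_in_a_row, "Green", "")

print(df)
```

Because the shift is done per group, the first row of `def25` is never compared with the last row of `abc25`.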
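Solution
I don't think this problem lends itself to vectorization or pandas efficiency, but I'd love to be proven wrong by one of the ninjas here. My solution involves some prep work with pd.read_clipboard() that you may not need.
Basically, I replaced the blanks with 0, used idxmax to get the "current" letter, and worked out whether there is a streak. Then I iterate over the rows within each groupby group to pick the winner.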
import numpy as np
import pandas as pd

# Data cleaning: read the table from the clipboard (probably irrelevant to OP)
df = pd.read_clipboard(sep='|', engine='python', header=1)
df = df.reset_index().iloc[1:-1, 1:-1]
df = df.rename(columns={' i ': 'i', ' t ': 't', ' o ': 'o'})
df = df.drop(columns='Unnamed: 0')
df = df.replace(' ', 0)  # blanks become 0

# "current" is the flagged column (i, t or o) of each row
df['current'] = df[['i', 't', 'o']].astype(int).idxmax(axis=1)
# True when a row has the same letter as the row before it
df['streak'] = df['current'] == df['current'].shift(1)

weights = {'i': 0, 't': 1, 'o': 2}  # a higher weight is a "better" status
results = []
for val in df[' area '].unique():
    temp = df.loc[df.groupby(' area ').groups[val]].reset_index(drop=True)
    winner = []
    for idx, row in temp.iterrows():
        if idx == 0:
            # the first row of a group has no previous period
            winner.append(np.nan)
        else:
            current = row['current']
            if row['streak']:
                winner.append(current)
            else:
                # two different letters: keep the one with the higher weight
                last = temp.loc[idx - 1, 'current']
                if weights[last] > weights[current]:
                    winner.append(last)
                else:
                    winner.append(current)
    temp['winner'] = winner
    results.append(temp)
res = pd.concat(results)
res['winner'] = res['winner'].map({'i': 'R', 't': 'Y', 'o': 'G'})
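For comparison, the rule the loop implements (take the better of the previous and current letter within each area, and leave the first row of each area without a status) can also be written without iterating. This is only a sketch on a small hand-made frame, assuming the columns are already cleaned to `area`, `i`, `t`, `o` with 0 for blanks; it is not the code the answer above used:

```python
import numpy as np
import pandas as pd

# Hand-made sample with cleaned column names (an assumption for this sketch).
df = pd.DataFrame({
    "area": ["abc"] * 5 + ["def"] * 4,
    "i": [1, 1, 0, 0, 0, 0, 0, 1, 1],
    "t": [0, 0, 1, 0, 1, 0, 1, 0, 0],
    "o": [0, 0, 0, 1, 0, 1, 0, 0, 0],
})

weights = {"i": 0, "t": 1, "o": 2}   # higher weight is "better"
status = {0: "R", 1: "Y", 2: "G"}

# Weight of the flagged column in each row.
current = df[["i", "t", "o"]].idxmax(axis=1).map(weights)

# Weight of the previous row within the same area; -1 marks "no previous row".
previous = current.groupby(df["area"]).shift(fill_value=-1)

# Better of the two periods, then NONE for the first row of each area.
best = np.maximum(current, previous)
df["Status"] = best.map(status).where(previous >= 0, "NONE")

print(df)
```

Using `fill_value=-1` keeps the shifted weights as integers, so the lookup in `status` never has to deal with NaN.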