How can I write this loop efficiently?

Question

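Is there an efficient way to write the following loop? dataPLprocessed is time-series data, and I want to compute a score from each value's percentile rank within a rolling 7-day window (see the loop below for details).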

for i in range(len(dataPLprocessed)):
    if (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .05) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .95):
        dataPLprocessed['score'] = 10
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .1) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .9):
        dataPLprocessed['score'] = 9
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .15) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .85):
        dataPLprocessed['score'] = 8
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .2) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .8):
        dataPLprocessed['score'] = 7
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .25) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .75):
        dataPLprocessed['score'] = 6
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .3) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .7):
        dataPLprocessed['score'] = 5
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .35) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .65):
        dataPLprocessed['score'] = 4
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .4) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .6):
        dataPLprocessed['score'] = 3
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] < .45) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)[i] > .55):
        dataPLprocessed['score'] = 2
    else:
        dataPLprocessed['score'] = 1

Tags: python, pandas, time-series

Answer


This may help by removing the repeated data-access code that fetches the rank value:

# Compute the rolling percentile ranks once, not once per row.
ranks = dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)
scores = []
for rank in ranks:
    if   rank < 0.05 or rank > 0.95: score = 10
    elif rank < 0.1  or rank > 0.9:  score = 9
    elif rank < 0.15 or rank > 0.85: score = 8
    elif rank < 0.2  or rank > 0.8:  score = 7
    elif rank < 0.25 or rank > 0.75: score = 6
    elif rank < 0.3  or rank > 0.7:  score = 5
    elif rank < 0.35 or rank > 0.65: score = 4
    elif rank < 0.4  or rank > 0.6:  score = 3
    elif rank < 0.45 or rank > 0.55: score = 2
    else:                            score = 1
    scores.append(score)
# Assign one score per row instead of overwriting the whole column each iteration.
dataPLprocessed['score'] = scores
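
To see what the rank values look like before they are scored, here is a small sketch on synthetic data. The frame demo, its dates, and its values are made up for illustration. It assumes pandas 1.4 or later (where Rolling.rank is available) and a sorted DatetimeIndex, which time-based windows such as '7D' require.

```python
import pandas as pd

# Hypothetical daily data standing in for dataPLprocessed.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
demo = pd.DataFrame(
    {"lineardifference": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.5, 3.5]},
    index=idx,
)

# Percentile rank of each value within its trailing 7-day window.
ranks = demo.rolling("7D")["lineardifference"].rank(pct=True)
print(ranks)
```

The first row's window contains only itself, so its rank is 1.0 and it falls into the top bucket. That may or may not be what you want for the earliest rows.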

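If that is still not enough of an improvement, you may be able to shave off a few milliseconds by using binary search to compute the score: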

from bisect import bisect_left, bisect_right
loRanks  = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
hiRanks  = [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
def getScore(rank):
    if rank<0.45: return 10-bisect_right(loRanks,rank)
    else:         return 1+bisect_left(hiRanks,rank)


# Compute the ranks once, then look up one score per row.
ranks = dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)
dataPLprocessed['score'] = [getScore(rank) for rank in ranks]
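
The same boundary lookup can also be applied to the whole column at once. The following is a minimal sketch using NumPy's searchsorted, not part of the original answer. The helper name scores_from_ranks is hypothetical, and it treats NaN ranks as score 1, on the assumption that you want to match the if/elif chain, where every comparison with NaN is false.

```python
import numpy as np

# Same boundaries as loRanks / hiRanks above.
LO_RANKS = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45])
HI_RANKS = np.array([0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95])

def scores_from_ranks(ranks):
    """Vectorized counterpart of getScore (hypothetical helper name)."""
    ranks = np.asarray(ranks, dtype=float)
    # side="right" mirrors bisect_right, side="left" mirrors bisect_left.
    low = 10 - np.searchsorted(LO_RANKS, ranks, side="right")
    high = 1 + np.searchsorted(HI_RANKS, ranks, side="left")
    scores = np.where(ranks < 0.45, low, high)
    # searchsorted sorts NaN past the last boundary, so map NaN to 1 explicitly.
    return np.where(np.isnan(ranks), 1, scores)

sample = scores_from_ranks([0.03, 0.05, 0.44, 0.45, 0.5, 0.55, 0.56, 0.95, 0.96, 1.0])
print(sample)
```

Assuming ranks holds dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True), the column could then be filled with dataPLprocessed['score'] = scores_from_ranks(ranks), with no Python-level loop.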
