How can I offset pandas data columns by different amounts?

Problem description

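I am using pandas with matplotlib. My plotting strategy is to zero each variable at its initial value and then offset each selected variable by a set amount. For example, this is my current plotting approach: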

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# data is in a dataframe called inputData
TimeIndex = 'timestamp'  # name of the time column (assumed)
timeseries_plots = ['var1','var3','var8']
offsetFactor = 20

for ii, var in enumerate(timeseries_plots):
    # first non-null value of the variable, used to zero the series
    offsetRef = inputData[var].loc[~inputData[var].isnull()].iloc[0]
    ax.plot(inputData[TimeIndex], offsetFactor*(len(timeseries_plots)-ii-1) + inputData[var] - offsetRef, label=var, markersize=1, marker='None', linestyle='solid')
plt.show()

This produces something like the following (with a few matplotlib tweaks): [Figure: constant offset values]

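As you can see, it subtracts offsetRef (in this case the variable's initial value) and then adds a constant multiple of offsetFactor (here equal to 20) to each variable. The result is a set of lines that start out vertically spaced 20 apart.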
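To make the arithmetic concrete, here is a minimal sketch of the constant-offset step on its own, without any plotting. The toy data and the helper name constant_offset are made up for illustration and are not part of my real script.

```python
import pandas as pd

# Hypothetical toy data standing in for two real variables.
df = pd.DataFrame({"var1": [5.0, 6.0, 8.0], "var3": [100.0, 101.0, 99.0]})
cols = ["var1", "var3"]
offset_factor = 20


def constant_offset(df, cols, offset_factor):
    """Zero each column at its first non-null value, then stack the
    columns offset_factor apart (first column on top)."""
    out = {}
    for i, col in enumerate(cols):
        first_valid = df[col].dropna().iloc[0]
        out[col] = offset_factor * (len(cols) - i - 1) + df[col] - first_valid
    return pd.DataFrame(out)


shifted = constant_offset(df, cols, offset_factor)
print(shifted)
```

With these numbers var1 becomes 20, 21, 23 and var3 becomes 0, 1, -1, so both series start exactly 20 apart.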

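However, this becomes a problem when the values drift over time, because one variable may then cross another. What I would like to do is reset the vertical offset, for example by changing offsetRef after a certain date.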
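As a rough illustration of the crossing problem, the sketch below finds the first timestamp at which a lower line catches up with the line above it. The data and the helper first_crossing are hypothetical; this is only one possible way to spot where a reset might be needed.

```python
import pandas as pd

# Hypothetical toy data: 'low' starts 20 below 'high' but drifts upward faster.
t = pd.date_range("2020-05-01", periods=6)
high = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0, 25.0], index=t)
low = pd.Series([0.0, 6.0, 12.0, 18.0, 24.0, 30.0], index=t)


def first_crossing(upper, lower):
    """Return the first index label where lower >= upper, or None if the
    two series never meet."""
    touching = lower >= upper
    return touching.idxmax() if touching.any() else None


print(first_crossing(high, low))
```

Here the two lines meet on 2020-05-05, which would be a natural candidate for a reset date.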

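I tried to do this as follows. I first initialize an array with the same size as the variable. Then I want to fill it with offsetRef recomputed at each of the resetDates. I have added #PSEUDO CODE comments where I roughly wrote down what I want to do; apologies in advance that they are very rough. Thanks in advance!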

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
inputData = pd.DataFrame(np.random.randint(100, size=(100, 5)), columns=['timestamp','var2','var3','var4','var5'])
# replace the random integers in 'timestamp' with 100 daily dates
inputData['timestamp'] = pd.date_range('2020-may-01','2020-aug-08')
timeseries_plots = ['var2','var3','var4']  # must be columns of inputData
offsetFactor = 20
resetDates = ['2020-jun-23','2020-jul-05']

for ii, var in enumerate(timeseries_plots):
    offsetRef = np.zeros(inputData[var].size)
    for tt, ttdate in enumerate(resetDates):
        if tt == 0:
            #PSEUDO CODE: offsetRef[ inputData['timestamp'] <resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[0]
            #PSEUDO CODE: offsetRef[ inputData['timestamp'] >=resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[ttdate]
            pass  # placeholder so the block is valid until the pseudocode is implemented
        #PSEUDO CODE: offsetRef[ inputData['timestamp'] >=resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[ttdate]

    ax.plot(inputData['timestamp'], offsetFactor*(len(timeseries_plots)-ii-1) + inputData[var] - offsetRef, label=var, markersize=1, marker='None', linestyle='solid')
plt.show()

Tags: python, python-3.x, pandas, matplotlib

Solution


Here is my current solution, which I am leaving here in case it is useful to others:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# set up df
inputData = pd.DataFrame(np.random.randint(100, size=(100, 5)), columns=['timestamp','var2','var3','var4','var5'])
inputData['timestamp'] = pd.date_range('2020-05-01', '2020-08-08')  # 100 daily timestamps
inputData['var2'] = np.arange(0, 100, 1, dtype=float)  # float so the column can hold NaN
inputData['var4'] = np.arange(0, 200, 2)
inputData.loc[0:2, 'var2'] = np.nan  # first three values missing (.loc slicing includes the end label)
# set constants and settings
dispFactor = 20
timeseries_plots = ['var2','var4']
resetDates = ['2020-05-05','2020-05-20','2020-08-04']
offsetFactor = dispFactor
# begin
fig, ax = plt.subplots()
for ii, var in enumerate(timeseries_plots):
    offsetRef = np.zeros(inputData[var].size)
    firstValid = inputData[var].loc[~inputData[var].isnull()].iloc[0]
    for tt, ttdate in enumerate(resetDates):
        before = (inputData['timestamp'] < ttdate).to_numpy()
        atDate = inputData['timestamp'] == ttdate
        if inputData[var].loc[atDate].isna().iloc[0]:  # value at the reset date is NaN
            newRef = inputData[var].bfill().loc[atDate].iloc[0]  # use the next valid value
        else:
            newRef = inputData[var].loc[atDate].iloc[0]
        if tt == 0:
            # before the first reset date, zero on the first valid value
            offsetRef[before] = firstValid
        offsetRef[~before] = newRef  # from the reset date onward
    ax.plot(inputData['timestamp'], offsetFactor*(len(timeseries_plots)-ii-1) + inputData[var] - offsetRef)

plt.show()

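This "resets" the spacing between the selected variables to 20 at each of the resetDates, producing the figure below: [Figure: example of reset offsets]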

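I probably do not need the if-logic that catches NaN data (relying on .bfill() alone should work in either case), but it feels safer to me. I will edit this as I improve the solution.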
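For reference, a version that relies on .bfill() alone might look roughly like the sketch below. The function name piecewise_offset_ref and the toy data are made up for illustration, and skipping reset dates that are missing from the data is an assumption on my part, not something the code above does.

```python
import numpy as np
import pandas as pd


def piecewise_offset_ref(values, timestamps, reset_dates):
    """Return, for each row, the reference value to subtract.

    Before the first reset date the reference is the first non-null value;
    from each reset date onward it is the back-filled value at that date.
    """
    filled = values.bfill()
    ref = np.full(len(values), values.dropna().iloc[0], dtype=float)
    # Process the dates in ascending order so later resets overwrite earlier ones.
    for date in pd.to_datetime(reset_dates).sort_values():
        at_date = (timestamps == date).to_numpy()
        if not at_date.any():
            continue  # assumption: skip reset dates that are not in the data
        ref[(timestamps >= date).to_numpy()] = filled[at_date].iloc[0]
    return ref


# Hypothetical toy data with NaN at the start and at the reset date.
ts = pd.Series(pd.date_range("2020-05-01", periods=6))
vals = pd.Series([np.nan, 10.0, 12.0, np.nan, 20.0, 25.0])
ref = piecewise_offset_ref(vals, ts, ["2020-05-04"])
print(ref)         # [10. 10. 10. 20. 20. 20.]
print(vals - ref)  # the zeroed series, ready to have the constant offset added
```

Because the value on 2020-05-04 is NaN, the back-filled value 20 from the next day is used as the new reference, which is the same outcome the if-branch produces.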

