Slicing a dataset into series by condition

Question

I have a dataset:

data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
       'TS': ['1','2', '3', '7', '9','11','7','8','9','14','16', '18'], 
       'Predict' : ['None','None', '134','None','None', '127','None','None', '121','None','None', '124']}

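I want to split the dataset into series, where each series ends at a row whose Predict value is not None, and get the time differences within each series.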

I have a time-difference function, and I tried to extract an index for the series, but I don't know how to use it:

def timediffs(series):
    series['tdiff'] = series['TS'].diff().fillna(0.0)
    return series
predict_index = df.index.where(df['Predict'].notna()).to_series().bfill()

In the end, I want a dataset like this:

new_data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
       'TS': ['1','2', '3', '7', '9','11','7','8','9','14','16', '19'], 
       'Predict' : ['None','None', '134','None','None', '127','None','None', '121','None','None', '124'],
        'Time_diff' : ['0','1','1','0','2','2', '0','1','1','0','2','3',],
        'New_predict' : ['134','134','134','127','127','127','121','121','121','124','124','124',]
       }

new_df = pd.DataFrame(new_data)

Tags: python, pandas, numpy

Solution


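First we replace 'None' with NaN. Then we backfill (bfill) to build the New_predict column. Finally we use GroupBy.diff to get Time_diff. Because TS is stored as strings in the sample data, it has to be converted to integers before diff can subtract the values.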

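import numpy as np

# Turn the 'None' strings into real NaN, then fill each gap with the next prediction
df['New_predict'] = df['Predict'].replace('None', np.nan).bfill()
df['TS'] = df['TS'].astype(int)  # TS holds strings in the sample; diff needs numbers
df['Time_diff'] = df.groupby('New_predict')['TS'].diff().fillna(0)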

   host  TS Predict New_predict  Time_diff
0     A   1    None         134        0.0
1     A   2    None         134        1.0
2     A   3     134         134        1.0
3     A   7    None         127        0.0
4     A   9    None         127        2.0
5     A  11     127         127        2.0
6     B   7    None         121        0.0
7     B   8    None         121        1.0
8     B   9     121         121        1.0
9     B  14    None         124        0.0
10    B  16    None         124        2.0
11    B  18     124         124        2.0
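
Grouping by New_predict works for the sample, but two separate series that happen to end in the same predicted value would be merged into one group. Below is a minimal, self-contained sketch of one way around that. It assumes that a series never spans two hosts and that each series ends on the row carrying the prediction. The name series_id is a hypothetical helper introduced here, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = {
    'host': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'TS': ['1', '2', '3', '7', '9', '11', '7', '8', '9', '14', '16', '18'],
    'Predict': ['None', 'None', '134', 'None', 'None', '127',
                'None', 'None', '121', 'None', 'None', '124'],
}
df = pd.DataFrame(data)

# The sample stores everything as strings, so convert before doing arithmetic.
df['TS'] = df['TS'].astype(int)
predict = df['Predict'].replace('None', np.nan)

# Back-fill within each host so a prediction never leaks across hosts (assumption).
df['New_predict'] = predict.groupby(df['host']).bfill()

# series_id (hypothetical) goes up by one on the row AFTER each prediction,
# so every run of rows ending in a prediction gets its own label.
series_id = predict.notna().shift(fill_value=False).cumsum().rename('series_id')

# diff() inside each (host, series) group; the first row of a group has no
# predecessor, so its NaN is replaced with 0.
df['Time_diff'] = (
    df.groupby(['host', series_id])['TS'].diff().fillna(0).astype(int)
)

print(df)
```

For the sample data this gives the same New_predict and Time_diff columns as the shorter version above, with Time_diff as integers rather than floats.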
