Define failure ranges based on serial number and failure label

Problem description

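I want to relabel healthy labels (0) as failure labels (1) for the 3 days before the actual failure (1), as done in the attached link: reference link. That approach works when every series has the same length, but not for variable lengths. In other words, all serial numbers would have to fail on the same day, which makes no sense. In the sample dataset, serial C fails on January 5, 2014, A fails on January 6, and H fails on January 7. For serial number C, I want to relabel the healthy labels (0) as failure labels (1) for the 3 days before its actual failure (1), and likewise for the other serial numbers. I appreciate your time. Thanks!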

My code:

import pandas as pd
import numpy as np
import datetime
from datetime import date, timedelta
df = pd.read_excel('/content/failure.xlsx') 
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by="date")
d = datetime.timedelta(days = 3)
a = []
for ind in df.index:
  if df['failure'][ind] == 1:
    sdate = df['date'][ind] - d
    edate = df['date'][ind]
    delta = edate - sdate 
    for i in range(delta.days + 1):
      day = sdate + timedelta(days=i)
      a.append(day)
mylist = list(dict.fromkeys(a))
mylist = pd.to_datetime(mylist,format='%y/%m/%d')
new_value = 1
for ind in df.index:
  for item in mylist:
    if df['date'][ind].date() ==  pd.to_datetime(item).date():
       df['failure'][ind] = 1


Tags: python, pandas, datetime

Solution


Here is a working solution. Note: for the sake of clarity, I preferred to rewrite it completely.

import datetime

import pandas as pd

df = pd.read_excel('content/failure.xlsx')
df['date'] = pd.to_datetime(df['date'])

# Sort by date, then rebuild the index so labels 0..n-1 follow the sorted order.
# (reindex_like(df) would restore the original row order and undo the sort.)
df = df.sort_values(by="date", kind="stable").reset_index(drop=True)
d = datetime.timedelta(days=3)

failed = df[df['failure'] == 1]  # disks that actually failed

for hdd in failed['serial_number']:  # for each failed hdd
    ind = df.index[df['serial_number'] == hdd]  # rows of that particular hdd in df
    # Note: the last element of ind corresponds to the failure date for hdd
    failure_date = df.loc[ind[-1], 'date']
    for i in ind[:-1]:
        if (failure_date - df.loc[i, 'date']).days <= d.days:
            df.loc[i, 'failure'] = 1  # set failure to 1

# Show the relabeled rows for each drive
for hdd in ['A', 'C', 'H']:
    print(f'hdd: {hdd}')
    print(df[df['serial_number'] == hdd])

Result:

hdd: A
   model       date serial_number  failure  smart5  smart187
1      M 2014-01-01             A        0       0        60
5      M 2014-01-02             A        0       0       140
7      M 2014-01-03             A        1       0       180
11     M 2014-01-04             A        1       0       260
12     M 2014-01-05             A        1       0       280
16     M 2014-01-06             A        1       0       360
hdd: C
   model       date serial_number  failure  smart5  smart187
2      M 2014-01-01             C        0       0        80
4      M 2014-01-02             C        1       0       120
8      M 2014-01-03             C        1       0       200
10     M 2014-01-04             C        1       0       240
13     M 2014-01-05             C        1       0       300
hdd: H
   model       date serial_number  failure  smart5  smart187
0      M 2014-01-01             H        0       0        40
3      M 2014-01-02             H        0       0       100
6      M 2014-01-03             H        0       0       160
9      M 2014-01-04             H        1       0       220
14     M 2014-01-05             H        1       0       320
15     M 2014-01-06             H        1       0       340
17     M 2014-01-07             H        1       0       400
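
As an alternative to the explicit loops, the same per-drive relabeling can be sketched in vectorized form with `groupby`. This is only a minimal sketch under stated assumptions: the sample frame below is hypothetical (built to mirror the failure dates in the question, with only the columns the logic needs), the function name `relabel_before_failure` is made up for illustration, and if a serial number has several failure rows the latest one is assumed to be the reference date.

```python
import pandas as pd

# Hypothetical sample: each drive is healthy (0) until its final, failing day (1).
# Failure days in January 2014 follow the question: C on the 5th, A on the 6th, H on the 7th.
last_day = {"A": 6, "C": 5, "H": 7}
rows = []
for serial, last in last_day.items():
    for day in range(1, last + 1):
        rows.append({
            "date": pd.Timestamp(2014, 1, day),
            "serial_number": serial,
            "failure": int(day == last),
        })
df = pd.DataFrame(rows)


def relabel_before_failure(df, days=3):
    """Return a copy with failure set to 1 for the `days` days before each drive's own failure."""
    out = df.copy()
    # Failure date of each row's own serial number (NaT for drives that never failed).
    fail_date = (
        out["date"].where(out["failure"] == 1)
        .groupby(out["serial_number"])
        .transform("max")
    )
    gap = fail_date - out["date"]
    # Keep rows from `days` days before the failure up to the failure day itself.
    in_window = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(days=days))
    out.loc[in_window, "failure"] = 1
    return out


relabeled = relabel_before_failure(df, days=3)
print(relabeled[relabeled["serial_number"] == "C"])
```

Because the failure date is computed per `serial_number`, each drive gets its own 3-day window, so drives no longer need to fail on the same day. On this sample, serial C ends up labeled 1 from January 2 through January 5, which agrees with the output shown for the loop-based version.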
