How do I create a DataFrame of time intervals from values in a time series?

Problem description

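I have a CSV file containing a long list of blood glucose (BG) values with their timestamps. I am trying to build a DataFrame (ultimately called df2) of all intervals where BG < 3.5. I can create the initial df from these values, which gives: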

      Timestamp            Glucose
0     2020-02-24 17:45:23  4.7
1     2020-02-24 17:50:23  4.9
2     2020-02-24 17:55:22  4.9
3     2020-02-24 18:00:22  4.8
4     2020-02-24 18:05:21  4.7
...   ...                  ...
2348  2020-03-03 19:25:38  4.8
2349  2020-03-03 19:30:38  4.7
2350  2020-03-03 19:35:38  4.7
2351  2020-03-03 19:40:38  4.5
2352  2020-03-03 19:45:38  4.2

2353 rows × 2 columns

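I then use the code below to try to generate the intervals. However, it only gives me intervals that last 5 minutes (one value long). I think this is because I use index+1 to close my current_interval, whereas what I need is a loop that looks from index+1 to index+len(time_series), but I don't know how to do that. Any help is much appreciated. The code: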

from collections import namedtuple

import pandas as pd

THRESHOLD = 3.5

IntervalRow = namedtuple(
    'IntervalRow',
    ['start_time', 'start_bg', 'end_time', 'end_bg', 'lowest_bg']
)

def is_hypo(value):
    return value < THRESHOLD

def calculate_hypo_intervals(time_series):
    intervals = []
    current_interval = None

    for index in range(len(time_series)):
        if is_hypo(time_series['Glucose'][index]):
            # Open a new interval at the current reading
            if not current_interval:
                current_interval = IntervalRow(
                    start_time=time_series['Timestamp'][index],
                    start_bg=time_series['Glucose'][index],
                    end_time=None,
                    end_bg=None,
                    lowest_bg=time_series['Glucose'][index],
                )

            # Track the lowest BG if the next reading is lower
            if index+1 < len(time_series) and current_interval.lowest_bg > time_series['Glucose'][index+1]:
                current_interval = IntervalRow(
                    start_time=current_interval.start_time,
                    start_bg=current_interval.start_bg,
                    end_time=None,
                    end_bg=None,
                    lowest_bg=time_series['Glucose'][index+1],
                )

            # Close the interval when the next reading is no longer hypo
            if index+1 < len(time_series) and not is_hypo(time_series['Glucose'][index+1]):
                intervals.append(
                    IntervalRow(
                        start_time=current_interval.start_time,
                        start_bg=current_interval.start_bg,
                        end_time=time_series['Timestamp'][index+1],
                        end_bg=time_series['Glucose'][index+1],
                        lowest_bg=current_interval.lowest_bg,
                    )
                )

            # I appreciate this bit is probably not very code savvy and is only there for the final data point.
            # Suggestions to mix it with the if block above welcomed. Reason I separated it was because if I
            # left it as before where it read "if index == len(time_series) and not is_hypo" then either all
            # intervals have to end with a value that is still hypo or you get an IndexError.
            if index+1 == len(time_series):
                intervals.append(
                    IntervalRow(
                        start_time=current_interval.start_time,
                        start_bg=current_interval.start_bg,
                        end_time=time_series['Timestamp'][index],
                        end_bg=time_series['Glucose'][index],
                        lowest_bg=current_interval.lowest_bg,
                    )
                )

            current_interval = None

    df2 = pd.DataFrame(intervals, columns=['Start Time', 'Start BG', 'End Time', 'End BG', 'Lowest BG'])

    return df2

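This gives me the output below, which misses (for example) an earlier, much lower episode of BG < 3.5 before the first interval. As you can see, every interval is only 5 minutes long (it ends at the very next reading). Thanks!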


    Start Time           Start BG  End Time             End BG  Lowest BG
0   2020-02-25 10:10:23  3.1       2020-02-25 10:15:24  3.6     3.1
1   2020-02-25 11:05:23  3.4       2020-02-25 11:10:23  3.7     3.4
2   2020-02-25 14:35:25  3.1       2020-02-25 14:40:25  3.5     3.1
3   2020-02-25 18:25:26  3.3       2020-02-25 18:30:26  3.9     3.3
4   2020-02-27 09:45:20  3.4       2020-02-27 09:50:20  3.6     3.4
5   2020-02-27 12:50:19  3.4       2020-02-27 12:55:19  3.6     3.4
6   2020-02-27 17:35:20  3.4       2020-02-27 17:40:19  3.6     3.4
7   2020-02-28 10:05:22  3.4       2020-02-28 10:10:22  3.5     3.4
8   2020-02-28 18:35:23  3.4       2020-02-28 18:40:24  3.6     3.4
9   2020-02-29 11:15:26  3.4       2020-02-29 11:20:26  3.5     3.4
10  2020-02-29 16:15:27  3.4       2020-02-29 16:20:27  3.5     3.4
11  2020-02-29 21:10:28  3.4       2020-02-29 21:15:27  3.5     3.4
12  2020-03-01 13:55:31  3.4       2020-03-01 14:00:30  3.6     3.4
13  2020-03-01 17:45:29  3.4       2020-03-01 17:50:31  3.5     3.4
14  2020-03-02 12:45:34  3.3       2020-03-02 12:50:34  3.6     3.3
15  2020-03-02 16:30:34  3.4       2020-03-02 16:35:34  3.5     3.4
16  2020-03-03 17:50:38  3.4       2020-03-03 17:55:38  3.5     3.4
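
A likely cause of the 5-minute intervals, judging from the code as posted: current_interval = None runs on every hypo reading, so each interval is reopened at the last low value before recovery. Below is a minimal sketch of a single pass that keeps the interval open until the first reading at or above the threshold. The name hypo_intervals_loop and the sample data are made up for illustration, and ending each interval at the first recovered reading is an assumption taken from the output shown above.

```python
from collections import namedtuple

import pandas as pd

THRESHOLD = 3.5

IntervalRow = namedtuple(
    "IntervalRow",
    ["start_time", "start_bg", "end_time", "end_bg", "lowest_bg"],
)


def hypo_intervals_loop(time_series):
    """Collect runs of consecutive readings below THRESHOLD (hypothetical helper)."""
    intervals = []
    # State of the currently open interval; start_time is None when none is open.
    start_time = start_bg = lowest_bg = None
    last_time = last_bg = None

    for timestamp, glucose in zip(time_series["Timestamp"], time_series["Glucose"]):
        if glucose < THRESHOLD:
            if start_time is None:
                # First low reading: open an interval.
                start_time, start_bg, lowest_bg = timestamp, glucose, glucose
            else:
                # Still low: keep the interval open and track the minimum.
                lowest_bg = min(lowest_bg, glucose)
        elif start_time is not None:
            # First reading at or above the threshold closes the interval.
            intervals.append(
                IntervalRow(start_time, start_bg, timestamp, glucose, lowest_bg)
            )
            start_time = None
        last_time, last_bg = timestamp, glucose

    # The data ended while still low: close the interval on the last reading.
    if start_time is not None:
        intervals.append(
            IntervalRow(start_time, start_bg, last_time, last_bg, lowest_bg)
        )

    return pd.DataFrame(
        intervals,
        columns=["Start Time", "Start BG", "End Time", "End BG", "Lowest BG"],
    )


# Made-up sample: one three-reading episode, then one that is still low at the end.
sample = pd.DataFrame({
    "Timestamp": pd.date_range("2020-02-25 10:00", periods=8, freq="5min"),
    "Glucose": [4.0, 3.4, 3.1, 3.3, 3.6, 4.2, 3.2, 3.0],
})
df2 = hypo_intervals_loop(sample)
print(df2)
```

The interval state is reset only when an interval is actually closed, which is what lets an episode span more than one reading.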

Tags: python, pandas, dataframe, time-series, intervals

Solution
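
One possible approach, offered as a sketch rather than a definitive answer: label each run of consecutive readings below the threshold with pandas and summarise every run in one row. The function name hypo_intervals_grouped and the sample data are hypothetical. The sketch assumes the rows are already sorted by time and, following the output format in the question, ends each interval at the first reading after the run (or at the run's last reading when the data stops while still low).

```python
import pandas as pd

THRESHOLD = 3.5


def hypo_intervals_grouped(df):
    """One row per run of consecutive readings below THRESHOLD (hypothetical helper)."""
    df = df.reset_index(drop=True)  # make index labels equal to row positions
    below = df["Glucose"] < THRESHOLD
    # A new run starts wherever `below` changes value; cumsum numbers the runs.
    run_id = (below != below.shift()).cumsum()

    rows = []
    for _, run in df[below].groupby(run_id[below]):
        # End at the first reading after the run if one exists,
        # otherwise at the last reading of the run itself.
        end_pos = min(run.index[-1] + 1, len(df) - 1)
        rows.append({
            "Start Time": run["Timestamp"].iloc[0],
            "Start BG": run["Glucose"].iloc[0],
            "End Time": df["Timestamp"].iloc[end_pos],
            "End BG": df["Glucose"].iloc[end_pos],
            "Lowest BG": run["Glucose"].min(),
        })

    return pd.DataFrame(
        rows,
        columns=["Start Time", "Start BG", "End Time", "End BG", "Lowest BG"],
    )


# Made-up readings at 5-minute spacing, not taken from the asker's CSV.
readings = pd.DataFrame({
    "Timestamp": pd.date_range("2020-02-25 10:00", periods=8, freq="5min"),
    "Glucose": [4.0, 3.4, 3.1, 3.3, 3.6, 4.2, 3.2, 3.0],
})
result = hypo_intervals_grouped(readings)
print(result)
```

Grouping on the run labels avoids carrying interval state across loop iterations, which is where the posted code appears to go wrong (it resets current_interval on every low reading).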

