Deleting rows from a csv/txt file with time-based filtering

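Problem description

How can I delete rows from a csv/txt file by filtering on time? I want to delete every row whose time (column 3) falls outside the period 09:01 to 16:00.

Column 3 contains only times, in hh:mm format.

Column 2 contains only dates (dtype int64).

There is no header row.

The dtype of the time column is object.

I can filter on the other columns, but I cannot work out how to handle the time.

My data looks like this: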

RTY,20200401,07:10,964.80,964.80,964.80,964.80,456,20
RTY,20200401,08:15,964.80,964.80,964.80,964.80,456,250
RTY,20200401,09:00,964.80,964.80,964.80,964.80,456,155
RTY,20200401,09:01,964.80,964.80,964.80,964.80,456,10
RTY,20200401,09:05,964.80,964.80,964.80,964.80,456,63
RTY,20200401,09:16,964.80,964.80,951.25,956.20,4587,159
RTY,20200401,09:17,956.20,957.25,953.10,955.15,4555,578
RTY,20200401,10:18,954.95,959.00,954.95,958.55,5121,951
RTY,20200401,12:19,958.50,960.00,956.50,959.20,3944,753
RTY,20200401,15:20,959.30,962.55,958.25,959.35,7071,258
RTY,20200401,15:30,960.00,960.00,956.15,956.15,2991,89
RTY,20200401,15:40,955.25,955.90,953.90,954.65,3812,574
RTY,20200401,16:00,955.25,955.90,953.90,954.65,3812,46
RTY,20200401,17:00,954.65,956.00,954.00,955.05,2775,654
RTY,20200401,18:00,954.65,956.00,954.00,955.05,2775,259
RTY,20200402,07:15,964.80,964.80,964.80,964.80,456,71
RTY,20200402,08:15,964.80,964.80,964.80,964.80,456,359
RTY,20200402,09:01,964.80,964.80,964.80,964.80,456,452
RTY,20200402,09:05,964.80,964.80,964.80,964.80,456,256
RTY,20200402,09:15,964.80,964.80,964.80,964.80,456,96
RTY,20200402,09:18,964.80,964.80,951.25,956.20,4587,754
RTY,20200402,09:55,956.20,957.25,953.10,955.15,4555,145
RTY,20200402,10:28,954.95,959.00,954.95,958.55,5121,252
RTY,20200402,12:49,958.50,960.00,956.50,959.20,3944,59
RTY,20200402,15:25,959.30,962.55,958.25,959.35,7071,745
RTY,20200402,15:30,960.00,960.00,956.15,956.15,2991,352
RTY,20200402,15:45,955.25,955.90,953.90,954.65,3812,621
RTY,20200401,16:00,950.25,959.90,950.90,951.65,3812,25
RTY,20200402,17:55,954.65,956.00,954.00,955.05,2775,48
RTY,20200402,18:00,954.65,956.00,954.00,955.05,2775,100

Tags: filterrows

Solution

Here is one approach that converts the time strings to integers and then filters by value:

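with open('example.txt', 'r') as file_handle:
    example_file_content = file_handle.read().splitlines()

for line in example_file_content:
    if not line.strip():
        continue  # skip blank lines, e.g. one left by a trailing newline
    line_as_list = line.split(",")
    hours, minutes = line_as_list[2].split(":")
    # Compare minutes since midnight so that 09:00 and 16:01 are excluded too
    minutes_since_midnight = int(hours) * 60 + int(minutes)
    # Keep only rows inside the time period 09:01 to 16:00 (inclusive)
    if 9 * 60 + 1 <= minutes_since_midnight <= 16 * 60:
        print(line)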
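
The loop above only prints the matching rows, while the question asks for them to be removed from the file. As a minimal sketch, the same integer comparison can be wrapped in a function that writes the kept rows to a new file. The names `to_minutes` and `filter_file`, and the idea of writing to a separate target file, are assumptions made for illustration and are not part of the original answer.

```python
def to_minutes(time_text):
    """Convert an 'hh:mm' string to minutes since midnight."""
    hours, minutes = time_text.split(":")
    return int(hours) * 60 + int(minutes)


def filter_file(source_path, target_path, start="09:01", end="16:00", time_column=2):
    """Copy rows whose time lies in [start, end] to target_path.

    Hypothetical helper: returns the number of rows that were kept.
    """
    low, high = to_minutes(start), to_minutes(end)
    kept = 0
    with open(source_path, "r") as source, open(target_path, "w") as target:
        for line in source:
            if not line.strip():
                continue  # skip blank lines, e.g. one left by a trailing newline
            fields = line.rstrip("\n").split(",")
            if low <= to_minutes(fields[time_column]) <= high:
                # Make sure every written row ends with a newline
                target.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept
```

Calling `filter_file('example.txt', 'filtered.txt')` would leave the original file untouched and put only the 09:01 to 16:00 rows in `filtered.txt`. Writing to a second file rather than overwriting in place is a cautious choice, so the input can still be checked afterwards.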

A better approach is to convert the column to datetime values:

import datetime

# Parse the bounds once instead of on every row
start = datetime.datetime.strptime("09:01", '%H:%M').time()
end = datetime.datetime.strptime("16:00", '%H:%M').time()

with open('example.txt', 'r') as file_handle:
    example_file_content = file_handle.read().splitlines()

for line in example_file_content:
    if not line.strip():
        continue  # skip blank lines, e.g. one left by a trailing newline
    line_as_list = line.split(",")
    row_time = datetime.datetime.strptime(line_as_list[2], '%H:%M').time()
    # Keep only rows inside the time period 09:01 to 16:00 (inclusive)
    if start <= row_time <= end:
        print(line)
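
The question mentions dtypes (int64, object), which suggests the data is being loaded with pandas. The following is a minimal sketch under that assumption. It relies on the times being zero-padded hh:mm strings, as in the sample data, so that plain string comparison matches chronological order. The inlined sample rows and the variable names are illustrative only.

```python
import io

import pandas as pd

# A few rows from the question, inlined so the sketch is self-contained.
sample = io.StringIO(
    "RTY,20200401,09:00,964.80,964.80,964.80,964.80,456,155\n"
    "RTY,20200401,09:01,964.80,964.80,964.80,964.80,456,10\n"
    "RTY,20200401,16:00,955.25,955.90,953.90,954.65,3812,46\n"
    "RTY,20200401,17:00,954.65,956.00,954.00,955.05,2775,654\n"
)

# header=None: the file has no header row, so columns are labelled 0..8.
# dtype=str: keep every field as text so values such as 964.80 are not reformatted.
df = pd.read_csv(sample, header=None, dtype=str)

# Column 2 holds the time. Zero-padded hh:mm strings sort chronologically,
# and Series.between() includes both endpoints by default.
filtered = df[df[2].between("09:01", "16:00")]

# header=False and index=False keep the original headerless layout.
print(filtered.to_csv(header=False, index=False))
```

To work on a real file, the `io.StringIO` object would be replaced by the file path, and a path could be passed to `to_csv` to save the result. If the times were not zero-padded (for example `9:01`), the string comparison would no longer be reliable, and parsing the column into real time values first would be the safer route.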
