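filter - Delete rows from a csv/txt file by time-based filtering
Problem description
How can I delete rows from a csv/txt file using time-based filtering? I want to delete all rows that fall outside the time period 09:01 to 16:00 (column 3).
Column 3 contains only the time, in hh:mm format.
Column 2 contains only the date (dtype int64).
There is no header row.
The dtype of the time column is object.
I am able to filter on other columns, but I cannot get it to work for the time.
My data looks like this: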
RTY,20200401,07:10,964.80,964.80,964.80,964.80,456,20
RTY,20200401,08:15,964.80,964.80,964.80,964.80,456,250
RTY,20200401,09:00,964.80,964.80,964.80,964.80,456,155
RTY,20200401,09:01,964.80,964.80,964.80,964.80,456,10
RTY,20200401,09:05,964.80,964.80,964.80,964.80,456,63
RTY,20200401,09:16,964.80,964.80,951.25,956.20,4587,159
RTY,20200401,09:17,956.20,957.25,953.10,955.15,4555,578
RTY,20200401,10:18,954.95,959.00,954.95,958.55,5121,951
RTY,20200401,12:19,958.50,960.00,956.50,959.20,3944,753
RTY,20200401,15:20,959.30,962.55,958.25,959.35,7071,258
RTY,20200401,15:30,960.00,960.00,956.15,956.15,2991,89
RTY,20200401,15:40,955.25,955.90,953.90,954.65,3812,574
RTY,20200401,16:00,955.25,955.90,953.90,954.65,3812,46
RTY,20200401,17:00,954.65,956.00,954.00,955.05,2775,654
RTY,20200401,18:00,954.65,956.00,954.00,955.05,2775,259
RTY,20200402,07:15,964.80,964.80,964.80,964.80,456,71
RTY,20200402,08:15,964.80,964.80,964.80,964.80,456,359
RTY,20200402,09:01,964.80,964.80,964.80,964.80,456,452
RTY,20200402,09:05,964.80,964.80,964.80,964.80,456,256
RTY,20200402,09:15,964.80,964.80,964.80,964.80,456,96
RTY,20200402,09:18,964.80,964.80,951.25,956.20,4587,754
RTY,20200402,09:55,956.20,957.25,953.10,955.15,4555,145
RTY,20200402,10:28,954.95,959.00,954.95,958.55,5121,252
RTY,20200402,12:49,958.50,960.00,956.50,959.20,3944,59
RTY,20200402,15:25,959.30,962.55,958.25,959.35,7071,745
RTY,20200402,15:30,960.00,960.00,956.15,956.15,2991,352
RTY,20200402,15:45,955.25,955.90,953.90,954.65,3812,621
RTY,20200401,16:00,950.25,959.90,950.90,951.65,3812,25
RTY,20200402,17:55,954.65,956.00,954.00,955.05,2775,48
RTY,20200402,18:00,954.65,956.00,954.00,955.05,2775,100
Solution
Here is one way to do it: convert the time strings to integers and then filter by value:
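with open('example.txt', 'r') as file_handle:
    for line in file_handle.read().splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. one left by a trailing newline
        # Turn the hh:mm string in column 3 into minutes since midnight
        hours, minutes = line.split(",")[2].split(":")
        minutes_since_midnight = int(hours) * 60 + int(minutes)
        # Delete all rows which lie outside time period 09:01 to 16:00
        # (09:01 is minute 541, 16:00 is minute 960; both ends are kept)
        if 541 <= minutes_since_midnight <= 960:
            print(line)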
A better approach is to parse the column as a datetime and compare times directly:
import datetime

# Parse the bounds once instead of on every row
start = datetime.datetime.strptime("09:01", '%H:%M')
end = datetime.datetime.strptime("16:00", '%H:%M')
with open('example.txt', 'r') as file_handle:
    for line in file_handle.read().splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. one left by a trailing newline
        row_time = datetime.datetime.strptime(line.split(",")[2], '%H:%M')
        # Delete all rows which lie outside time period 09:01 to 16:00
        if not (row_time < start or row_time > end):
            print(line)
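Both snippets above only print the rows that survive the filter. Since the question asks for the rows to be removed from the file, here is a minimal sketch that writes the kept rows to a new file instead. The function name `filter_rows_by_time`, its parameters, and the output path are illustrative and not part of the original answer. It assumes the time sits in the third column in hh:mm format, as in the sample data.

```python
import csv
from datetime import datetime, time


def filter_rows_by_time(src_path, dst_path,
                        start=time(9, 1), end=time(16, 0), time_col=2):
    """Copy rows whose time column lies within [start, end] to dst_path.

    Returns the number of rows kept. Names and defaults are illustrative.
    """
    kept = 0
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        # lineterminator="\n" avoids the csv module's default "\r\n"
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            if not row:
                continue  # csv.reader yields [] for blank lines
            row_time = datetime.strptime(row[time_col], "%H:%M").time()
            if start <= row_time <= end:  # both bounds are inclusive
                writer.writerow(row)
                kept += 1
    return kept
```

Calling `filter_rows_by_time('example.txt', 'filtered.txt')` leaves the input untouched. Writing to a separate file and replacing the original only after checking the result is a cautious choice here, because a failure partway through an in-place overwrite would lose data.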