How do I filter a CSV file by datetime?

Problem description

I have a CSV file that looks like this (the real one is obviously much larger):

1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
4,$4,S1,LL,SS,2020-02-01T00:01:11.190+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
6,$6,5S,LL,TT,2020-02-01T01:02:15.180+02:00

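I need to take the first row, read its date, and check whether each remaining row falls on that same day, between 00:00:00.000 and 23:59:59.999. Put simply, I want every row whose date matches the date of the first row.

This is the result I want: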

1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00

This is my code:

root = r'c:\data\FF\Desktop\my_files\file01.txt'

with open(root, 'r') as my_file:
    reader = csv.reader(my_file)
        
def filter_row():
    for row in reader:
        date_time = row[5]   #<--- extract the datetime 
        fdate_time = datetime.strptime(date_time, '%Y-%m-%dT%H:%M:%S.%f%z') #<--- make a datetime object of it
        x = fdate_time.date() #<--- extract the y/m/d

        begin_time = datetime.strptime(x + '00:00.00+02:00','%Y-%m-%dT%H:%M:%S.%f%z') #<--- fix the start time of a day
        end_time = datetime.strptime(x + '23:59:59.999+02:00', '%Y-%m-%dT%H:%M:%S.%f%z') #<--- fix the end time of a day
        
        filtered_records = fdate_time >= begin_time and fdate_time <= end_time #<filter everything between the start and end time
        
    return filtered_records
        
filter_row() 
 

When I run the code above, I get:

  File "C:\data\FF\Desktop\Python\My_python\Filter_csv.py", line 82, in filter_row
    for row in reader:

ValueError: I/O operation on closed file.

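I'm really lost, because I don't know how to fix this. I looked at several solutions but couldn't find one that works. I hope someone can explain how this works. Thank you.

Tags: python, csv, datetime

Solution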


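The context manager provided by with ensures that the resource is released when the block ends. This means that everything must be read from the file inside the with block. In the question's code, the loop over reader runs after the block has ended, so the file is already closed.

A simple fix is to pass the reader to the function as a parameter and call it inside the block: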
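To make the failure mode concrete, here is a minimal sketch that reproduces the error and shows one way around it. The temporary two-line file and the variable names are assumptions made up for this demonstration; they are not from the question.

```python
import csv
import os
import tempfile

# Hypothetical two-line sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('1,a\n2,b\n')
    path = tmp.name

# Mistake: the reader is created inside the block but consumed after it.
with open(path, newline='') as my_file:
    late_reader = csv.reader(my_file)

try:
    list(late_reader)          # the underlying file is already closed
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Fix: consume the reader while the file is still open.
with open(path, newline='') as my_file:
    rows_inside = list(csv.reader(my_file))

os.remove(path)
print(error_message)
print(rows_inside)
```

Materializing the rows with list() is fine for small files. For large files it is usually better to keep the processing inside the with block and iterate lazily.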

root = r'c:\data\FF\Desktop\my_files\file01.txt'

def filter_row(reader):
    for row in reader:
        ...            
    return filtered_records

with open(root, 'r') as my_file:
    reader = csv.reader(my_file)
    filter_row(reader)

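However:

  • You should use the datetime.replace method to compute the start and end of the day, rather than building them from strings
  • If you want to write these rows to a new file, you should turn filter_row into a generator: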
import csv
from datetime import datetime

root = r'c:\data\FF\Desktop\my_files\file01.txt'
newf = r'c:\data\FF\Desktop\my_files\file01.csv'

def filter_row(reader):
    first = True
    for row in reader:
        date_time = row[5]  # <--- extract the datetime
        fdate_time = datetime.strptime(date_time, '%Y-%m-%dT%H:%M:%S.%f%z')  # <--- make a datetime object of it

        if first:         # special processing for the first line
            first = False
            begin_time = fdate_time.replace(hour=0, minute=0, second=0, microsecond=0) # <--- fix the start time of a day
            end_time = fdate_time.replace(hour=23, minute=59, second=59, microsecond=999999) # <--- fix the end time of a day
            yield row      # yield first row
        elif begin_time <= fdate_time <= end_time:  # <--- filter everything between the start and end time
            yield row      # and rows of same date

# newline='' is what the csv module recommends for files it reads or writes
with open(root, newline='') as my_file, open(newf, 'w', newline='') as new_file:
    reader = csv.reader(my_file)
    writer = csv.writer(new_file)

    writer.writerows(filter_row(reader))
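As a quick check of the generator approach, the sketch below runs a simplified variant against the sample rows from the question, using io.StringIO in place of a real file. The name filter_same_day and the constants SAMPLE and FMT are hypothetical, introduced only for this illustration. It compares calendar dates with .date() instead of building begin and end times, which should be equivalent as long as every row uses the same UTC offset, as in the sample data; that is an assumption.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the question.
SAMPLE = """\
1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
4,$4,S1,LL,SS,2020-02-01T00:01:11.190+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
6,$6,5S,LL,TT,2020-02-01T01:02:15.180+02:00
"""

FMT = '%Y-%m-%dT%H:%M:%S.%f%z'

def filter_same_day(reader):
    """Yield the rows whose calendar date equals that of the first row."""
    first_day = None
    for row in reader:
        day = datetime.strptime(row[5], FMT).date()
        if first_day is None:
            first_day = day      # remember the date of the first row
        if day == first_day:
            yield row

kept = list(filter_same_day(csv.reader(io.StringIO(SAMPLE))))
print([row[0] for row in kept])  # ['1', '2', '3', '5']
```

If the rows can carry different UTC offsets, the range comparison on timezone-aware datetimes used in the answer and the .date() comparison used here can disagree, so choose whichever matches the intended meaning of "same day".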
