python - Is there a way to read a text file line by line in reverse order?
Problem description
I want to read the text file given below line by line, in reverse order. I don't want to use readlines() or read().
a.txt
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
Expected result:
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
My attempt:
with open('a.txt') as lines:
    for line in reversed(lines):
        print(line)
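This attempt raises a TypeError: a file object can be iterated forward, but reversed() needs either a __reversed__() method or the sequence protocol (__len__() plus __getitem__()), and file objects provide neither. A minimal sketch demonstrating this, using io.StringIO as an in-memory stand-in for open('a.txt') (an assumption so it runs without the file); try_reverse is a hypothetical helper name:

```python
import io


def try_reverse(file_obj):
    """Hypothetical helper: return the lines reversed, or None if reversed() rejects the object."""
    try:
        return list(reversed(file_obj))
    except TypeError as exc:
        # File objects define __iter__ but neither __reversed__ nor __len__/__getitem__.
        print("reversed() failed:", exc)
        return None


# io.StringIO stands in for open('a.txt'); both are text-mode file objects.
result = try_reverse(io.StringIO("line 1\nline 2\n"))
print(result)  # None
```

A plain list of lines, by contrast, is reversible, which is why readlines() would work, at the cost of loading the whole file.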
Solution
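Here's a way to do it without reading the whole file into memory at once. It does have to read through the whole file first, but it stores only the position where each line starts. Once those positions are known, it can use the seek() method to access the lines randomly, in any order you want.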
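To make the list of line offsets concrete, here is a small sketch on an in-memory buffer; io.BytesIO stands in for a file opened in 'rb' mode (an assumption so it runs without a file). Each entry is where a line starts and the final entry is the end of the data, so line i occupies the bytes from offsets[i] up to offsets[i+1].

```python
import io

data = b"ab\ncde\nf\n"  # three lines of 3, 4 and 2 bytes (newlines included)
buffer = io.BytesIO(data)  # stands in for open(..., 'rb')

offsets = [0]  # the first line always starts at offset 0
for _line in buffer:
    offsets.append(buffer.tell())  # where the *next* line would start

print(offsets)  # [0, 3, 7, 9]

# Slicing between consecutive offsets recovers each line, last one first.
for index in reversed(range(len(offsets) - 1)):
    print(data[offsets[index]:offsets[index + 1]])
```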
Here's an example using your input file:
# Preprocess: read the whole file and note where each line starts.
# (This must be done in binary mode: in text mode, tell() is disabled
# while iterating over the file.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # The first line always starts at offset 0.
    for line in file:
        offsets.append(file.tell())  # Record where the *next* line would start.

# Now reread the lines of the file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with the next offset.
        # Read the bytes, decode them to a string, and strip trailing whitespace.
        line = file.read(size).decode().rstrip()
        print(line)
Output:
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
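If you need the reversed lines as an iterable rather than printed, the same two-pass offset approach can be wrapped in a generator. This is a sketch, not the answer's code: the function name reverse_lines, the UTF-8 default and the sample file content are assumptions, and unlike the code above it strips only trailing CR/LF characters instead of all trailing whitespace.

```python
import os
import tempfile


def reverse_lines(path, encoding="utf-8"):
    """Yield a file's lines from last to first, holding only line offsets in memory.

    Hypothetical helper built on the offset approach shown above.
    """
    with open(path, "rb") as file:
        # First pass: record where each line starts (binary mode keeps tell() usable).
        offsets = [0]
        for _line in file:
            offsets.append(file.tell())

        # Second pass: seek to each recorded start, last line first.
        for index in reversed(range(len(offsets) - 1)):
            file.seek(offsets[index])
            raw = file.read(offsets[index + 1] - offsets[index])
            # Drop trailing CR/LF characters only, keeping other trailing whitespace.
            yield raw.rstrip(b"\r\n").decode(encoding)


# Demo on a temporary file (sample content is an assumption, not the question's a.txt).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")

for line in reverse_lines(tmp.name):
    print(line)

os.remove(tmp.name)
```

Because the file stays open while the generator is suspended, consume it fully (or call its close() method) so the file handle is released.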
Update
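Here's a version that does the same thing but uses Python's mmap module to memory-map the file, which should give better performance by taking advantage of your OS/hardware's virtual-memory features.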
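That's because, as PyMOTW-3 puts it:
Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers; the memory is accessed directly by both the kernel and the user application.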
Code:
import mmap

with open('text_file.txt', 'rb') as file:
    # length=0 maps the whole file (mmap raises ValueError for an empty file).
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
        # First preprocess the file and note where each line starts.
        # (Needs to be done in binary mode.)
        offsets = [0]  # The first line always starts at offset 0.
        for line in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Record where the *next* line would start.

        # Now process the lines of the file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with the next offset.
            # Read the bytes, decode them to a string, and strip trailing whitespace.
            line = mm_file.read(size).decode().rstrip()
            print(line)
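Because an mmap object also has an rfind() method, the preprocessing pass can in principle be skipped: scan backward from the end of the mapping for newline bytes and yield each line as it is found. This is an alternative sketch, not the answer's code; the name reverse_lines_mmap is hypothetical, it assumes LF (or CRLF) line endings and UTF-8 text, and it checks for an empty file first because mmap refuses to map one.

```python
import mmap
import os
import tempfile


def reverse_lines_mmap(path, encoding="utf-8"):
    """Yield a file's lines from last to first by scanning a memory map backward.

    Hypothetical helper; assumes LF line endings (a CR just before the LF is dropped).
    """
    with open(path, "rb") as file:
        if os.fstat(file.fileno()).st_size == 0:
            return  # mmap cannot map an empty file, and there are no lines anyway
        with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
            end = len(mm_file)
            # A final newline terminates the last line; it does not start a new one.
            if mm_file[end - 1:end] == b"\n":
                end -= 1
            while end >= 0:
                # Position of the newline before the current line, or -1 if none.
                newline_pos = mm_file.rfind(b"\n", 0, end)
                yield mm_file[newline_pos + 1:end].rstrip(b"\r").decode(encoding)
                end = newline_pos


# Demo on a temporary file (sample content is an assumption).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")

for line in reverse_lines_mmap(tmp.name):
    print(line)

os.remove(tmp.name)
```

The trade-off: rfind() still examines every byte, but in a single backward pass, so the last line is available right away instead of after a full forward scan.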