python - Python - Read from multiple large files in parallel and yield them individually
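Problem description
I have multiple large files and need to yield their lines one at a time, in round-robin fashion. Something like this pseudocode: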
def get(self):
    with open(file_list, "r") as files:
        for file in files:
            yield file.readline()
How can I do this?
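Solution
The itertools documentation has several recipes, among them a very neat round-robin recipe. I would also use ExitStack to manage the context managers of the multiple files: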
from itertools import cycle, islice
from contextlib import ExitStack

# https://docs.python.org/3.8/library/itertools.html#itertools-recipes
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Remove the iterator we just exhausted from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

...

def get(self):
    with open(files_list) as fl:
        filenames = [x.strip() for x in fl]
    with ExitStack() as stack:
        files = [stack.enter_context(open(fname)) for fname in filenames]
        yield from roundrobin(*files)
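To see how the recipe behaves when the files have different lengths, here is a minimal sketch that uses io.StringIO objects as stand-ins for open files. The sample contents are made up for illustration, and the recipe is repeated (with the loop variable renamed to avoid shadowing the built-in next) so that the snippet runs on its own.

```python
import io
from itertools import cycle, islice

def roundrobin(*iterables):
    # Same itertools recipe as above, repeated so this snippet is self-contained.
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # Drop the exhausted iterator and keep cycling over the rest.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

# Hypothetical in-memory "files" of unequal length.
files = [
    io.StringIO("a1\na2\na3\n"),
    io.StringIO("b1\n"),
    io.StringIO("c1\nc2\n"),
]

lines = [line.rstrip("\n") for line in roundrobin(*files)]
print(lines)
```

Shorter files simply drop out of the rotation once they are exhausted, and the remaining files keep alternating until all of them are done.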
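Perhaps the best design, though, is to use inversion of control and pass the sequence of file objects as an argument to .get, so that the calling code takes care of using the exit stack: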
class Foo:
    ...
    def get(self, files):
        yield from roundrobin(*files)

# calling code:
foo = Foo()  # or however it is initialized
with open(files_list) as fl:
    filenames = [x.strip() for x in fl]
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    for line in foo.get(files):
        do_something_with_line(line)
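As a rough end-to-end sketch of this caller-owned design, the snippet below writes two temporary files, opens them through an ExitStack, and checks that they are closed once the with block ends. The file names and contents are invented for the example, and roundrobin here is a shorter zip_longest-based stand-in for the itertools recipe, not the recipe itself.

```python
import os
import tempfile
from contextlib import ExitStack
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted file

def roundrobin(*iterables):
    # Stand-in for the itertools recipe: take one item from each iterable per round.
    for group in zip_longest(*iterables, fillvalue=_MISSING):
        for item in group:
            if item is not _MISSING:
                yield item

class Foo:
    def get(self, files):
        yield from roundrobin(*files)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data: file name -> contents.
    samples = {"a.txt": "a1\na2\n", "b.txt": "b1\n"}
    filenames = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        filenames.append(path)

    with ExitStack() as stack:
        files = [stack.enter_context(open(fname)) for fname in filenames]
        lines = [line.strip() for line in Foo().get(files)]

    # The caller's ExitStack has closed every file at this point.
    all_closed = all(f.closed for f in files)

print(lines, all_closed)
```

Because the caller owns the ExitStack, the files are closed when the with block exits, even if the caller stops consuming the generator early.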