Python - How to use a custom buffer_size in io.BufferedReader?

Question

As I understand it, the buffer_size argument to io.BufferedReader is supposed to control the size of the read buffer passed to the underlying reader.

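However, I am not seeing that behavior. Instead, when I call reader.read() to read the entire file, io.DEFAULT_BUFFER_SIZE is used and buffer_size is ignored. When I call reader.read(length), length is used as the buffer size and the buffer_size argument is again ignored.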

Minimal example:

import io

class MyReader(io.RawIOBase):

    def __init__(self, length):
        self.length = length
        self.position = 0

    def readinto(self, b):
        print('read buffer length: %d' % len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        b[:length] = b'a' * length  # bytes literal: works on Python 2.7 and Python 3
        return length

    def readable(self):
        return True

    def seekable(self):
        return False


print('# read entire file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read()))

print('\n# read part of file file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(10000)))

print('\n# read beyond end of file file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(30000)))

Output:

# read entire file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
output length: 20000

# read part of file file
read buffer length: 10000
output length: 10000

# read beyond end of file file
read buffer length: 30000
read buffer length: 10000
output length: 20000

Have I misunderstood how BufferedReader is supposed to work?

Tags: python, python-2.7, io

Answer


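The point of io.BufferedReader is to keep an internal buffer, and buffer_size sets the size of that buffer. The buffer is used to satisfy smaller reads, so that many small requests do not each turn into a read call on a slower I/O device.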

The buffer does not try to limit the size of reads, however!

From the io.BufferedReader documentation:

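When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.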

The object inherits from io.BufferedIOBase, whose documentation states:

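The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.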

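Because you called .read() on the object, larger blocks are read from the wrapped object in order to read all data through to the end. The internal buffer held by the BufferedReader() instance does not come into play here; you asked for all the data, after all.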

The buffer does come into play if you read in smaller blocks:

>>> reader = io.BufferedReader(MyReader(2048), buffer_size=512)
>>> __ = reader.read(42)  # initial read, fill buffer
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read(456)  # deplete buffer, another read needed to re-fill
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read()     # read until end, uses larger blocks to read from wrapped file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
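
If the goal is to cap the size of each request that reaches the wrapped object, one option is to never ask the BufferedReader for more than buffer_size bytes per call. The following is a minimal sketch under that assumption and is not part of the original answer. RecordingReader and read_in_chunks are hypothetical names introduced for illustration, and the bound on request sizes reflects how CPython's BufferedReader appears to behave rather than a documented guarantee.

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that records the size of every readinto() request."""

    def __init__(self, length):
        super(RecordingReader, self).__init__()
        self.remaining = length
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        # Remember how large a buffer the caller handed us.
        self.requests.append(len(b))
        n = min(len(b), self.remaining)
        b[:n] = b'a' * n
        self.remaining -= n
        return n


def read_in_chunks(reader, chunk_size):
    """Yield data from reader, requesting at most chunk_size bytes per call."""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        yield chunk


buffer_size = 100
raw = RecordingReader(20000)
reader = io.BufferedReader(raw, buffer_size=buffer_size)

# Consume the whole stream without ever requesting more than buffer_size.
data = b''.join(read_in_chunks(reader, buffer_size))

print('output length: %d' % len(data))
print('largest request seen by raw stream: %d' % max(raw.requests))
```

Here the caller, not BufferedReader, is what limits the request size: every read(chunk_size) is small enough that the reader has no reason to ask the raw stream for more than one buffer's worth of data.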
