Reading a file in chunks of multiple lines

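Problem description

I am trying to read a file in chunks of multiple lines. For example, if a file has 100 lines and I want 10 lines per chunk, there should be 10 chunks. It should then be possible to retrieve the chunks like this: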

# Here the 'read_chunk' function should return a generator.
for chunk in read_chunk(file_path="./file.txt", line_count=10):
    print(chunk)

This is what I tried:

from typing import Generator

def read_chunk(
    *,
    file_path: str,
    line_count: int = 10,  # Number of lines per chunk.
) -> Generator[str, None, None]:

    """Read a file in chunks of 'line_count' lines."""

    with open(file_path, "r") as f:
        chunk = []
        for idx, line in enumerate(f):
            if line.strip():
                chunk.append(line)

            if not idx == 0 and idx % line_count == 0:
                yield "\n".join(chunk)
                chunk = []

        # This returns the last chunk.
        yield "\n".join(chunk)

Let's run it on the following file:

# file.txt

* [What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example - Richard Peterson](https://www.guru99.com/database-normalization.html) 
-> Normalization roughly means deduplication of data in a table by leveraging foreign keys, multiple tables, and intermediary join tables. 
This article explains it in finer detail.

* [OLTP vs OLAP System](https://www.guru99.com/oltp-vs-olap.html) 
-> OLTP is an online transactional system that manages database modification whereas OLAP is an online analysis and data retrieving process.


for chunk in read_chunk(file_path='./file.txt', line_count=2):
    print('============\n')      # This is to discern between the chunks better.
    print(chunk)
    print('============\n')

This returns:

============

# file.txt

* [What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example - Richard Peterson](https://www.guru99.com/database-normalization.html)

============

============

-> Normalization roughly means deduplication of data in a table by leveraging foreign keys, multiple tables, and intermediary join tables.

This article explains it in finer detail.

============

============

* [OLTP vs OLAP System](https://www.guru99.com/oltp-vs-olap.html)

============

============

-> OLTP is an online transactional system that manages database modification whereas OLAP is an online analysis and data retrieving process.

============

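The output looks fine at the start, but after that it stops making sense to me. Shouldn't there be a single chunk with 2 lines instead of two chunks with 1 line each? Also, is there a better way to do this?

Tags: python, file, buffer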

Solution
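
A likely explanation for the output above: the function decides when to yield based on `idx`, the raw line number, but blank lines are skipped when appending, so a blank line still uses up a slot in the count. In the sample file, the blank line at index 5 takes one of the two slots, which leaves the `* [OLTP vs OLAP System]` line alone in its chunk. Two smaller effects are also visible. Because index 0 is excluded from the check, the first chunk covers raw indices 0 to 2. And because each line keeps its trailing newline, joining with `"\n"` prints an empty line between entries.

The following is a minimal sketch of one alternative, not the only way to do it. It filters out blank lines first and then takes `line_count` items at a time with `itertools.islice`. It assumes Python 3.8+ (for the `:=` operator) and that blank lines should be dropped, as in the original attempt.

```python
from itertools import islice
from typing import Iterator


def read_chunk(*, file_path: str, line_count: int = 10) -> Iterator[str]:
    """Yield chunks of up to 'line_count' non-blank lines from a file."""
    with open(file_path, "r") as f:
        # Drop blank lines before chunking so they never occupy a slot,
        # and strip the trailing newline so the join below does not double it.
        lines = (line.rstrip("\n") for line in f if line.strip())

        # islice pulls at most 'line_count' items per call; an empty list
        # means the file is exhausted, which ends the loop without
        # yielding an empty final chunk.
        while chunk := list(islice(lines, line_count)):
            yield "\n".join(chunk)
```

With the sample file and `line_count=2`, this yields three chunks of two lines each. Counting the lines actually collected, rather than the raw line index, is what keeps every chunk except possibly the last at full size.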

