python - 以多行的块读取文件
问题描述
我正在尝试以多行的块读取文件。例如,如果一个文件有 100 行,我希望每个块有 10 行,那么应该有 10 个块。然后应该能够提取块如下:
# Here 'read_chunk' function should return a generator.
for chunk in read_chunk(file_path="./file.txt", line_count=10):
print(chunk)
这就是我尝试的方式。
from typing import Generator
def read_chunk(
*,
file_path: str,
line_count: int = 10, # Number of chunked line.
) -> Generator[str, None, None]:
"""Read a file in chunks of 'line_count' lines."""
with open(file_path, "r") as f:
chunk = []
for idx, line in enumerate(f):
if line.strip():
chunk.append(line)
if not idx == 0 and idx % line_count == 0:
yield "\n".join(chunk)
chunk = []
# This returns the last chunk.
yield "\n".join(chunk)
让我们在以下文件上运行它:
# file.txt
* [What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example - Richard Peterson](https://www.guru99.com/database-normalization.html)
-> Normalization roughly means deduplication of data in a table by leveraging foreign keys, multiple tables, and intermediary join tables.
This article explains it in finer detail.
* [OLTP vs OLAP System](https://www.guru99.com/oltp-vs-olap.html)
-> OLTP is an online transactional system that manages database modification whereas OLAP is an online analysis and data retrieving process.
for chunk in read_chunk(file_path='./file.txt', line_count=2):
print('============\n') # This is to discern between the chunks better.
print(chunk)
print('============\n')
这将返回:
============
# file.txt
* [What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example - Richard Peterson](https://www.guru99.com/database-normalization.html)
============
============
-> Normalization roughly means deduplication of data in a table by leveraging foreign keys, multiple tables, and intermediary join tables.
This article explains it in finer detail.
============
============
* [OLTP vs OLAP System](https://www.guru99.com/oltp-vs-olap.html)
============
============
-> OLTP is an online transactional system that manages database modification whereas OLAP is an online analysis and data retrieving process.
============
输出在开始时看起来还不错,然后对我来说没有意义。不应该有一个带有 2 行的单个块,而不是两个带有 1 行的块吗?另外,有没有更好的方法来做到这一点?
解决方案
推荐阅读
- security - OAuth2 - 受信任的客户端能否通过客户端凭据流访问用户资源
- html - 为什么位置绝对按钮在较小的显示器上与我的内容重叠?
- django - swagger 中 POST 和 PUT 方法的有效负载
- r - 如何编织 coeftest() 回归输出?
- mongodb - mongorestore 未捕获的异常:语法错误
- css - 如何正确引用 CSS?
- winapi - CancelIo 并不代表实际取消的 I/O
- reactjs - Python 服务器和 React Native 客户端之间的 AES 加密
- python - Pandas:基于多个子字符串值进行过滤
- python - 在 Python DataFrame 中创建聚合另一列的新列