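python - Yielding lines joined on trailing backslashes while excluding comment blocks
Problem description
I am trying to write a generator function that yields a file's lines one at a time, ignoring comment blocks and joining any line that ends with a backslash to the line that follows it. So for this block of text: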
# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line
The desired output is:
<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>
Here is my code so far:
import sys
from typing import Iterator

try:
    file_name = open('path/to/file.txt', 'r')
except FileNotFoundError:
    print("File could not be found. Please check spelling of file name!")
    sys.exit()

#Read lines in file
Lines = file_name.read().splitlines()

class FileLineGen:
    def get_filelines(path: str) -> Iterator[str]:
        for line in Lines:
            #Exclude a line if it starts with #
            if line.startswith("#"):
                line.replace(line, "")
                continue
            if "#" in line:
                #Split at where the # is located
                line.split('#')
                #Yield everything before the comment block
                yield line.split('#')[0]
                continue
            if line.endswith('\\'):
                #Yield everything but the backslash
                line = line[:-1]
                yield line
                continue
            #Yield the line in all other cases
            else:
                yield line

    gen = get_filelines(file_name)
    for line in Lines:
        print(next(gen))
This produces the following output:
<line0>
<line1>
<line2>
<line3.1
line3.2
line3.3>
<line4.1
line4.2>
<line5>
more comment2>
<line6>
this line is part of the comment from the previous line
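So I have been able to remove the backslashes, but every way of joining the lines that I have tried has failed. The ideal logic would be to first join each backslash-terminated line with the line that follows it, so that if the joined line starts with a #, the whole line is excluded automatically (and trailing comments are not included in the output).
Edit: new output after opening the file with a with block in the FileLineGen class: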
with open('/path/to/file.txt') as f:
    for line in my_generator(f):
        print(line)
<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>
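Solution
You have two operators, # and \. The latter takes precedence over the former, which means you should check for it and handle it first. Here is a simple approach that uses a list as a buffer to build up each line: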
def my_generator(f):
    buffer = []
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            line = line[:line.index('#')]
        if line:
            yield line
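The advantage of wrapping an iterable of lines and relying on duck typing is that you can pass in anything that behaves like a container of strings, not just a text file: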
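# Raw string: in a normal string literal, a backslash before a newline is
# consumed as a line continuation and would never reach the generator.
text = r"""# this entire line is a comment - don't include it in the output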
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line"""
for line in my_generator(text.splitlines()):
    print(line)
The result is as expected:
<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>
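To illustrate the duck-typing point, here is a minimal sketch that feeds the same generator both a list of strings and an in-memory file object (io.StringIO). The sample data is made up for illustration and is not taken from the question; my_generator is repeated only so that the snippet runs on its own.

```python
import io


def my_generator(f):
    # Same generator as in the answer, repeated so this snippet is self-contained.
    buffer = []
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            # Continuation: drop the backslash and wait for the next line.
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            # Strip the comment only after the continuation has been joined.
            line = line[:line.index('#')]
        if line:
            yield line


# Hypothetical sample data: one comment line, one continued line,
# and one line with a trailing comment.
sample = "# header comment\n<a \\\nb>\n<c># trailing comment\n"

# A plain list of strings without newlines...
from_list = list(my_generator(sample.splitlines()))
# ...and a file-like object whose lines still carry their newlines.
from_file_like = list(my_generator(io.StringIO(sample)))

print(from_list)
print(from_file_like)
```

Both calls go through the same code path because the generator only relies on being able to iterate over strings.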
Another way to write that loop is
print('\n'.join(my_generator(text.splitlines())))
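One edge case the generator above does not cover: if the very last line of the input ends with a backslash, the buffered text is never yielded. The following is a minimal sketch of a variant that flushes the buffer once the input is exhausted. The name join_continued_lines and the decision to emit the dangling text (rather than drop it or raise an error) are assumptions for illustration, not part of the original answer.

```python
def join_continued_lines(lines):
    """Hypothetical variant of my_generator that also flushes a dangling continuation."""
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            # Continuation: keep the text without the backslash for later.
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        # Keep only the part before the first '#', if there is one.
        line = line.split('#', 1)[0]
        if line:
            yield line
    # Assumption: input that ends in the middle of a continuation is still emitted.
    if buffer:
        line = ''.join(buffer).split('#', 1)[0]
        if line:
            yield line


# Made-up input whose last line ends with a backslash.
result = list(join_continued_lines(['<a \\', 'b>', '# note', '<c \\']))
print(result)
```

Whether a dangling continuation should be emitted, ignored, or treated as an error depends on the file format being parsed, so adjust the final block to taste.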