python - Yield lines joined on trailing backslashes while excluding comment blocks

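Problem description

I am currently trying to write a generator function that yields one line of a file at a time, ignoring comment blocks and joining any line that ends with a backslash to the line after it. So for this block of text: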

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line

The desired output is:

<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>

Here is my code so far:

import sys
from typing import Iterator

try:
    file_name = open('path/to/file.txt', 'r')
except FileNotFoundError:
    print("File could not be found. Please check spelling of file name!")
    sys.exit()

#Read lines in file
Lines = file_name.read().splitlines()

class FileLineGen:
    def get_filelines(path: str) -> Iterator[str]:
        for line in Lines:
            #Exclude a line if it starts with #
            if line.startswith("#"):
                line.replace(line, "")
                continue
            if "#" in line:
                #Split at where the # is located
                line.split('#')
                #Yield everything before the comment block
                yield line.split('#')[0]
                continue
            if line.endswith('\\'):
                #Yield everything but the backslash
                line = line[:-1]
                yield line
                continue
            #Yield the line in all other cases
            else:
                yield line

    gen = get_filelines(file_name)
    for line in Lines:
        print(next(gen))

This produces the following output:

<line0>
<line1>
<line2>
<line3.1 
line3.2 
line3.3>
<line4.1 
line4.2>
<line5>
more comment2>
<line6>
this line is part of the comment from the previous line

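So I have been able to remove the backslashes, but every way of joining the lines that I tried has failed. The ideal logic would be to first join each backslash-terminated line with the next one, so that if the joined line starts with a #, the whole line is excluded automatically (and trailing comments are not included in the output).

Edit: new output after opening the file with a with block inside the FileLineGen class: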

    with open('/path/to/file.txt') as f:
        for line in my_generator(f):
            print(line)

<line0>

<line1>
<line2>

<line3.1 line3.2 line3.3>

<line4.1 line4.2>

<line5>
<line6>

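Tags: python, generator

Solution

You have two operators: # (comment) and \ (line continuation). The continuation takes precedence over the comment, which means you should check for it and handle it first. Here is a simple approach that uses a list as a buffer to build up lines: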

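def my_generator(f):
    buffer = []  # pieces of a logical line that is still being continued
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            # Continuation: keep everything but the backslash and read on
            buffer.append(line[:-1])
            continue
        # Join the buffered pieces first, so '#' is applied to the whole logical line
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            # Drop the comment, whether it is trailing or covers the entire line
            line = line[:line.index('#')]
        if line:
            yield line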
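
The same precedence rule can also be sketched as two small generators chained together: one that only joins continued lines, and one that only strips comments. This is a hedged variant rather than a replacement for the generator above. The names join_continuations and strip_comments are hypothetical, and two behaviours are assumptions that the code above does not make: a dangling backslash on the very last line still produces a line, and trailing whitespace left in front of a removed comment is stripped.

```python
from typing import Iterable, Iterator


def join_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Merge every line ending in a backslash with the line that follows it."""
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])
            continue
        yield ''.join(buffer) + line
        buffer = []
    if buffer:
        # Assumption: a dangling backslash on the last line still ends a logical line
        yield ''.join(buffer)


def strip_comments(lines: Iterable[str]) -> Iterator[str]:
    """Cut each line at the first '#' and skip lines that end up empty."""
    for line in lines:
        # Assumption: whitespace left before the removed comment is not wanted
        code = line.split('#', 1)[0].rstrip()
        if code:
            yield code


sample = ['<a \\', 'b> # note', '# whole \\', 'comment', '<c \\']
result = list(strip_comments(join_continuations(sample)))
print(result)
```

Because joining runs before comment stripping, a comment that is continued onto the next line swallows that line as well, which matches the desired output in the question.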

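The advantage of wrapping an iterable of lines and relying on duck typing is that you can pass in anything that behaves like a container of strings, not just a text file: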

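# Raw string: otherwise each backslash-newline pair would be consumed by the string literal itself
text = r"""# this entire line is a comment - don't include it in the output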
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line"""

for line in my_generator(text.splitlines()):
    print(line)

The result is as expected:

<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>

Another way to write that for loop is

print('\n'.join(my_generator(text.splitlines())))
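
To illustrate the duck-typing point made earlier, here is a minimal sketch that feeds the same generator a file-like object and a plain list. The generator is repeated so the snippet runs on its own, and the raw sample string is made up for illustration.

```python
import io


def my_generator(f):
    # Same logic as the generator in the answer, repeated for self-containment
    buffer = []
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            line = line[:line.index('#')]
        if line:
            yield line


# Hypothetical sample: one continued line, one full-line comment, one trailing comment
raw = '<a \\\nb>\n# skipped\n<c># trailing\n'

# io.StringIO yields lines that keep their '\n'; splitlines() yields them without it
from_file_like = list(my_generator(io.StringIO(raw)))
from_list = list(my_generator(raw.splitlines()))
print(from_file_like)
print(from_list)
```

Both calls produce the same lines, because the generator only relies on being able to iterate over strings and strips a trailing newline when one is present.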
