Why doesn't the output match my code's conditions?

Problem description

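I have the code below to fix some line breaks when my text qualifiers (double quotes) are unbalanced, i.e. when a line contains an odd number of them.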

import re
from collections import Counter

linebreak = ''
with open(path) as f:
    for line in f:
        line1 = line.strip()

        if len(linebreak) > 0:
            linebreak = linebreak + ' ' + line1
            res = re.sub(r'"[^"]+"', lambda m: m.group(0).replace("\n", ""), linebreak)
            if (Counter(linebreak)['"'] % 2) == 0:
                linebreak = ''
                print(res)                  
        
        if (Counter(line1)['"'] % 2) != 0:
            nextline = next(f, None).strip()            
            linebreak = line1 + ' ' + nextline
            res = re.sub(r'"[^"]+"', lambda m: m.group(0).replace("\n", ""), linebreak)
            
            if (Counter(linebreak)['"'] % 2) == 0:
                linebreak = ''
                print(res)                
                
        if (Counter(line1)['"'] % 2) == 0: 
            print(line1)

The problem is that the last line of my file is not printed, even though it matches the last if condition.

File:

"content of row 1 abcde" | abcde
abcde | "content of row 2
 continues here"
content of row 3 | abcde
"content of row 4 
 continues here" | "Test
ing"
Teste1

Output:

"content of row 1 abcde" | abcde
abcde | "content of row 2 continues here"
content of row 3 | abcde
"content of row 4 continues here" | "Test ing"

Expected output:

"content of row 1 abcde" | abcde
abcde | "content of row 2 continues here"
content of row 3 | abcde
"content of row 4 continues here" | "Test ing"
Teste1

Also, I'd appreciate any simpler way to fix these line breaks!
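One simpler approach, sketched here under the assumption that the double quote is the only text qualifier and is never escaped (e.g. as ""): read each physical line exactly once, append it to a buffer, and emit the buffer whenever the running count of double quotes is even. The function name join_quoted_lines and the inline sample are illustrative, not from the original post.

```python
import io


def join_quoted_lines(lines):
    """Yield logical records, joining physical lines while a quoted field is open.

    Assumes '"' is the only qualifier and is never escaped.
    """
    buffer = []
    quote_count = 0
    for line in lines:
        part = line.strip()
        buffer.append(part)
        quote_count += part.count('"')
        if quote_count % 2 == 0:
            # Quotes are balanced: the record is complete.
            yield " ".join(buffer)
            buffer = []
            quote_count = 0
    if buffer:
        # Input ended inside an unterminated quote: emit what we have.
        yield " ".join(buffer)


# Illustrative copy of the sample file from the question.
SAMPLE = (
    '"content of row 1 abcde" | abcde\n'
    'abcde | "content of row 2\n'
    ' continues here"\n'
    'content of row 3 | abcde\n'
    '"content of row 4 \n'
    ' continues here" | "Test\n'
    'ing"\n'
    'Teste1\n'
)

for record in join_quoted_lines(io.StringIO(SAMPLE)):
    print(record)
```

On the sample this prints the five expected lines, including Teste1. Because the for loop is the only thing that reads from the input, no line can be skipped, and a quoted field may span any number of lines. For a real file, pass the open file object instead: with open(path) as f, iterate over join_quoted_lines(f).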

Edit: this gawk one-liner does the job on GNU/Linux, but I'd like to keep it in Python:

gawk -v RS='"' 'NR % 2 == 0 { gsub(/[\r\n]+/, "") } { printf("%s%s", $0, RT) }' file > file.txt
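The gawk command translates fairly directly to Python: splitting the whole text on the double quote leaves every quoted field at an odd index (awk's even NR), and runs of CR/LF are deleted only there. This is a sketch assuming unescaped quotes and a file small enough to read into memory; strip_newlines_inside_quotes is a hypothetical name.

```python
import re

# Python port of the gawk one-liner from the question:
#   gawk -v RS='"' 'NR % 2 == 0 { gsub(/[\r\n]+/, "") } { printf("%s%s", $0, RT) }'
# Assumes double quotes are never escaped inside a field.


def strip_newlines_inside_quotes(text):
    # Splitting on '"' puts the text outside quotes at even indexes and the
    # quoted fields at odd indexes (awk's even record numbers, NR % 2 == 0).
    parts = text.split('"')
    for i in range(1, len(parts), 2):
        # Same as gsub(/[\r\n]+/, ""): delete every run of CR/LF characters.
        parts[i] = re.sub(r"[\r\n]+", "", parts[i])
    # Rejoin with the quotes that split() removed (awk re-emits them via RT).
    return '"'.join(parts)


# Illustrative excerpt of the sample file from the question.
SAMPLE = (
    'abcde | "content of row 2\n'
    ' continues here"\n'
    '"content of row 4 \n'
    ' continues here" | "Test\n'
    'ing"\n'
    'Teste1\n'
)

print(strip_newlines_inside_quotes(SAMPLE), end="")
```

Like the gawk command, this deletes the line break without inserting a space, so it yields "Testing" rather than "Test ing", and row 4 keeps both the trailing and the leading space around the removed break (two spaces). For a file, read it first, e.g. text = open(path).read().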

Tags: python, regex

Solution


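Once the joined line has an even number of double quotes, you should move on to the next line instead of testing the other ifs, so you need a continue when the line is complete. In your code, the line that finishes row 4 (ing") closes the record in the first if, but on its own it contains a single double quote, so the second if also runs and calls next(f), which consumes Teste1. The new buffer, ing" Teste1, has an odd number of quotes and is never printed.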

from collections import Counter
import re

linebreak = ''
path = "in.txt"
with open(path) as f:
    for line in f:
        line1 = line.strip()        

        if len(linebreak) > 0:            
            linebreak = linebreak + ' ' + line1
            res = re.sub(r'"[^"]+"', lambda m: m.group(0).replace("\n", ""), linebreak)
            if (Counter(linebreak)['"'] % 2) == 0:
                linebreak = ''
                print(res)
                continue  ### ADDED   LINE ###
        
        if (Counter(line1)['"'] % 2) != 0:            
            nextline = (next(f, None) or '').strip()  # '' instead of a crash if the file ends here
            linebreak = line1 + ' ' + nextline
            res = re.sub(r'"[^"]+"', lambda m: m.group(0).replace("\n", ""), linebreak)
            
            if (Counter(linebreak)['"'] % 2) == 0:
                linebreak = ''
                print(res)               
                
        if (Counter(line1)['"'] % 2) == 0: 
            print(line1)
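The underlying pitfall is that next(f) inside a for line in f loop advances the same iterator the loop is using, so the line it returns is never seen by the loop body. A minimal demonstration with a hypothetical three-line in-memory file, where "a" stands in for a line with an odd quote count:

```python
import io

# Hypothetical input; "a" plays the role of a line with an odd quote count.
f = io.StringIO("a\nb\nc\n")
seen = []
for line in f:
    seen.append(line.strip())
    if line.strip() == "a":
        # next(f) advances the *same* iterator the for loop is using,
        # so the loop will never see "b".
        consumed = next(f, None)

print(seen)
```

This prints ['a', 'c']: "b" was pulled out by next(f), just as Teste1 is in the question's code when the second if fires after the first one has already completed the record.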
