I need to filter or delete some lines from a file

Problem description

Here is the input file, which is already structured correctly:

Name:  mr. Apple
class:  class 1
sub:  subject 1
ContactNo: 11111
Name:  mr. ball
class:  class  2
sub:  subject  2
ContactNo: 2222
Name:  mr. cat
class:  class 3
sub:  subject 3
ContactNo: 33333
class:  class 4
sub:  subject 4
ContactNo:44444
class:  class 5
sub:  subject 5
ContactNo: 55555
Name:  mr. tom
class:  class 9
sub:  subject 9
ContactNo: 99999

As you can see, some records have no name.

For example, the record class: class 4, sub: subject 4, ContactNo: 44444.

I need to remove those records and keep only the details of people who have a name.

Expected output:

Name:  mr. Apple
class:  class 1
sub:  subject 1
ContactNo: 11111
Name:  mr. ball
class:  class  2
sub:  subject  2
ContactNo: 2222
Name:  mr. cat
class:  class 3
sub:  subject 3
ContactNo: 33333
Name:  mr. tom
class:  class 9
sub:  subject 9
ContactNo: 99999

I tried this:

errors = []                       # The list where we will store results.
# Lower-cased field labels to search for (case-insensitive match).
substrs = ["name:", "class:", "sub:", "contactno:"]

with open('scrap.txt', 'rt') as myfile:
    for line in myfile:
        if any(s in line.lower() for s in substrs):    # if case-insensitive match,
            errors.append(line)

# Open the output file once, rather than once per line.
with open("rawextract.txt", "a") as fp:
    for err in errors:
        fp.write(err)
        print(err)

But I don't know how to discard the incomplete records.

Tags: python, python-3.x, text

Solution


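You can use re.findall with a regular-expression pattern that matches the expected field labels in their correct order. Records missing any of the four lines (such as the ones without a Name: line) are not matched, so they are dropped: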

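import re

with open('scrap.txt') as myfile:
    # Each match is one complete record: the four labelled lines in this order.
    # '.' does not match '\n', so each '.*' stays within its own line.
    for m in re.findall(r'Name:.*\nclass:.*\nsub:.*\nContactNo:.*', myfile.read()):
        print(m)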

This outputs:

Name:  mr. Apple
class:  class 1
sub:  subject 1
ContactNo: 11111
Name:  mr. ball
class:  class  2
sub:  subject  2
ContactNo: 2222
Name:  mr. cat
class:  class 3
sub:  subject 3
ContactNo: 33333
Name:  mr. tom
class:  class 9
sub:  subject 9
ContactNo: 99999
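If you would rather process the file line by line instead of reading it all into one string, a small state flag works too. This is a sketch under one assumption: every record ends with a ContactNo: line, as in the sample input. The helper name keep_named_records is hypothetical, and the check is case-sensitive (unlike the question's attempt).

```python
def keep_named_records(lines):
    """Return only the lines belonging to records that start with 'Name:'.

    Assumes (hypothetically) that every record ends with a 'ContactNo:' line.
    """
    kept = []
    in_named_record = False
    for line in lines:
        if line.startswith("Name:"):
            in_named_record = True      # a record with a name starts here
        if in_named_record:
            kept.append(line)
        if line.startswith("ContactNo:"):
            in_named_record = False     # ContactNo: closes the current record
    return kept


# A shortened copy of the question's input: the 'class 4' record has no Name: line.
SAMPLE = """\
Name:  mr. cat
class:  class 3
sub:  subject 3
ContactNo: 33333
class:  class 4
sub:  subject 4
ContactNo:44444
Name:  mr. tom
class:  class 9
sub:  subject 9
ContactNo: 99999
"""

result = keep_named_records(SAMPLE.splitlines(keepends=True))
print("".join(result), end="")
```

To apply it to the real file, pass the open file object directly (for example keep_named_records(open('scrap.txt'))) and write the returned lines to rawextract.txt. Unlike the regex, this keeps a named record even if one of its middle fields is missing, but it depends on ContactNo: reliably closing each record.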
