Separate blocks of text python

Problem description

I am wondering how to separate the blocks of text within a single text file; an example is below. Basically I have 2 items: one runs from "Channel 9" to the line with "Brief:...", and the other starts with "Southern ..." and again runs to the "Brief" line. How would I split them into 2 text files with Python? I reckon the common divider would be "(female 16+)". Many thanks!


Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left 
$1,100 out
hosted by Peter Hitchener
A woman selling her caravan near Bendigo has been left $1,100 out of 
pocket after an elderly couple made the purchase with counterfeit money. 
The wildlife worker tried to use the notes to pay for a house deposit, but an 
agent noticed the notes were missing the Coat of Arms on one side. 


Brief: Radio & TV
Demographics: 153,000 (male 16+) • 177,000 (female 
16+)

Southern Cross Victoria Bendigo (1 item)


Heathcote Police are warning the residents to be on the 
lookout a
hosted by Jo Hall
Heathcote Police are warning the residents to be on the lookout after a large 
dash of fake $50 note was discovered. Victim Marianne Thomas was given 
counterfeit notes from a caravan. The Heathcote resident tried to pay the 
house deposit and that's when the counterfeit notes were spotted. Thomas 
says the caravan is in town for the Spanish Festival.


Brief: Radio & TV
Demographics: 4,000 (male 16+) • 3,000 (female 16+)

Tags: python, text, block

Solution


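Here is a modified version of something similar I did recently. It walks through your text and copies it into files line by line. The core logic appends each line to the current file and resets the filename once a section ends; the first line of the next section is then used as the new filename.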

#!/usr/bin/env python
import re

data = """
Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left $1,100 out hosted by
Peter Hitchener A woman selling her caravan near Bendigo has been left $1,100
out of pocket after an elderly couple made the purchase with counterfeit money.
The wildlife worker tried to use the notes to pay for a house deposit, but an
agent noticed the notes were missing the Coat of Arms on one side.

Brief: Radio & TV Demographics: 153,000 (male 16+) • 177,000 (female 16+)

Southern Cross Victoria Bendigo (1 item)

Heathcote Police are warning the residents to be on the lookout a hosted by Jo
Hall Heathcote Police are warning the residents to be on the lookout after a
large dash of fake $50 note was discovered. Victim Marianne Thomas was given
counterfeit notes from a caravan. The Heathcote resident tried to pay the house
deposit and that's when the counterfeit notes were spotted. Thomas says the
caravan is in town for the Spanish Festival.

Brief: Radio & TV Demographics: 4,000 (male 16+) • 3,000 (female 16+)
"""



current_file = None
for line in data.split('\n'):

    # Start a new file named after the first non-blank line of a section
    if current_file is None and line != '':
        current_file = line + '.txt'

    # Skip the blank line(s) between the end of one section and the next
    if current_file is None:
        continue

    # Append this line to the current section's file.
    # Note: "a" appends, so delete old output files before re-running.
    with open(current_file, "a", encoding="utf-8") as text_file:
        text_file.write(line + "\n")

    # Reset the filename once this section has finished,
    # which is identified by a line that:
    #    starts with Brief:   - ^Brief:
    #    contains any text    - .*
    #    ends with )          - \)$
    if re.match(r'^Brief:.*\)$', line) is not None:
        current_file = None

This produces the following files:

Channel 9 (1 item).txt
Southern Cross Victoria Bendigo (1 item).txt
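The loop above depends on the answer's reflowed copy of the data, where each Brief/Demographics pair sits on a single line ending in ")". In the raw layout from the question, "Brief: Radio & TV" and the Demographics line are separate lines and the first "(female 16+)" is broken across a line break, so `^Brief:.*\)$` never matches and every line would end up in "Channel 9 (1 item).txt". A minimal sketch that uses the asker's suggested divider, "(female 16+)", instead, while tolerating a line break inside it, might look like this. The names `split_blocks`, `write_blocks`, `BLOCK_RE` and the abridged `RAW` sample are illustrative only, and the approach assumes every block ends with that marker.

```python
import re
from pathlib import Path

# Hypothetical, abridged copy of the question's raw layout. As in the question,
# the first "(female 16+)" marker is broken across two lines.
RAW = """Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left
$1,100 out
hosted by Peter Hitchener
...

Brief: Radio & TV
Demographics: 153,000 (male 16+) • 177,000 (female
16+)

Southern Cross Victoria Bendigo (1 item)

Heathcote Police are warning the residents to be on the
lookout a
hosted by Jo Hall
...

Brief: Radio & TV
Demographics: 4,000 (male 16+) • 3,000 (female 16+)
"""

# One block = everything up to and including the next "(female 16+)" marker.
# re.DOTALL lets "." cross newlines; "\s+" lets the marker itself span a line break.
BLOCK_RE = re.compile(r".*?\(female\s+16\+\)", re.DOTALL)


def split_blocks(text):
    """Return each block of text, stripped of surrounding blank lines."""
    return [m.group(0).strip() for m in BLOCK_RE.finditer(text)]


def write_blocks(blocks, out_dir="."):
    """Write each block to '<first line>.txt' in out_dir and return the paths."""
    paths = []
    for block in blocks:
        title = block.splitlines()[0].strip()
        path = Path(out_dir) / f"{title}.txt"
        # write_text overwrites, so re-running does not duplicate content
        path.write_text(block + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

For a real file, read it first, e.g. `write_blocks(split_blocks(Path("input.txt").read_text(encoding="utf-8")))`, where `input.txt` is a placeholder name. Matching whole blocks with `finditer` avoids tracking state line by line, at the cost of silently dropping any trailing text that is not followed by the marker.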
