python - Separate blocks of text in Python
Problem description
I am wondering how to separate blocks of text that sit in the same text file. An example is below. There are two items: the first runs from "Channel 9" down to the line with "Brief: ...", and the second starts at "Southern ..." and again runs down to its "Brief" line. How can I split them into two text files with Python? I reckon the common divider would be "(female 16+)". Many thanks!
Channel 9 (1 item)
A woman selling her caravan near Bendigo has been left
$1,100 out
hosted by Peter Hitchener
A woman selling her caravan near Bendigo has been left $1,100 out of
pocket after an elderly couple made the purchase with counterfeit money.
The wildlife worker tried to use the notes to pay for a house deposit, but an
agent noticed the notes were missing the Coat of Arms on one side.
Brief: Radio & TV
Demographics: 153,000 (male 16+) • 177,000 (female
16+)
Southern Cross Victoria Bendigo (1 item)
Heathcote Police are warning the residents to be on the
lookout a
hosted by Jo Hall
Heathcote Police are warning the residents to be on the lookout after a large
dash of fake $50 note was discovered. Victim Marianne Thomas was given
counterfeit notes from a caravan. The Heathcote resident tried to pay the
house deposit and that's when the counterfeit notes were spotted. Thomas
says the caravan is in town for the Spanish Festival.
Brief: Radio & TV
Demographics: 4,000 (male 16+) • 3,000 (female 16+)
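Solution
Here is a modified example of something similar I did recently. It walks through your text and copies it line by line. The core logic appends each line to the current file and resets the file name once a section has ended; the first line of the next section then becomes the new file name.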
#!/usr/bin/env python
import re
data = """
Channel 9 (1 item)
A woman selling her caravan near Bendigo has been left $1,100 out hosted by
Peter Hitchener A woman selling her caravan near Bendigo has been left $1,100
out of pocket after an elderly couple made the purchase with counterfeit money.
The wildlife worker tried to use the notes to pay for a house deposit, but an
agent noticed the notes were missing the Coat of Arms on one side.
Brief: Radio & TV Demographics: 153,000 (male 16+) • 177,000 (female 16+)
Southern Cross Victoria Bendigo (1 item)
Heathcote Police are warning the residents to be on the lookout a hosted by Jo
Hall Heathcote Police are warning the residents to be on the lookout after a
large dash of fake $50 note was discovered. Victim Marianne Thomas was given
counterfeit notes from a caravan. The Heathcote resident tried to pay the house
deposit and that's when the counterfeit notes were spotted. Thomas says the
caravan is in town for the Spanish Festival.
Brief: Radio & TV Demographics: 4,000 (male 16+) • 3,000 (female 16+)
"""
current_file = None
for line in data.split('\n'):
    # Set the initial filename from the first non-empty line of a section
    if current_file is None and line != '':
        current_file = line + '.txt'
    # Skip blank lines between sections (e.g. the one after Brief)
    if current_file is None:
        continue
    # Append the line to the file for the current section
    with open(current_file, "a") as text_file:
        text_file.write(line + "\n")
    # Reset the filename once this section is finished,
    # which is identified by a line that:
    #   starts with Brief:               - ^Brief:
    #   contains any amount of text      - .*
    #   ends with a closing parenthesis  - \)$
    if re.match(r'^Brief:.*\)$', line) is not None:
        current_file = None
This will produce the following files:
Channel 9 (1 item).txt
Southern Cross Victoria Bendigo (1 item).txt
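Note that in the text as posted in the question, "Brief:" and "Demographics:" are on separate lines and the closing "16+)" can wrap onto its own line, so the end-of-section pattern above would not match there. A minimal sketch of an alternative, assuming every section starts with a header line ending in "(N item)" or "(N items)", is to split on the header instead of the last line. The names HEADER and split_sections are hypothetical, chosen only for this illustration:

```python
import re

# Assumption: a section header is any line ending in "(1 item)", "(2 items)", etc.
HEADER = re.compile(r".+\(\d+ items?\)\s*$")


def split_sections(text):
    """Return a dict mapping each header line to the text of its section."""
    sections = {}
    current = None
    for line in text.splitlines():
        if HEADER.match(line):
            # A new header starts a new section
            current = line.strip()
            sections[current] = []
        if current is not None and line.strip():
            # Keep every non-blank line, including the header itself
            sections[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in sections.items()}


def write_sections(sections):
    """Write each section to '<header>.txt' in the current directory."""
    for name, body in sections.items():
        with open(name + ".txt", "w") as out:
            out.write(body)
```

Calling write_sections(split_sections(data)) on the sample text should give the same two file names as above. Opening the files with "w" rather than "a" means re-running the script overwrites the files instead of appending duplicate content.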