Delete certain text pattern in python

Problem description

I'm trying to get rid of a certain pattern of text in my .txt file, which looks something like:


mystring = '''

example deletion words
in the first block

First sentence to keep.

example deletion words
in the second block

Second sentence to keep.

example deletion words
in the third block

Third sentence to keep.

example deletion words
in the fourth block'''

My desired output would look like:


"First sentence to keep.

Second sentence to keep.

Third sentence to keep."


So what I'm trying to do is get rid of all text between the strings "example" and "block", including the strings themselves. Any idea how I would go about that in either R or Python?


Sorry for not including my regex attempt at first, and thanks to everyone who took the time to answer regardless. My working solution, using the re package in Python:

import re

cleanedtext = re.sub('\nexample.*?block','',mystring, flags=re.DOTALL)

print(cleanedtext)
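
The substitution above removes the blocks but leaves the surrounding blank lines behind, so the result is not yet identical to the desired output. A minimal sketch of one way to tidy that up, assuming it is acceptable to collapse runs of blank lines and trim the ends (the names without_blocks and cleaned are only illustrative):

```python
import re

mystring = '''

example deletion words
in the first block

First sentence to keep.

example deletion words
in the second block

Second sentence to keep.

example deletion words
in the third block

Third sentence to keep.

example deletion words
in the fourth block'''

# Remove everything from "example" through the nearest following "block".
# re.DOTALL lets "." cross line breaks; ".*?" keeps each match as short as possible.
without_blocks = re.sub(r'example.*?block', '', mystring, flags=re.DOTALL)

# Collapse runs of three or more newlines into a single blank line,
# then trim leading and trailing whitespace.
cleaned = re.sub(r'\n{3,}', '\n\n', without_blocks).strip()

print(cleaned)
```

Note that the pattern also matches "block" inside longer words such as "blocked"; if that matters for the real file, word boundaries (\b) around the two markers may be worth adding.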

Tags: python, r, text-processing

Solution


In R, you can use str_remove_all from stringr:

stringr::str_remove_all(string, "example.*block")
 #[1] " First sentence to keep.\nSecond sentence to keep.\nThird sentence to keep.\n"

This is shorthand for:

stringr::str_replace_all(string, "example.*block", "")

Data

string <- "example deletion words in the first block First sentence to keep.
           example deletion words in the second blockSecond sentence to keep.
           example deletion words in the third blockThird sentence to keep.
           example deletion words in the fourth block"
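
For comparison, the same greedy pattern can be tried in Python. This is only a sketch on a hypothetical sample (text) that puts each block on a single line, without the indentation of the R data. Like the R call above, it relies on "." not matching line breaks by default; with re.DOTALL the greedy ".*" would instead swallow everything from the first "example" to the last "block".

```python
import re

# Hypothetical sample: each block sits on one line, as in the R data
# (but without the leading indentation).
text = (
    "example deletion words in the first block First sentence to keep.\n"
    "example deletion words in the second blockSecond sentence to keep.\n"
    "example deletion words in the third blockThird sentence to keep.\n"
    "example deletion words in the fourth block"
)

# Without re.DOTALL, "." stops at newlines, so each match stays on one line.
per_line = re.sub(r"example.*block", "", text)

# With re.DOTALL, the greedy ".*" runs from the first "example"
# to the last "block", removing the sentences as well.
greedy_dotall = re.sub(r"example.*block", "", text, flags=re.DOTALL)

print(repr(per_line))
print(repr(greedy_dotall))
```

This is why the multi-line input in the question needs both re.DOTALL and the non-greedy ".*?", while the single-line data here works with the plain greedy pattern.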
