Python 3: trying to split a string on \x0c

Problem description

I'm extracting the text of a PDF into a string called text:

text = "● A justification of your prediction, including the following information that helped form\n\no Angle of the sun relative to the surface on September 22, 2021\no Materials of the surface (include three materials) and heat absorption\n\ncharacteristics\n\no Length of exposure of the surface to the sun (i.e., the amount of time the surface\n\nhas had to warm on that day), including slopes of the stadium and a consideration\nof the angles of the seats\n\n1 Yes, I know that’s a Wednesday but just go with it…\n\n\x0c● Sources: Be sure to include in-text citations as appropriate as well as provide a list of\n\nsources that were used for your report, use MLA or APA citation style\n\n● Your report can assume any format you chose, and should be between 300-400 words in\n\nlength\n\nResources:\n\n"

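I want to split this text on "\x0c". I tried re.split(r'[\x0c]+', text), but that only seems to remove the "\x0c"; it doesn't split the string. Likewise, text.splitlines() didn't work either.

What am I missing?

Tags: python, split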

Solution

What's wrong with plain old

text.split("\x0c")

? That gives me a list with two elements, which looks like what you want here.
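A minimal sketch to check this on a smaller input. The sample string below is a hypothetical, shortened stand-in for the extracted PDF text, not the original data. It also suggests that the re.split call from the question returns the same two-element list, and that neither call modifies the original string, so the result has to be assigned to a name:

```python
import re

# Hypothetical, shortened stand-in for the extracted PDF text.
sample = "page one line\n\n\x0cpage two line\n"

# str.split returns a new list; the original string is left untouched.
pages = sample.split("\x0c")
print(len(pages))  # 2

# The pattern from the question yields the same list for this input.
pages_re = re.split(r"[\x0c]+", sample)
print(pages == pages_re)  # True

# The form feed is still present in the original string.
print("\x0c" in sample)  # True
```

If the output looked unsplit, one possible cause is that the return value was discarded and text itself was inspected afterwards.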

If needed, you can split each section further into lines:

sections = [x.split("\n") for x in text.split("\x0c")]
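Because the extracted text contains many blank lines, the nested lists will include empty strings. A possible variation, assuming blank lines are not needed, filters them out. The sample string is again a hypothetical stand-in for the real text:

```python
# Hypothetical, shortened stand-in for the extracted PDF text.
sample = (
    "first page, line 1\n\nfirst page, line 2\n\n"
    "\x0c"
    "second page, line 1\n\nsecond page, line 2\n"
)

# Split into pages on the form feed, then into lines,
# dropping lines that are empty or contain only whitespace.
sections = [
    [line for line in page.split("\n") if line.strip()]
    for page in sample.split("\x0c")
]

for number, lines in enumerate(sections, start=1):
    print(number, lines)
```

Note that str.splitlines() treats "\x0c" as a line boundary as well, so calling it on the whole text would split on page breaks and newlines alike, without keeping the pages separate.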
