python - Python 3: trying to split a string on \x0c
Problem description
I am extracting the text of a PDF into a string, text:
text = "● A justification of your prediction, including the following information that helped form\n\no Angle of the sun relative to the surface on September 22, 2021\no Materials of the surface (include three materials) and heat absorption\n\ncharacteristics\n\no Length of exposure of the surface to the sun (i.e., the amount of time the surface\n\nhas had to warm on that day), including slopes of the stadium and a consideration\nof the angles of the seats\n\n1 Yes, I know that’s a Wednesday but just go with it…\n\n\x0c● Sources: Be sure to include in-text citations as appropriate as well as provide a list of\n\nsources that were used for your report, use MLA or APA citation style\n\n● Your report can assume any format you chose, and should be between 300-400 words in\n\nlength\n\nResources:\n\n"
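I want to split this text on "\x0c". I tried re.split(r'[\x0c]+', text), but that appears to only remove the "\x0c" without splitting. Likewise, text.splitlines() did not work either.
What am I missing?
Solution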
What is wrong with plain old str.split?
text.split("\x0c")
That gives me a list with two elements, which looks like what you want here.
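A minimal sketch to check this on a shortened stand-in for the string (the `sample` value is an assumption, not the original PDF text). It also compares the result with the `re.split` call from the question, which returns a list as well, so the return value has to be assigned or printed to see the split:

```python
import re

# Hypothetical, shortened stand-in for the extracted PDF text.
sample = "page one, line 1\n\npage one, line 2\n\n\x0cpage two, line 1\n\npage two, line 2\n\n"

# Split on the form feed character (page break in many PDF extractors).
pages = sample.split("\x0c")

# Regex variant from the question; inside the pattern, \x0c is the same form feed.
pages_re = re.split(r"[\x0c]+", sample)

print(len(pages))          # number of pages found
print(pages == pages_re)   # whether both approaches agree on this sample
```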
If needed, you can split each part further into lines:
sections = [x.split("\n") for x in text.split("\x0c")]
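Because the extracted text contains many blank lines, the nested lists above will include empty strings. A possible variant that drops them is sketched here; `split_pages` and `sample` are hypothetical names used only for illustration:

```python
def split_pages(text):
    """Split text into pages on form feed, then into non-blank lines."""
    return [
        [line for line in page.split("\n") if line.strip()]
        for page in text.split("\x0c")
    ]


# Hypothetical, shortened stand-in for the extracted PDF text.
sample = "a\n\nb\n\n\x0cc\n\nd\n\n"
sections = split_pages(sample)
print(sections)
```

Splitting on "\x0c" first matters here: str.splitlines() treats the form feed as just another line boundary, so it cannot be used on its own to tell page breaks from line breaks.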