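python - Extracting specific text from a txt file in Python
Problem description
I recently picked up Python to do some text extraction. I have a dataset that looks like this: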
@article{noauthor_collective_nodate,
title = {Collective teacher efficacy},
abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}
@article{noauthor_collective_nodate,
title = {Collective teacher efficacy},
abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}
@article{noauthor_initial_nodate,
title = {Initial teacher education programs},
abstract = {Overview Influence: Initial teacher education programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have small positive impact Influence Definition: Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs. Evidence Number of meta-analyses: 5 Number of studies: 117 Number of students: 106,016 Number of effects: 509 Effect size: 0.10},
}
@article{noauthor_professional_nodate,
title = {Professional development programs},
abstract = {Overview Influence: Professional development programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have positive impact Influence Definition: Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders. Evidence Number of meta-analyses: 21 Number of studies: 1,151 Number of students: 2,321,242 Number of effects: 2,938 Effect size: 0.37},
keywords = {Program Development},
}
I want to extract the title and part of the abstract from this text. I managed to get the output I want with the following code:
s = "@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}"
start = s.find("title = {") + len("title = {")
end = s.find("}, abstract")
start2 = s.find("Influence Definition: ") + len("Influence Definition: ")
end2 = s.find("Evidence Number of meta-analyses:")
substring = s[start:end]
substring2 = s[start2:end2]
print(substring+' - '+substring2+";")
Output:
Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. ;
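The problems are:
- This only extracts the first match
- I would like to run it on the original text file instead of copying its contents into "s".
Could anyone lend a hand?
Solution
This should do it: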
with open("myfile.txt", "r") as f:
    for x in f:
        # The title line looks like: title = {Some title},
        if "title = {" in x:
            start = x.find("title = {") + len("title = {")
            end = x.find("}", start)
            substring = x[start:end] + " - "
        # The abstract line holds the definition between two fixed markers
        if "Influence Definition: " in x:
            start = x.find("Influence Definition: ") + len("Influence Definition: ")
            end = x.find("Evidence Number of meta-analyses:")
            substring += x[start:end]
            print(substring)
            print()
# No f.close() is needed: the with block closes the file on exit.
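For example, if your file is named myfile.txt, this prints the following:
Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

Initial teacher education programs - Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs.

Professional development programs - Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders.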
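The line-by-line approach relies on each field sitting on its own line. As an alternative, here is a minimal sketch that uses a regular expression over the whole file contents, so it also handles the single-line form stored in "s" in the question. The names ENTRY_PATTERN, extract_entries and extract_from_file are made up for illustration, and the sketch assumes every entry contains a title followed by both the "Influence Definition: " and "Evidence Number of meta-analyses:" markers; an entry missing either marker would be paired with the wrong text.

```python
import re

# Hypothetical pattern (not from the original answer): capture the title,
# then the text between the two fixed markers inside the abstract.
ENTRY_PATTERN = re.compile(
    r"title = \{(?P<title>[^}]*)\}"      # text inside title = {...}
    r".*?Influence Definition: "         # skip ahead to the definition
    r"(?P<definition>.*?)\s*Evidence Number of meta-analyses:",
    re.DOTALL,                           # let "." cross line breaks
)


def extract_entries(text):
    """Return a list of (title, definition) pairs found in text."""
    return [
        (m.group("title"), m.group("definition"))
        for m in ENTRY_PATTERN.finditer(text)
    ]


def extract_from_file(path):
    """Read the whole file at path and extract every entry from it."""
    with open(path, "r", encoding="utf-8") as f:
        return extract_entries(f.read())


# Example usage, assuming the data is saved as myfile.txt:
#     for title, definition in extract_from_file("myfile.txt"):
#         print(title + " - " + definition + ";")
```

Because re.finditer yields every non-overlapping match, this addresses the "only the first match" problem without tracking offsets by hand, and returning pairs instead of printing makes it easy to write the results elsewhere later.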