python - Remove specific references like author from scientific publication
Problem description
I saw this earlier question: "python removing references from a scientific paper". It is similar to what I want to do, but I still cannot figure out exactly how to do it. Suppose my string contains a reference such as: Poteete et al. (2010). How can I remove it from the string using a regex in Python?
What I have tried is similar to the approach in the previous question, but maybe I missed something:
sentence = "Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by Poteete et al. (2010)"
sentence = re.sub(r'(?:[\w \.])+[0-9]{4}','',sentence)
Any ideas? Thank you so much for the help.
Solution
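For context, the attempt in the question leaves the sentence unchanged: its character class [\w \.] does not include "(", so no run of those characters is ever directly followed by the four digits. The sketch below checks that, and also shows that adding the parentheses alone is not enough; the variable names are illustrative only.

```python
import re

sentence = ("Moreover, we elaborate on how these methods have led to improved "
            "insights into the theoretical framework proposed by Poteete et al. (2010)")

# The pattern from the question: the "(" before 2010 is not in [\w \.],
# so nothing matches and the sentence comes back unchanged.
attempt = re.sub(r'(?:[\w \.])+[0-9]{4}', '', sentence)

# Adding escaped parentheses makes it match, but the unanchored character
# class swallows every word back to the last comma.
too_greedy = re.sub(r'(?:[\w \.])+\([0-9]{4}\)', '', sentence)

print(attempt == sentence)
print(repr(too_greedy))
```

Requiring the match to start at an uppercase letter, as the pattern that follows does, keeps it from reaching back that far in this sentence.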
If the name starts with an uppercase char A-Z:
[A-Z]\w*(?: +\w+)*\. \(\d{4}\)
[A-Z]\w* matches an uppercase char A-Z followed by optional word chars
(?: +\w+)* optionally repeats 1+ spaces followed by 1+ word chars
\. matches a literal dot
A single space followed by \(\d{4}\) matches 4 digits between parentheses
Instead of matching spaces, you could also use \s, but that can also match a newline.
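The difference between a literal space and \s only shows up when a citation is wrapped across lines. A minimal sketch, assuming a made-up input with a line break inside the citation (the constant names and the sample text are illustrative):

```python
import re

# Literal spaces only, as in the pattern above.
SPACE_ONLY = re.compile(r'[A-Z]\w*(?: +\w+)*\. \(\d{4}\)')
# Same pattern with \s in place of the literal spaces.
ANY_WHITESPACE = re.compile(r'[A-Z]\w*(?:\s+\w+)*\.\s\(\d{4}\)')

# Hypothetical text where the citation is split by a newline.
wrapped = "as proposed by Poteete et\nal. (2010)"

print(SPACE_ONLY.search(wrapped))            # no match across the newline
print(repr(ANY_WHITESPACE.sub('', wrapped)))  # citation removed
```

Which variant fits depends on whether the input keeps hard line breaks from the original layout.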
import re
sentence = "Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by Poteete et al. (2010)"
sentence = re.sub(r'[A-Z]\w*(?: +\w+)*\. \(\d{4}\)', '', sentence)
print(sentence)
Output
Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by
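Note that the output keeps the space that preceded the citation. A possible variation, sketched here as an assumption and not part of the original answer, wraps the pattern in a small helper (remove_citations is a hypothetical name) and adds a leading \s* so that space is removed as well. It also shows one limitation of the pattern.

```python
import re

# Same pattern as above, with a leading \s* so the whitespace before the
# citation is removed too (an addition for illustration).
CITATION = re.compile(r'\s*[A-Z]\w*(?: +\w+)*\. \(\d{4}\)')

def remove_citations(text):
    """Remove 'Name et al. (YYYY)' style citations (hypothetical helper)."""
    return CITATION.sub('', text)

sentence = ("Moreover, we elaborate on how these methods have led to improved "
            "insights into the theoretical framework proposed by Poteete et al. (2010)")
cleaned = remove_citations(sentence)
print(repr(cleaned))

# Limitation: any capitalized word earlier in the same run of
# space-separated words is taken as the start of the name.
overmatched = remove_citations("The framework proposed by Poteete et al. (2010)")
print(repr(overmatched))
```

In the second call the match starts at "The", so the whole string is removed. If sentences in the input often begin with a capitalized word and contain no punctuation before the citation, a stricter pattern for the author part would be needed.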