Remove specific references like author from scientific publication

Problem description

I saw this earlier question: "python removing references from a scientific paper". It is similar to what I want to do, but I still cannot figure out exactly how to do it. Suppose my string contains a reference like this, for example: Poteete et al. (2010). How can I remove it from the string using regex in Python?

What I have tried is similar to the approach in the previous question, but maybe I missed something:

sentence = "Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by  Poteete et al. (2010)"
sentence = re.sub(r'(?:[\w \.])+[0-9]{4}','',sentence)

Any idea for this? Thank you so much for the help.

Tags: python, regex, string

Solution


If the name starts with an uppercase char A-Z, you could use this pattern:

[A-Z]\w*(?: +\w+)*\. \(\d{4}\)
  • [A-Z]\w* Match an uppercase char A-Z followed by optional word chars
  • (?: +\w+)* Optionally repeat 1+ spaces and 1+ word chars
  • \. Match a literal . (the pattern then expects a single space)
  • \(\d{4}\) Match 4 digits between parentheses
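
To make the breakdown above easier to follow, here is the same pattern written with re.VERBOSE so each part carries its own comment. This is only a sketch: the name CITATION and the shortened sample string are chosen for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so a literal space
# is written as [ ] here.
CITATION = re.compile(
    r"""
    [A-Z]\w*        # an uppercase char A-Z, then optional word chars ('Poteete')
    (?:[ ]+\w+)*    # optionally repeat 1+ spaces and 1+ word chars (' et al')
    \.[ ]           # a literal dot followed by a single space
    \(\d{4}\)       # 4 digits between parentheses ('(2010)')
    """,
    re.VERBOSE,
)

match = CITATION.search("framework proposed by  Poteete et al. (2010)")
print(match.group(0))  # Poteete et al. (2010)
```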

Instead of matching spaces, you could also use \s but that can also match a newline.
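
A small sketch of that trade-off, using two made-up sample strings (both are assumptions for illustration). With \s the pattern can pick up a citation that is wrapped across lines, but it can also reach back over a blank line and swallow a preceding capitalized word such as a heading.

```python
import re

spaces_only = r'[A-Z]\w*(?: +\w+)*\. \(\d{4}\)'
any_whitespace = r'[A-Z]\w*(?:\s+\w+)*\. \(\d{4}\)'

# A citation wrapped across two lines: only the \s variant finds it.
wrapped = "the framework proposed by Poteete\net al. (2010)"
print(re.search(spaces_only, wrapped))                       # None
print(repr(re.search(any_whitespace, wrapped).group(0)))     # 'Poteete\net al. (2010)'

# A heading followed by a citation: the \s variant also removes the heading.
with_heading = "Results\n\nSmith et al. (2010) argue otherwise"
print(repr(re.sub(spaces_only, '', with_heading)))           # 'Results\n\n argue otherwise'
print(repr(re.sub(any_whitespace, '', with_heading)))        # ' argue otherwise'
```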


import re

sentence = "Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by  Poteete et al. (2010)"
# Replace the matched reference with an empty string
sentence = re.sub(r'[A-Z]\w*(?: +\w+)*\. \(\d{4}\)', '', sentence)
print(sentence)

Output

Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by  
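
The output above still ends with the spaces that preceded the reference. As a minimal sketch, assuming you also want to tidy that up, a helper could collapse the leftover spaces after the substitution. The name strip_citations is hypothetical and not part of the original answer.

```python
import re

# Pattern taken from the answer above.
CITATION = re.compile(r'[A-Z]\w*(?: +\w+)*\. \(\d{4}\)')

def strip_citations(text):
    """Hypothetical helper: drop matches, then tidy the spaces left behind."""
    without = CITATION.sub('', text)
    # Collapse runs of spaces into one and trim both ends.
    return re.sub(r' {2,}', ' ', without).strip()

sentence = "Moreover, we elaborate on how these methods have led to improved insights into the theoretical framework proposed by  Poteete et al. (2010)"
print(strip_citations(sentence))

# Caveat: if only spaces separate an earlier capitalized word from the
# reference, the match starts at that earlier word.
print(repr(strip_citations("We build on Poteete et al. (2010)")))  # ''
```

The second call shows a limit of the pattern: it relies on punctuation (such as the comma after "Moreover") to stop the match from extending back to an earlier capitalized word, so it may need tightening for other sentences.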
