regex - How to find and delete a long pattern in a text file using Linux sed with a regular expression
Problem description
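I am parsing a lot of bibtex files into R for some data analysis. However, the abstracts regularly cause problems, and I would like to remove them beforehand using sed.
I have found that
sed 's/Abstract\s\=\s[{][{]//' < file.bib
successfully removes the opening of the Abstract entry and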
sed 's/[}][}]\,//' < file.bib
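removes the closing braces and comma.
However, I cannot find any way to combine the two so that everything in between is removed as well. For example, I tried: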
sed 's/^Abstract\s\=\s[{][{][\s\S]*[}][}]\,$//' < file.bib
This is what a bibtex reference looks like:
@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and presents a
framework with which to organize the literature. We argue that unit of
analysis provides one critical distinction in the organizational
learning literature and research objective provides another. The
resulting two-by-two matrix contains four categories of research, which
we have called: (2) residues (organizations as residues of past
learning); (2) communities (organizations as collections of individuals
who can learn and develop); (3) participation (organizational
improvement gained through intelligent activity of individual members),
and (4) accountability (organizational improvement gained through
developing individuals' mental models). We also propose a distinction
between the terms organizational learning and the learning organization.
Our subsequent analysis identifies relationships between disparate parts
of the literature and shows that these relationships point to individual
mental models as a critical source of leverage for creating learning
organizations. A brief discussion of the work of two of the most visible
researchers in this field, Peter Senge and Chris Argyris, provides
additional support for this type of change strategy.}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}
This is what I want it to look like:
@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}
Solution
You can try piping the sed commands into each other in sequence. Something like:
sed 's/Abstract\s\=\s[{][{]//' < file.bib | sed 's/[}][}]\,//'
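You can also try using the OR regex operator in your pattern. In sed's default basic regex syntax a bare | is a literal character, so alternation needs -E (or \| in GNU sed), for example:
sed -E 's/Abstract\s=\s[{][{]|[}][}],//g' < file.bib
Either of these strips the two delimiters. Note, however, that neither removes the abstract text in between, because sed applies the pattern to one line at a time, and that [}][}], also matches the closing }}, of every other field. I hope this helps.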
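Because sed reads its input one line at a time, a single s/// pattern cannot span the multi-line abstract. One common alternative is an address range combined with the d command. The following is a minimal sketch, assuming each Abstract field starts on a line beginning with "Abstract = {{" and ends on a later line ending in "}},", as in the example above. The sample file is a shortened, hypothetical copy of that entry, and the variable names tmp and cleaned are only for illustration.

```shell
# Hypothetical sample file, shortened from the entry in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@article{ ISI:000072671200001,
Title = {{From organizational learning to the learning organization}},
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and presents a
framework with which to organize the literature.
additional support for this type of change strategy.}},
DOI = {{10.1177/1350507698291001}},
}
EOF

# Delete every line from the one starting with "Abstract = {{"
# through the first following line that ends in "}},".
cleaned=$(sed '/^Abstract = {{/,/}},$/d' "$tmp")
printf '%s\n' "$cleaned"

rm -f "$tmp"
```

Two caveats apply to this sketch. sed only looks for the end of the range on lines after the starting line, so an abstract that fits on a single line would cause the deletion to run on into the following fields. Likewise, a line inside the abstract that happens to end in "}}," would end the range early. It is worth checking a few entries by hand before processing the whole collection.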