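python - Regular expression for MCQ-type strings
Problem description
How can I extract multiple-choice questions and their options from a text document? Each question starts with a number followed by a period. A question can span multiple lines and may or may not end with a period or a question mark. I want to build a dictionary that maps each question number to its question text and options. I am using Python for this.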
17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
(b)
decreases
(c)
remains unchanged
(d)
None of these
some random text between questions
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
(c)
will produce sound which depends upon frequency
(d)
None of these
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
(c)
10–1 m
(d)
10–2 m
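Solution
Regex: \d+\.([^(]+)
It matches the number, followed by a period.
It then captures everything that is not a "(" (the start of the answers).
If you are not sure it can be that easy, try the regex in an online regex tester.
Python code: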
import re # Imports the standard regex module
text_doc = """
17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
(b)
decreases
(c)
remains unchanged
(d)
None of these
some random text between questions
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
(c)
will produce sound which depends upon frequency
(d)
None of these
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
(c)
10–1 m
(d)
10–2 m
"""
question_getter = re.compile(r'\d+\.([^(]+)')
print(question_getter.findall(text_doc))
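Note that the captured group still contains the surrounding newlines and does not include the question number. A minimal sketch of one way to keep the number and tidy the text, assuming a second capture group around the digits (the names `numbered_question` and `questions` and the shortened sample are illustrative, not part of the original answer):

```python
import re

# Shortened copy of the sample text, for illustration only
sample = """
17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
18.
A vibrating body
(a)
will always produce sound
"""

# Same pattern as above, with an extra group around the question number
numbered_question = re.compile(r"(\d+)\.([^(]+)")

# Collapse newlines and repeated spaces in each question into single spaces
questions = {
    int(number): " ".join(text.split())
    for number, text in numbered_question.findall(sample)
}
print(questions)
```

Like the original pattern, this stops at the first "(" it sees, so a question whose own text contains a parenthesis would be cut short.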
Edit: Since a lot of people here are parsing things, I figured I would parse things too.
Regex to get the possible answers: \([a-zA-Z]+\)\n(.+)
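One thing to keep in mind with this pattern: "." does not match a newline by default, so only the first line of each answer is captured. A small check on an excerpt of the sample (the `sample` variable is just for illustration):

```python
import re

answer_getter = re.compile(r"\([a-zA-Z]+\)\n(.+)")

# Option (b) of question 18 wraps onto a second line
sample = (
    "(a)\n"
    "will always produce sound\n"
    "(b)\n"
    "may or may not produce sound if the amplitude of\n"
    "vibration is low\n"
)

answers = answer_getter.findall(sample)
print(answers)
# ['will always produce sound', 'may or may not produce sound if the amplitude of']
```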
Updated Python:
import re # Imports the standard regex module
text_doc = """
17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
(b)
decreases
(c)
remains unchanged
(d)
None of these
some random text between questions
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
(c)
will produce sound which depends upon frequency
(d)
None of these
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
(c)
10–1 m
(d)
10–2 m
"""
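question_getter = re.compile(r'\d+\.([^(]+)')
answer_getter = re.compile(r'\([a-zA-Z]+\)\n(.+)')
# This is where the magical parsing happens
# It could've been organized differently
# Each question is paired only with the answers found before the next question;
# searching the whole text_doc would attach all twelve answers to every question.
matches = list(question_getter.finditer(text_doc))
parsed = {}
for i, match in enumerate(matches):
    # The answers for this question end where the next question begins
    end = matches[i + 1].start() if i + 1 < len(matches) else len(text_doc)
    parsed[match.group(1).strip()] = answer_getter.findall(text_doc, match.end(), end)
print(parsed)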
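The question asks for a dictionary keyed by question number, and some options in the sample wrap onto a second line. Below is a rough sketch of one way to handle both, not the only way. It assumes that the question number and each "(letter)" marker sit alone on their own line, as in the sample. The names `block_re`, `option_re`, `option_start_re`, `squash` and `parse_mcq` are made up for this example.

```python
import re

# Shortened copy of the sample text, for illustration only
sample = """
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
"""

# A question block starts at a line holding only "<number>." and runs
# until the next such line or the end of the text.
block_re = re.compile(
    r"^(\d+)\.[ \t]*\n(.*?)(?=^\d+\.[ \t]*$|\Z)", re.MULTILINE | re.DOTALL
)
# A line holding only "(letter)" marks the start of an option.
option_start_re = re.compile(r"^\([a-zA-Z]\)[ \t]*$", re.MULTILINE)
# An option runs from its "(letter)" line to the next one or the end of the block.
option_re = re.compile(
    r"^\(([a-zA-Z])\)[ \t]*\n(.*?)(?=^\([a-zA-Z]\)[ \t]*$|\Z)",
    re.MULTILINE | re.DOTALL,
)


def squash(text):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


def parse_mcq(text):
    """Return {number: {"question": str, "options": {letter: str}}}."""
    result = {}
    for number, body in block_re.findall(text):
        # Everything before the first "(letter)" line is the question text
        question = option_start_re.split(body, maxsplit=1)[0]
        options = {letter: squash(opt) for letter, opt in option_re.findall(body)}
        result[int(number)] = {"question": squash(question), "options": options}
    return result


mcq = parse_mcq(sample)
print(mcq[18]["options"]["b"])
```

One limitation: stray text after the last option of a question (such as "some random text between questions" in the full sample) cannot be told apart from a wrapped option line by structure alone, so this sketch would fold it into that last option.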