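python - Two very similar regular expressions, one of which finds no match
Problem description
I am trying to match a shortName field in a JSON-ish string (it is no longer valid JSON, hence the regex). Running a regex here may not be the most efficient approach. I am open to suggestions, but I would also like a solution to the original problem.
I am using Python 2.7 and Scrapy, running in PyCharm 2018.2.
What I want: Get the matches from a huge JSON-ish file full of restaurants, put each match into a list, iterate over the list items, and collect the different field values, which I will set as variables for later use. We are not that far along here, though.
I want to match the shortName field and extract the value from it.
The code sample below starts from the point where the big file has already been received (as unicode or str), and we begin matching restaurant-specific data fields. In the actual pattern, I have tried both escaping and not escaping the " and : characters.
What I have: Regex101 (below)
I have the actual regex I am trying to fix, which ends up with "NoneType has no attribute 'group'".
Note that the first line, "pattern", works and gets me the data that I then go through in the for loop. I do not believe the problem is there.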
regex = re.compile(pattern, re.MULTILINE)
for match in regex.finditer(r.text):
    restaurant = match.group()
    restaurant = str(restaurant)
    print restaurant
    print type(restaurant)
    name = re.search(r'(?<=shortName\":\")(.*?)(?=\")', restaurant, re.MULTILINE | re.DOTALL).group()
Source sample:
156,"mainGroupId":1,"menuTypeId":1,"shopExternalId":"0001","displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE","streetAddress":"BlankStreet 5","zip":"1211536","city":"Wonderland",
Testing regex, which works for a fixed source sample. NOTE: The source sample for this one was formatted with \ by regex101, as I first had every " and : escaped with \. I copied this straight from their code generator, but it does work in code:
testregex = r'(?<=shortName\"\:\")(.*?)(?=\")'
test_str = (
    "156,\"mainGroupId\":1,\"menuTypeId\":1,\"shopExternalId\":\"0001\",\"displayName\":\"Lorem Ipsum\",\"shortName\":\"I CAN GET THIS MATCHED \",\"streetAddress\":\"BlankStreet 6\",\"zip\":\"2136481\",\"city\":\"Wonderland\"")
matches = re.search(testregex, test_str, re.MULTILINE | re.DOTALL).group()
print matches
restaurantname = matches
What is the problem: The upper regex prints out the "'nonetype' object has no attribute 'group'"-error. The lower regex gets me the data I want, in this example it prints out "I CAN GET THIS MATCHED"
I am well aware that there might be small syntax problems, as I've been trying to fix this for some time.
Thank you in advance. The more detailed the answer, the better. If you have a different approach to the problem, please do give code so I can learn from it.
Solution
Your regex does not match your string. There is no shopID in the input.
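The error message itself comes from calling .group() on the return value of re.search, which is None whenever the pattern finds nothing. A minimal sketch of guarding against that, written for Python 3 and using the shortName pattern from the question unchanged; search_or_none is a hypothetical helper name, not something from the question:

```python
import re

# The shortName pattern from the question, unchanged.
PATTERN = r'(?<=shortName\":\")(.*?)(?=\")'

def search_or_none(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    # re.search returns None on failure, so check before calling .group().
    if match is None:
        return None
    return match.group()

print(search_or_none(PATTERN, '"displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE",'))
print(search_or_none(PATTERN, '"displayName":"Lorem Ipsum",'))
```

With a guard like this, a record that lacks the field yields None instead of stopping the loop with an AttributeError.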
You may get all your restaurant names directly with one re.findall call using the following regex:
shortName":"([^"]+)
See the regex demo. Details:
- shortName":" - a literal substring
- ([^"]+) - Capturing group 1 (the result of the re.findall call will be the substrings captured into this group): 1 or more chars other than "
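On the question of whether to escape " and :, neither character is special in a regex, so the escaped and unescaped spellings behave the same. A small sketch (Python 3) comparing the question's lookaround pattern, its unescaped form, and the capturing-group pattern on the source sample; the variable names are only illustrative:

```python
import re

sample = ('156,"mainGroupId":1,"menuTypeId":1,"shopExternalId":"0001",'
          '"displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE",'
          '"streetAddress":"BlankStreet 5","zip":"1211536","city":"Wonderland",')

# Lookaround pattern with every " and : escaped, as in the question.
escaped = re.search(r'(?<=shortName\"\:\")(.*?)(?=\")', sample).group()
# The same pattern without the escapes.
unescaped = re.search(r'(?<=shortName":")(.*?)(?=")', sample).group()
# Capturing-group pattern: the value is in group 1, not in the whole match.
captured = re.search(r'shortName":"([^"]+)', sample).group(1)

print(escaped == unescaped == captured)
print(captured)
```

The capturing-group form needs .group(1) (or re.findall), because .group() on it would also include the shortName":" prefix.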
See Python demo:
import re
regex = re.compile(r'shortName":"([^"]+)')
print(regex.findall('156,"mainGroupId":1,"menuTypeId":1,"shopExternalId":"0001","displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE","streetAddress":"BlankStreet 5","zip":"1211536","city":"Wonderland",'))
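The same idea extends to the other fields the question wants to collect. A hedged sketch (Python 3), assuming every wanted value is a quoted string with no escaped quotes inside it; extract_field and the two-restaurant data string are hypothetical, made up for illustration:

```python
import re

def extract_field(text, field):
    """Return every string value stored under `field`, in document order."""
    # re.escape keeps any regex metacharacters in the field name literal.
    pattern = re.compile('"' + re.escape(field) + r'":"([^"]*)"')
    return pattern.findall(text)

# Hypothetical two-restaurant input in the same shape as the source sample.
data = ('156,"displayName":"Lorem Ipsum","shortName":"First Place","city":"Wonderland",'
        '157,"displayName":"Dolor Sit","shortName":"Second Place","city":"Oz",')

print(extract_field(data, 'shortName'))
print(extract_field(data, 'city'))
```

Requiring the opening quote before the field name keeps a key that merely ends in the same text from matching. Numeric fields such as mainGroupId are not quoted in the sample and would need a different pattern.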