Two very similar regexes, but one of them finds no match

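Problem description

I am trying to match a shortName field in a JSON'ish string (it is no longer valid JSON, hence the regex). Running a regex here may not be the most efficient approach. I am open to suggestions, but I would also like a solution to the original problem.

I am using Python 2.7 and Scrapy, running in PyCharm 2018.2.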

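What I want: get matches from a huge JSON'ish file full of restaurants, put each match into a list, iterate over the list items and collect the different field values, which I assign to variables for later use. We have not got that far here, though.

I want to match the shortName field and extract its value.

The code samples below start from the point where the big file has already been received (as unicode or str) and we begin matching the restaurant-specific data fields. In the actual pattern I have tried both escaping and not escaping the " and : symbols.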

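What I have: Regex101 (below)

This is the actual regex I am trying to fix; it ends with "'NoneType' object has no attribute 'group'".

Note that the `pattern` on the first line works and gives me the data that I then go through in the for loop. I do not believe the problem is there.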

regex = re.compile(pattern, re.MULTILINE)
for match in regex.finditer(r.text):
  restaurant = match.group()
  restaurant = str(restaurant)
  print restaurant
  print type(restaurant)

  name = re.search(r'(?<=shortName\":\")(.*?)(?=\")',restaurant,re.MULTILINE 
  | re.DOTALL).group()

Source sample:

156,"mainGroupId":1,"menuTypeId":1,"shopExternalId":"0001","displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE","streetAddress":"BlankStreet 5","zip":"1211536","city":"Wonderland",

Test regex, which works for a fixed source sample. NOTE: regex101 formatted the source sample for this one with \, as I first had every " and : escaped with \. I copied this straight from their code generator, and it does work in code:

testregex = r'(?<=shortName\"\:\")(.*?)(?=\")'

test_str = ("156,\"mainGroupId\":1,\"menuTypeId\":1,\"shopExternalId\":\"0001\",\"displayName\":\"Lorem Ipsum\",\"shortName\":\"I CAN GET THIS MATCHED \",\"streetAddress\":\"BlankStreet 6\",\"zip\":\"2136481\",\"city\":\"Wonderland\"")

matches = re.search(testregex, test_str, re.MULTILINE | re.DOTALL).group()
print matches
restaurantname = matches

What the problem is: the upper regex raises the error "'NoneType' object has no attribute 'group'". The lower regex gets me the data I want; in this example it prints "I CAN GET THIS MATCHED".

I am well aware that there might be small syntax problems, as I've been trying to fix this for some time.

Thank you in advance. The more detailed the answer, the better. If you have a different approach to the problem, please include code so I can learn from it.

Tags: python, regex, python-2.7

Solution


Your regex does not match your string. There is no shopID in the input.
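The error message itself follows from how re.search behaves: it returns None when nothing matches, and calling .group() on None raises the AttributeError. A minimal sketch of guarding against that, assuming a hypothetical helper named find_short_name (not part of your code) and a shortened version of your sample:

```python
import re

def find_short_name(text):
    """Return the shortName value, or None when the field is absent."""
    match = re.search(r'shortName":"([^"]*)', text)
    # re.search returns None on failure, so check before calling .group()
    if match is None:
        return None
    return match.group(1)

# Shortened sample based on the question's source string
sample = '"displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE","city":"Wonderland",'

found = find_short_name(sample)
missing = find_short_name('"displayName":"Lorem Ipsum"')
print(found)
print(missing)
```

With a check like this, a restaurant chunk that lacks the field yields None instead of stopping the loop with an exception.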

You may get all your restaurant names directly with one re.findall call using the following regex:

shortName":"([^"]+)

See the regex demo. Details:

  • shortName":" - a literal substring
  • ([^"]+) - Capturing group 1 (the result of the re.findall call will be the substrings captured into this Group): 1 or more chars other than ".

See Python demo:

import re
regex = re.compile(r'shortName":"([^"]+)')
print(regex.findall('156,"mainGroupId":1,"menuTypeId":1,"shopExternalId":"0001","displayName":"Lorem Ipsum","shortName":"I WANT THIS TEXT HERE","streetAddress":"BlankStreet 5","zip":"1211536","city":"Wonderland",'))
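Since you also want to collect other fields for each restaurant, the same idea can be extended. This is only a sketch under assumptions: the helper extract_field and the second record in the data are made up for illustration, every record is assumed to contain all of the listed fields, and values are assumed not to contain escaped quotes (a value with \" inside it would be cut short).

```python
import re

# Field names taken from the question's source sample
FIELDS = ("shortName", "streetAddress", "city")

def extract_field(name, text):
    """Return every value of the quoted string field `name` found in text."""
    # The leading quote keeps "city" from matching inside a longer key
    pattern = re.compile('"' + re.escape(name) + '":"([^"]*)')
    return pattern.findall(text)

# Two records; the second one is invented for this example
data = (
    '"displayName":"Lorem Ipsum","shortName":"First","streetAddress":"BlankStreet 5","city":"Wonderland",'
    '"displayName":"Dolor Sit","shortName":"Second","streetAddress":"BlankStreet 6","city":"Wonderland",'
)

# One list of values per field, then regroup them into one tuple per restaurant
columns = dict((name, extract_field(name, data)) for name in FIELDS)
restaurants = list(zip(*(columns[name] for name in FIELDS)))
print(restaurants)
```

If some records can miss a field, the zip step would misalign the values. In that case it is safer to first split the text into one chunk per restaurant, as your finditer loop does, and search each chunk separately.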
