python - Performing regex operations on a given pattern in a file using Python

Problem description

I am reading a file that contains 60K JSON objects in the following format:

{ "log": [
       {"code": "abc",
         "refs": ["a":"b"]
       }
]}
{ "log": [
       {"code": "xyz",
         "refs": ["p":"q", "x": ["abc","xyz"] ]
       }
]}

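I need to perform 3 operations using regular expressions:

1. Add "[" at the start of the file
2. Add "]" at the end of the file
3. Find the pattern ]}{ "log": [ and insert a comma into it: ]},{ "log": [

Note: the pattern has blank lines and spaces between its characters. No other special characters or letters occur inside the pattern.

My output file should be: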

[{ "log": [
       {"code": "abc",
         "refs": ["a":"b"]
       }
]},
{ "log": [
       {"code": "xyz",
         "refs": ["p":"q", "x": ["abc","xyz"] ]
       }
]}]

Python code:

f = open('C:/Users/Desktop/SampleTestFiles/logfile.json',"r+")
s = f.read()
s = '[' + s + ']'  # This does not work. The brackets are added at the end of the file.

Tags: python, regex

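Solution

For a text file that contains multiple JSON objects simply concatenated together (that is, they were never placed in a list, so the `,` between the JSON-encoded objects is missing), the following may help correct the problem. It does not deal with malformed encoding elsewhere: the input from the question has been modified so that it is valid JSON, and only what the asker requested is addressed.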

>>> import re
>>> import json
>>> s = """
... { "log": [
...        {"code": "abc",
...          "refs": {"a":"b"}
...        }
... ]}
... { "log": [
...        {"code": "xyz",
...          "refs": {"p":"q", "x": ["abc","xyz"] }
...        }
... ]}
... 
... 
... { "log": [
...        {"code": "abc",
...          "refs": {"a":"b"}
...        }
... ]}
... """
>>> items = json.loads('[' + re.sub(r'}\s*{', '},\n{', s, flags=re.M) + ']')
>>> items[0]
{'log': [{'code': 'abc', 'refs': {'a': 'b'}}]}
>>> items[1]
{'log': [{'code': 'xyz', 'refs': {'p': 'q', 'x': ['abc', 'xyz']}}]}
>>> items[2]['log'][0]['code']
'abc'

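The key is the re.sub call. The pattern }\s*{ finds every place where a } and a { are separated only by whitespace (or by nothing at all), and the replacement puts a comma and a newline between them. The flags=re.M keyword argument is not actually required: re.M only changes where ^ and $ match, while \s already matches newline characters, so the pattern spans line breaks with or without the flag.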
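
Applied to a file like the one in the question, the same substitution might look like the sketch below. The function names (to_json_array_text, convert_file) and the file names are made up for illustration, and the demo uses a temporary directory with a shortened sample instead of the real log file. It writes the result to a separate output file rather than reopening the input with "r+", which is an assumption about what is acceptable here, not something the question states.

```python
import json
import re
import tempfile
from pathlib import Path


def to_json_array_text(text: str) -> str:
    """Join concatenated JSON objects with commas and wrap them in [ ]."""
    # A "}" and a "{" separated only by whitespace (spaces, newlines,
    # blank lines) mark the boundary between two top-level objects.
    joined = re.sub(r"}\s*{", "},\n{", text.strip())
    return "[" + joined + "]"


def convert_file(src: Path, dst: Path) -> None:
    """Read src, convert its text, and write the result to dst."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(to_json_array_text(text), encoding="utf-8")


# Demo with a shortened, hypothetical sample in a temporary directory.
sample = (
    '{ "log": [\n  {"code": "abc"}\n]}\n'
    "\n  \n"
    '{ "log": [\n  {"code": "xyz"}\n]}\n'
)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "logfile.json"
    dst = Path(tmp) / "logfile_fixed.json"
    src.write_text(sample, encoding="utf-8")
    convert_file(src, dst)
    # Parsing is only used here to check that the output is valid JSON.
    items = json.loads(dst.read_text(encoding="utf-8"))

print([item["log"][0]["code"] for item in items])
```

Because to_json_array_text works on the text alone and never parses it, it inserts the commas and brackets even when the objects themselves are not valid JSON. The flip side is that it cannot tell a real object boundary from a } followed by a { inside a string value, so it is only safe when the data contains no such strings.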
