How do I tokenize a regex pattern match and sort the resulting list?

Problem description

I have a file that looks like this:

select a,b,c FROM Xtable
select a,b,c FROM Vtable
select a,b,c FROM Atable
select a,b,c FROM Atable
select d,e,f FROM Atable

I want a sorted map:

{
"Atable":["select a,b,c FROM Atable", "select d,e,f FROM Atable"],
"Vtable":["select a,b,c FROM Vtable"],
"Xtable":["select a,b,c FROM Xtable"]
}

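The keys of the sorted map are the table names, and each value is the list of text lines that reference that table.

I started with this, but I am stuck on tokenizing the lines that the regex matches:

import re

with open('mytext.txt', 'r') as f:
    x = f.readlines()
print(x)
for i in x:
    p = re.search(".* FROM ", i)
    # now how to tokenize and get the value that follows FROM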

Tags: python, regex

Solution


We most likely don't want to use a regular expression for this task, but if we did, we could start with a simple expression, perhaps something like:

\"(.+?([a-z]+))\"

We would replace each match with "\2":["\1"], and then wrap the result in {}.

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"\"(.+?([a-z]+))\""

test_str = ("\"select a,b,c FROM Xtable\"\n"
    "\"select a,b,c FROM Vtable\"\n"
    "\"select a,b,c FROM Atable\"\n"
    "\"select a,b,c FROM Atable\"\n"
    "\"select d,e,f FROM Atable\"")

subst = "\"\\2\":[\"\\1\"],"

# You can limit the number of replacements with the count argument (0 replaces every match)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE | re.IGNORECASE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
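The substitution above emits one entry per input line, so a table that appears several times shows up as several separate keys, and nothing is sorted. A minimal sketch of one way to get the grouped, sorted mapping the question asks for is to capture the table name with a group and collect lines per table. The helper name `group_by_table` and the pattern are assumptions made for illustration, not part of the original answer, and the pattern assumes the table name is the word that directly follows FROM.

```python
import re
from collections import defaultdict

# Assumed pattern: the table name is the first word after the FROM keyword.
FROM_TABLE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)


def group_by_table(lines):
    """Map each table name to the distinct lines that select from it."""
    groups = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        match = FROM_TABLE.search(line)
        if match is None:
            continue  # skip lines that have no FROM clause
        table = match.group(1)
        if line not in groups[table]:  # drop exact duplicate lines
            groups[table].append(line)
    # Dicts preserve insertion order (Python 3.7+), so inserting keys in
    # sorted order yields a mapping sorted by table name.
    return {table: groups[table] for table in sorted(groups)}


sample = [
    "select a,b,c FROM Xtable\n",
    "select a,b,c FROM Vtable\n",
    "select a,b,c FROM Atable\n",
    "select a,b,c FROM Atable\n",
    "select d,e,f FROM Atable\n",
]
print(group_by_table(sample))
```

With the sample lines from the question, the keys come out as Atable, Vtable, Xtable, and the duplicate Atable line is stored only once.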

RegEx

If this expression is not what you need, it can be modified or changed at regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.
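As noted at the start of this answer, a regular expression is probably not needed here. A rough sketch of the plain-string route follows, assuming every line contains the literal uppercase keyword " FROM " as in the sample file. The function name `group_by_table_split` and its `keyword` parameter are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def group_by_table_split(lines, keyword=" FROM "):
    """Group lines by the word after `keyword`, without using a regex."""
    groups = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        # partition() splits on the first occurrence of the keyword;
        # sep is empty when the keyword is missing (case-sensitive match).
        _head, sep, tail = line.partition(keyword)
        words = tail.split()
        if not sep or not words:
            continue  # no FROM clause, or nothing after it
        groups[words[0]].add(line)  # a set discards duplicate lines
    # Sort both the table names and the lines within each table.
    return {table: sorted(groups[table]) for table in sorted(groups)}


lines = [
    "select a,b,c FROM Xtable",
    "select a,b,c FROM Vtable",
    "select a,b,c FROM Atable",
    "select a,b,c FROM Atable",
    "select d,e,f FROM Atable",
]
print(group_by_table_split(lines))
```

Because the function only iterates over its argument, an open file object such as the question's mytext.txt can be passed in directly instead of a list.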

