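python - re.findall produces list-like arrays in Python
Problem description
I am writing a Python script to extract several features from a text file.
The input file has the following structure: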
ENTRY       M00001 Pathway Module
NAME        Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
CLASS       Pathway modules; Carbohydrate metabolism; Central carbohydrate metabolism
PATHWAY     map00010 Glycolysis / Gluconeogenesis
            map01200 Carbon metabolism
            map01100 Metabolic pathways
///
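For reference, here is a minimal sketch of how a file with this structure could be split into records. It assumes that continuation lines start with whitespace and that each record ends with a line starting with `///`; the function name `iter_records` and the shortened sample text are made up for illustration.

```python
def iter_records(lines):
    """Yield one dict (field name -> raw text) per record.

    Assumes continuation lines begin with whitespace and that a line
    starting with '///' terminates the current record.
    """
    record, key = {}, None
    for line in lines:
        if line.startswith("///"):
            # end of record: hand it over and start a fresh one
            if record:
                yield record
            record, key = {}, None
        elif line[:1].isspace():
            # continuation (or blank) line: append to the latest field, if any
            if key is not None:
                record[key] += line
        elif line.strip():
            # a new field: the first token is the key, the rest is the value
            parts = line.split(None, 1)
            key = parts[0]
            record[key] = parts[1] if len(parts) > 1 else ""


sample = """\
ENTRY       M00001 Pathway Module
NAME        Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
PATHWAY     map00010 Glycolysis / Gluconeogenesis
            map01200 Carbon metabolism
            map01100 Metabolic pathways
///
"""

records = list(iter_records(sample.splitlines(keepends=True)))
print(records[0]["PATHWAY"])
```

Starting a fresh dict at every `///` keeps a field from one record from leaking into the next one when a record has no PATHWAY field.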
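I am extracting the values of the "ENTRY" and "PATHWAY" fields. However, when I write them to a PostgreSQL 11.0 table, I get the following output. Both columns are of type "character varying".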
id map_id
{M00001} {map00010,map01200,map01100}
{M00002} {map00010,map01200,map01230,map01100}
{M00003} {map00010,map00020,map01100}
{M00004} {map00030,map01200,map01100,map01120}
{M00005} {map00030,map00230,map01200,map01230,map01100}
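The following code produces the output above:
import re

# conn is assumed to be an already-open database connection (not shown here)
cursor = conn.cursor()
dict = {}  # field name -> accumulated text of the current record
with open('file') as f:
    for line in f:
        if re.search(r"^[A-Z]", line):
            # a new field starts: split the key from the rest of the line
            key, value = re.split(r"\s+", line, maxsplit=1)
            dict[key] = value
        elif re.search(r"^\s+", line):
            # continuation line: append it to the most recent field
            dict[key] = dict[key] + line
        elif re.search(r"^///", line):
            # end of record: extract the values and insert one row
            e = dict['ENTRY']
            string = ''.join(e)
            id = re.findall(r"(^[A-Za-z+]\d+)", string)  # a list
            map_id = re.findall(r"(map\d+)\s+.*", dict['PATHWAY'])  # a list
            cursor.execute("INSERT INTO tbl (id, map_id) VALUES (%s, %s)", (id, map_id))
conn.commit()
cursor.close()
conn.close()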
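To narrow the problem down, here is a small sketch of what the two `re.findall` calls return for the first record, run without any database. The variable names `entry`, `pathway`, `ids` and `map_ids` are only for illustration. Both results are Python lists, and my guess (an assumption, I have not confirmed it for the driver in use) is that the driver sends lists as PostgreSQL arrays, which would explain the braces.

```python
import re

# Field texts as the parsing loop would have collected them for M00001
entry = "M00001 Pathway Module\n"
pathway = (
    "map00010 Glycolysis / Gluconeogenesis\n"
    "            map01200 Carbon metabolism\n"
    "            map01100 Metabolic pathways\n"
)

# Same patterns as in the script above
ids = re.findall(r"(^[A-Za-z+]\d+)", entry)
map_ids = re.findall(r"(map\d+)\s+.*", pathway)

print(type(ids).__name__, ids)          # list ['M00001']
print(type(map_ids).__name__, map_ids)  # list ['map00010', 'map01200', 'map01100']
```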
The expected output is:
id map_id
M00001 map00010,map01200,map01100
M00002 map00010,map01200,map01230,map01100
M00003 map00010,map00020,map01100
M00004 map00030,map01200,map01100,map01120
M00005 map00030,map00230,map01200,map01230,map01100
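One possible way to get plain strings like these, sketched under the assumption that a record is available as a dict of field texts: take the single ENTRY match as a string and join the PATHWAY matches with commas before inserting. The helper name `to_row` is hypothetical.

```python
import re


def to_row(record):
    """Return (id, map_id) as plain strings for one parsed record."""
    # re.match anchors at the start, so this picks the leading identifier only
    id_match = re.match(r"[A-Za-z]\d+", record["ENTRY"])
    entry_id = id_match.group(0) if id_match else None
    # findall still returns a list; joining it gives a single string
    map_ids = re.findall(r"map\d+", record.get("PATHWAY", ""))
    return entry_id, ",".join(map_ids)


record = {
    "ENTRY": "M00001 Pathway Module\n",
    "PATHWAY": (
        "map00010 Glycolysis / Gluconeogenesis\n"
        "            map01200 Carbon metabolism\n"
        "            map01100 Metabolic pathways\n"
    ),
}

print(to_row(record))  # ('M00001', 'map00010,map01200,map01100')
```

The returned tuple could then be passed as the parameters of the existing `cursor.execute("INSERT INTO tbl (id, map_id) VALUES (%s, %s)", ...)` call, so that both values arrive as strings rather than lists.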
Any help is much appreciated.