String manipulation: extracting an exact pattern from a string in Python

Problem description

I have a text file containing data like the following. The file contains a large number of entries:

{latlng: [77.7355421, 12.985924] , name: 'International Tech Park Bangalore 2.5 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7515038, 12.9829723] , name: 'H M Tech Park 2.3 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.721544,12.981423], name: 'Prestige Featherlite Techapark 4.7km', type: 'work', icon: 'suitcase'},
{latlng: [77.7434198, 12.9852629] , name: 'GR Tech Park 1.6 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7402427, 12.9860243] , name: 'Brigade Tech Park 1.8 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7257197, 12.9761888] , name: 'Concentrix Embassy TI  3.7 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.72577199999999, 12.9821986] , name: 'Gopalan Global Axis 3.4 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7445732, 12.9581743] , name: 'Sigma Soft Tech Park Gamma Block 4.0 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.6911745, 12.9425392] , name: 'Prestige Tech Park II 10.4 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.710490,12.973138], name: 'Prestige Technostar 6.7 Km', type: 'work', icon: 'suitcase'},
{latlng: [77.7180171, 12.9709603] , name: 'Kalyani Tech Park Private Limited 4.7 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7053282, 12.9935253] , name: 'Bhoruka Technology Park 7.9 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.71993979999999, 12.9821174] , name: 'Mind Comp Tech Park 4.0 km' , type: 'work' , icon: 'suitcase'},
{latlng: [77.7419192, 12.9759885] , name: 'The Deens Academy 75 m' , type: 'learn' , icon: 'student'},
{latlng: [77.7434897, 12.9758206] , name: 'Deens academy parking 0.1 km' , type: 'learn' , icon: 'student'},
{latlng: [77.7424005, 12.976853] , name: 'Zeena English School 0.1 km' , type: 'learn' , icon: 'student'},
{latlng: [77.7451549, 12.9747377] , name: 'Indus Early Learning Centre Whitefield 0.6 km' , type: 'learn' , icon: 'student'},
{latlng: [77.739758, 12.976582] , name: 'Mont Ivy Preschools 0.3 km' , type: 'learn' , icon: 'student'},
{latlng: [77.7458612, 12.9743853] , name: 'Miracle kids Pre School 0.5 km' , type: 'learn' , icon: 'student'},
{latlng: [77.7482531, 12.9741599] , name: 'Little Millennium 0.9 km' , type: 'learn' , icon: 'student'},
{latlng: [77.7481791, 12.9797439] , name: 'Holy Cross School and PU College 2.2 km' , type: 'learn' , icon: 'student'},
{latlng: [77.74956259999999, 12.9770804] , name: 'St. Joseph's Convent School 1.3 km' , type: 'learn' , icon: 'student'},
{latlng: [77.74558390000001, 12.9691268] , name: 'The Foundation School 1.7 km' , type: 'learn' , icon: 'student'},

I want to extract the distance from the name field on every line and split it out into a new field, distance. For example, the last line above:

{latlng: [77.74558390000001, 12.9691268] , name: 'The Foundation School 1.7 km' , type: 'learn' , icon: 'student'}

should be converted to:

{latlng: [77.74558390000001, 12.9691268] , name: 'The Foundation School' , distance: 1.7 , type: 'learn' , icon: 'student'}

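I tried converting this into JSON, but as shown above it is not valid JSON. I also tried converting each string into a list and extracting from that, but I am not sure that is the right approach.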

import json
import ast
import re


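with open('data.txt', 'r') as f:
    for line_no, line in enumerate(f, start=1):
        # Split on whitespace; separate names avoid rebinding f and the loop counter
        tokens = line.split()

        for token in tokens:
            print(type(token))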

Any help would be appreciated.

Tags: python

Solution


As Rmano said, the data needs to be reformatted and made consistent; otherwise there are many special cases to check for.
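As a rough illustration of what making the data consistent could look like, the sketch below rewrites the separators in front of the name, type and icon keys so that every entry uses the same ' , ' delimiter. The helper name normalize_separators is hypothetical, and the sketch assumes those three keys are the only ones that ever follow a comma.

```python
import re

# Hypothetical helper: force the separator before each key to " , ".
# Assumption: name, type and icon are the only keys that follow a comma.
SEPARATOR = re.compile(r"\s*,\s*(?=(?:name|type|icon):)")

def normalize_separators(line):
    # Commas inside the latlng pair are untouched because no key follows them.
    return SEPARATOR.sub(" , ", line)

sample = "{latlng: [77.721544,12.981423], name: 'Prestige Featherlite Techapark 4.7km', type: 'work', icon: 'suitcase'},"
normalized = normalize_separators(sample)
print(normalized)
```

Entries that already use ' , ' pass through unchanged, so the helper could be applied to every line before any further splitting.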

Building on Rmano's answer to get the desired output, we arrive at the following, where t.txt is the input you provided in the question:

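import re

def extract_name_substring(line):
    # Capture the name and the trailing distance; "[kK ]m" accepts km, Km and " m".
    pre_define_pattern = r" name: '(.*) ([0-9.]+ *[kK ]m)'"
    name_dist_str = re.findall(pre_define_pattern, line)
    return "name: '" + name_dist_str[0][0] + "' , distance: " + name_dist_str[0][1]

# Read all lines; the context manager closes the file afterwards.
with open('t.txt') as f:
    lines = f.read().splitlines()

new_strings = []
for line in lines:
    name_dist_str = extract_name_substring(line)
    # Splitting on ' , ' only works for entries that use exactly that separator.
    tmp = line.split(' , ')
    new_str = tmp[0] + " , " + name_dist_str + " , " + " , ".join(tmp[2:])
    new_strings.append(new_str)

print(new_strings)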

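The resulting output is shown below. The two entries written without a space before the commas (Prestige Featherlite Techapark and Prestige Technostar) are not split by ' , ' and come out malformed, which is why the data should be made consistent first: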

["{latlng: [77.7355421, 12.985924] , name: 'International Tech Park Bangalore' , distance: 2.5 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7515038, 12.9829723] , name: 'H M Tech Park' , distance: 2.3 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.721544,12.981423], name: 'Prestige Featherlite Techapark 4.7km', type: 'work', icon: 'suitcase'}, , name: 'Prestige Featherlite Techapark' , distance: 4.7km , ",
 "{latlng: [77.7434198, 12.9852629] , name: 'GR Tech Park' , distance: 1.6 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7402427, 12.9860243] , name: 'Brigade Tech Park' , distance: 1.8 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7257197, 12.9761888] , name: 'Concentrix Embassy TI ' , distance: 3.7 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.72577199999999, 12.9821986] , name: 'Gopalan Global Axis' , distance: 3.4 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7445732, 12.9581743] , name: 'Sigma Soft Tech Park Gamma Block' , distance: 4.0 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.6911745, 12.9425392] , name: 'Prestige Tech Park II' , distance: 10.4 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.710490,12.973138], name: 'Prestige Technostar 6.7 Km', type: 'work', icon: 'suitcase'}, , name: 'Prestige Technostar' , distance: 6.7 Km , ",
 "{latlng: [77.7180171, 12.9709603] , name: 'Kalyani Tech Park Private Limited' , distance: 4.7 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7053282, 12.9935253] , name: 'Bhoruka Technology Park' , distance: 7.9 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.71993979999999, 12.9821174] , name: 'Mind Comp Tech Park' , distance: 4.0 km , type: 'work' , icon: 'suitcase'},",
 "{latlng: [77.7419192, 12.9759885] , name: 'The Deens Academy' , distance: 75 m , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7434897, 12.9758206] , name: 'Deens academy parking' , distance: 0.1 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7424005, 12.976853] , name: 'Zeena English School' , distance: 0.1 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7451549, 12.9747377] , name: 'Indus Early Learning Centre Whitefield' , distance: 0.6 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.739758, 12.976582] , name: 'Mont Ivy Preschools' , distance: 0.3 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7458612, 12.9743853] , name: 'Miracle kids Pre School' , distance: 0.5 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7482531, 12.9741599] , name: 'Little Millennium' , distance: 0.9 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.7481791, 12.9797439] , name: 'Holy Cross School and PU College' , distance: 2.2 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.74956259999999, 12.9770804] , name: 'St. Joseph's Convent School' , distance: 1.3 km , type: 'learn' , icon: 'student'},",
 "{latlng: [77.74558390000001, 12.9691268] , name: 'The Foundation School' , distance: 1.7 km , type: 'learn' , icon: 'student'},"]
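If the inconsistent spacing cannot be cleaned up beforehand, one possible alternative is to rewrite only the name field in place with re.sub, so the rest of the line never has to be split. The following is a minimal sketch, not part of the original answer: the function name split_distance is hypothetical, and converting metres to kilometres so that distance becomes a bare number (as in the question's desired output) is an assumption.

```python
import re

# Matches: name: '<name> <number><optional space><unit>' with unit km, Km or m.
# The greedy (.*) lets names that contain an apostrophe (St. Joseph's) through.
NAME_DIST = re.compile(r"name:\s*'(.*)\s+(\d+(?:\.\d+)?)\s*([kK]?[mM])'")

def split_distance(line):
    """Return the line with the distance moved from name into its own field."""
    def repl(match):
        name, value, unit = match.groups()
        km = float(value)
        if unit.lower() == "m":  # assumption: report every distance in km
            km /= 1000
        return f"name: '{name.strip()}' , distance: {km}"
    return NAME_DIST.sub(repl, line)

line = "{latlng: [77.74558390000001, 12.9691268] , name: 'The Foundation School 1.7 km' , type: 'learn' , icon: 'student'}"
converted = split_distance(line)
print(converted)
```

Because only the matched name field is replaced, entries such as the Prestige Technostar line keep their original separators and are no longer mangled.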
