
Problem description

This is a list of zones in Singapore and their respective subzones.

Bishan[1]
Bishan East
Marymount
Upper Thomson
Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)
Alexandra Hill
Alexandra North
Bukit Ho Swee
Bukit Merah (Not to be confused with Bukit Merah planning area.)
City Terminals (Formerly called "Tanjong Pagar" subzone.)
Depot Road
Everton Park
Henderson Hill
Kampong Tiong Bahru
Maritime Square (Formerly called "HarbourFront" subzone.)
Redhill
Singapore General Hospital
Telok Blangah Drive
Telok Blangah Rise
Telok Blangah Way
Tiong Bahru
Tiong Bahru Station
Bukit Timah[3]
Anak Bukit
Coronation Road
Farrer Court
Hillcrest
Holland Road
Leedon Park
Swiss Club
Ulu Pandan
Downtown Core[4]
Anson
Bayfront
Bugis
Cecil
Central
City Hall
Clifford Pier
Marina Centre
Maxwell
Phillip
Raffles Place
Tanjong Pagar
Geylang[5]
Aljunied
Geylang East
Kallang Way
MacPherson
Kampong Ubi
Kallang[6]
Bendemeer
Boon Keng
Crawford
Geylang Bahru
Kallang Bahru
Kampong Bugis
Kampong Java
Lavender
Tanjong Rhu

Or, as a Python string:

data = 'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\nBukit Merah[2] (Not to be confused with Bukit Merah subzone.)\nAlexandra Hill\nAlexandra North\nBukit Ho Swee\nBukit Merah (Not to be confused with Bukit Merah planning area.)\nCity Terminals (Formerly called "Tanjong Pagar" subzone.)\nDepot Road\nEverton Park\nHenderson Hill\nKampong Tiong Bahru\nMaritime Square (Formerly called "HarbourFront" subzone.)\nRedhill\nSingapore General Hospital\nTelok Blangah Drive\nTelok Blangah Rise\nTelok Blangah Way\nTiong Bahru\nTiong Bahru Station\nBukit Timah[3]\nAnak Bukit\nCoronation Road\nFarrer Court\nHillcrest\nHolland Road\nLeedon Park\nSwiss Club\nUlu Pandan\nDowntown Core[4]\nAnson\nBayfront\nBugis\nCecil\nCentral\nCity Hall\nClifford Pier\nMarina Centre\nMaxwell\nPhillip\nRaffles Place\nTanjong Pagar\nGeylang[5]\nAljunied\nGeylang East\nKallang Way\nMacPherson\nKampong Ubi\nKallang[6]\nBendemeer\nBoon Keng\nCrawford\nGeylang Bahru\nKallang Bahru\nKampong Bugis\nKampong Java\nLavender\nTanjong Rhu\n'

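Words with square brackets [] are zones, and each zone is followed by its subzones, separated by newlines (\n). What I want to do is create a list of zones where each zone contains a sublist of its subzones, like this (I will remove the square brackets and the parentheses, along with their contents, later):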

1.) Bishan[1]

- Bishan East
- Marymount
- Upper Thomson

2.) Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)

- Alexandra Hill
- Alexandra North
- Bukit Ho Swee
- Bukit Merah (Not to be confused with Bukit Merah planning area.)
- City Terminals (Formerly called "Tanjong Pagar" subzone.)

...

So far, I have only managed to extract the zones using split() and a regular expression.

import re

zones_and_subzones = data.split('\n')
zones = [zone for zone in zones_and_subzones if re.match(r'(.*?)\[', zone)]

This is where I am stuck: I am having trouble extracting the subzones of each zone. I tried using

regex = (\].*?\[)

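to extract the text between a closing square bracket and the next opening one, but the result is incomplete. I have been at this for a while and would appreciate some help. If there is a better approach than what I have so far, please share it. Thank you.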

Tags: python, regex, list, nested-lists

Solution


A dictionary is a better fit for this case. In particular, I would use a defaultdict to keep the implementation short:

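from collections import defaultdict

# Map each zone line to the list of subzone lines that follow it.
dicti = defaultdict(list)
# splitlines() avoids the empty trailing entry that split('\n') would add.
for word in data.splitlines():
    if '[' in word and ']' in word:
        name = word  # a bracketed line starts a new zone
    else:
        dicti[name].append(word) # or alternatively -> `dicti[name] += [word]`
>>> dicti
defaultdict(<class 'list'>,
            {'Bishan[1]': ['Bishan East', 'Marymount', 'Upper Thomson'],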
             'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)': ['Alexandra Hill',
              'Alexandra North',
              'Bukit Ho Swee',
              'Bukit Merah (Not to be confused with Bukit Merah planning area.)',
              'City Terminals (Formerly called "Tanjong Pagar" subzone.)',
              'Depot Road',
              'Everton Park',
              'Henderson Hill',
              'Kampong Tiong Bahru',
              'Maritime Square (Formerly called "HarbourFront" subzone.)',
              'Redhill',
              'Singapore General Hospital',
              'Telok Blangah Drive',
              'Telok Blangah Rise',
              'Telok Blangah Way',
              'Tiong Bahru',
              'Tiong Bahru Station'],
   #...
})
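
If you need the nested list from the question rather than a dictionary, the same single pass can build it and strip the bracketed markers and parenthesised notes along the way. The following is only a minimal sketch: `sample` is a shortened stand-in for the question's `data` string, and the helper names `clean` and `nest_zones` are made up for illustration. It assumes every zone line carries a numeric marker such as `[1]` and that the notes in parentheses are not nested.

```python
import re

# Shortened stand-in for the question's `data` string (assumed to share its format).
sample = (
    'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\n'
    'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)\n'
    'Alexandra Hill\n'
    'Bukit Merah (Not to be confused with Bukit Merah planning area.)\n'
    'City Terminals (Formerly called "Tanjong Pagar" subzone.)\n'
)


def clean(name):
    """Drop '[n]' markers and '(...)' notes, then trim surrounding whitespace."""
    return re.sub(r'\[[^\]]*\]|\([^)]*\)', '', name).strip()


def nest_zones(text):
    """Return [[zone, [subzone, ...]], ...] in the order the zones appear."""
    nested = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if re.search(r'\[\d+\]', line):
            nested.append([clean(line), []])   # a zone line starts a new group
        elif nested:
            nested[-1][1].append(clean(line))  # a subzone joins the latest zone
    return nested


result = nest_zones(sample)
print(result)
```

Because the cleanup happens after a line has been classified, the zone "Bukit Merah" and the subzone of the same name remain distinguishable by their position in the structure, even though their cleaned names are identical.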
