Python: How to convert a string to a list without creating an overarching list

Problem description

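I have received data in a poor format. It consists of a series of points, each followed by its attributes. Every point is delimited by square brackets, [], but the whole thing is currently a string.

I have tried the standard list() approach to converting the string, but that puts the entire string, made up of many points, into a single list. I want the square brackets already present in the string to be recognised as lists, rather than ending up with one overarching list that contains a single item.

The string data looks like the sample below. This is just one group of points, and I have hundreds to iterate over; the double square brackets at the start and end mark one group.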

[[451166.32,719761.36,20.37,0.06,],[451162.97,719765.06,20.41,0.048,1],[451161.63,719766.54,10.17,0.048,],[451158.26,719770.23,20.44,0.048,],[451156.19,719772.54,20.05,0.048,0],[451148.7,719780.68,-10.77,0.048,],[451138.57,719791.95,-10.2,0.048,],[451129.33,719802.15,-10.38,0.048,],[451118.07,719814.56,10.06,0.048,],[451105.98,719827.91,-10.64,0.048,],[451095.10,719839.91,-10.47,0.048,],[451087.17,719848.66,-10.72,0.048,],[451082.94,719853.31,10.92,0.048,0],[451078.,719858.77,2.75,10.048,],[451076.79,719860.10,5.2,10.06,1]]

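I have tried list(xsData.split(",")), [i.strip("[],").split(",") for i in myList], and several other approaches, but they all either put the string into one overarching list or put each character into its own list.

The end goal is to be able to iterate over every item in every list, so that the data can be written out in a friendlier format such as TXT/CSV.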

Edit: ast.literal_eval() works for every group of points except the one below, which raises an invalid syntax error. I cannot see why.

[[455972.1700000000128057,786651.7399999999906868,44.4499999999999993,0.045,],[455976.5700000000069849,786652.7800000000279397,10.2899999999999991,4.04,1],[455977.7000000000116415,786653.0500000000465661,12.8300000000000001,1.04,],[455979.0499999999883585,786653.3699999999953434,2.8800000000000008,0.04,],[455979.6900000000023283,786653.5200000000186265,3.4199999999999999,5.04,],[455983.9299999999930151,786654.5200000000186265,9.75,0.04,],[455990.8900000000139698,786656.1700000000419095,0.8499999999999996,0.04,],[455993.5100000000093132,786656.7900000000372529,0.4100000000000001,0.04,],[455993.7900000000081491,786656.8499999999767169,0.3300000000000001,0.04,],[455994.8699999999953434,786657.1099999999860302,4.5199999999999996,0.04,],[455997.0499999999883585,786657.6300000000046566,4.6100000000000003,0.04,],[455997.5899999999965075,786657.75,4.8600000000000003,0.04,],[455998.7099999999918509,786658.0200000000186265,1.0099999999999998,0.045,1],[456000.3200000000069849,786658.4000000000232831,1.3699999999999992,0.045,],[456002.2799999999988358,786658.8599999999860302,17.6400000000000006,0.045,],[456006.2900000000081491,786659.8100000000558794,14.8100000000000005,0.045,],[456009.5899999999965075,786660.5899999999674037,10.4399999999999995,,],[456017.0,786662.3499999999767169,19.1099999999999994,,]]

Tags: python, string, list, type-conversion

Solution

If the string looks like a syntactically valid Python list, you can get the list data by calling ast.literal_eval:

>>> import ast
>>> s = "[[451166.32,719761.36,20.37,0.06,],[451162.97,719765.06,20.41,0.048,1],[451161.63,719766.54,10.17,0.048,],[451158.26,719770.23,20.44,0.048,],[451156.19,719772.54,20.05,0.048,0],[451148.7,719780.68,-10.77,0.048,],[451138.57,719791.95,-10.2,0.048,],[451129.33,719802.15,-10.38,0.048,],[451118.07,719814.56,10.06,0.048,],[451105.98,719827.91,-10.64,0.048,],[451095.10,719839.91,-10.47,0.048,],[451087.17,719848.66,-10.72,0.048,],[451082.94,719853.31,10.92,0.048,0],[451078.,719858.77,2.75,10.048,],[451076.79,719860.10,5.2,10.06,1]]"
>>> x = ast.literal_eval(s)
>>> type(x)
<class 'list'>
>>> x
[[451166.32, 719761.36, 20.37, 0.06], [451162.97, 719765.06, 20.41, 0.048, 1], [451161.63, 719766.54, 10.17, 0.048], [451158.26, 719770.23, 20.44, 0.048], [451156.19, 719772.54, 20.05, 0.048, 0], [451148.7, 719780.68, -10.77, 0.048], [451138.57, 719791.95, -10.2, 0.048], [451129.33, 719802.15, -10.38, 0.048], [451118.07, 719814.56, 10.06, 0.048], [451105.98, 719827.91, -10.64, 0.048], [451095.1, 719839.91, -10.47, 0.048], [451087.17, 719848.66, -10.72, 0.048], [451082.94, 719853.31, 10.92, 0.048, 0], [451078.0, 719858.77, 2.75, 10.048], [451076.79, 719860.1, 5.2, 10.06, 1]]
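
The "syntactically valid" condition matters for the second group in the question's edit: its last two points end in `,,]`, and an empty field between two commas is not valid in a Python list literal, which would explain the invalid syntax error. A trailing comma directly before `]` is fine. Below is a minimal sketch of one workaround, assuming that an empty field should become `None`. The helper name `fill_empty_fields` is hypothetical, and the sample string is made up to mirror the `,,]` pattern rather than taken from the real data.

```python
import ast
import re

def fill_empty_fields(s, placeholder="None"):
    """Insert `placeholder` wherever a comma is directly followed by another comma.

    Assumption: an empty field means "no value". A leading empty field such as
    "[,1]" is not handled by this sketch.
    """
    # The lookahead leaves the second comma unconsumed, so runs like ",,," also work.
    return re.sub(r",(?=\s*,)", "," + placeholder, s)

# Made-up sample: the first point has an empty field before its trailing comma.
sample = "[[1.5,2.5,3.5,,],[4.0,5.0,6.0,0.04,1]]"
points = ast.literal_eval(fill_empty_fields(sample))
print(points)
```

Any other placeholder that is a valid Python literal, such as `0`, could be passed instead if `None` does not suit the downstream format.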

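I'm not entirely sure, but it sounds like your string may actually be several lists concatenated together, in which case you can't simply call literal_eval on it: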

>>> import ast
>>> s = "[1,2][3,4,[[5,6],7]][8,9]"
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programming\Python 3.6\lib\ast.py", line 85, in literal_eval
    return _convert(node_or_string)
  File "C:\Programming\Python 3.6\lib\ast.py", line 84, in _convert
    raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Subscript object at 0x02E25650>

If that is the case, you can split the data into separate groups so that each one can be evaluated independently.

import ast

def separate_groups(s):
    """finds matching square brackets within `s` and yields successive portions that resemble valid list literals.
    note: may not operate correctly on data that contains quoted brackets, for example `"[1, '[', 2][3,4]"`
    """
    depth = 0
    last_seen_group_end = -1
    for i,c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1
            if depth == 0:
                yield s[last_seen_group_end+1: i+1]
                last_seen_group_end = i

s = "[1,2][3,4,[[5,6],7]][8,9]"
result = [ast.literal_eval(group) for group in separate_groups(s)]
print(result)

Result:

[[1, 2], [3, 4, [[5, 6], 7]], [8, 9]]
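
For the question's end goal of writing the points to CSV, the parsed groups can be passed to the standard `csv` module. This is only a sketch under stated assumptions: `groups` is a list of groups, each group is a list of points, and each point is a flat list of values. The function name `write_points` and the leading group-index column are hypothetical choices, not part of the original data.

```python
import csv
import io

def write_points(groups, fileobj):
    """Write one CSV row per point, prefixed with the index of its group."""
    writer = csv.writer(fileobj)
    for group_id, group in enumerate(groups):
        for point in group:
            # Points may have different lengths; csv.writer accepts ragged rows.
            writer.writerow([group_id, *point])

# Made-up sample: two groups, one point carrying an extra trailing attribute.
groups = [[[1.0, 2.0], [3.0, 4.0, 1]], [[5.0, 6.0]]]

buffer = io.StringIO()  # stands in for open("points.csv", "w", newline="")
write_points(groups, buffer)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` lets the `csv` module control line endings itself.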
