Denormalizing a hierarchy in a list of lists

Question

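I am parsing a file in which labels are defined as shown below, with the hierarchy expressed by putting each level on a new line: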

+--------------------+--------------------+--------------------+
| L1 - A             |                    |                    |
|                    |  L2 - B            |                    |
|                    |                    |  L3 - C            |
|                    |                    |                    |
| L1 - D             |                    |                    |
|                    |  L2 - E            |                    |
|                    |                    |  L3 - F            |
+--------------------+--------------------+--------------------+

I represent the above as:

labels = [
   ['A', None, None, None, 'D', None, None],
   [None, 'B', None, None, None, 'E', None],
   [None, None, 'C', None, None, None, 'F']
]

I tried:

def joinfoo(items):
   if len(items) == 1:
      return items[0]

   result = []
   active = None
   for x, y in zip(items[0], joinfoo(items[1:])):
      active = x if x else active
      if type(y) is tuple:
         result.append((active, y[0], y[1]))
      else:
         result.append((active, y))

   return result

I wanted:

[
   ('A', None, None), ('A', 'B', None), ('A', 'B', 'C'),
   (None, None, None),
   ('D', None, None), ('D', 'E', None), ('D', 'E', 'F')
]

But I got this:

[
   ('A', None, None), ('A', 'B', None), ('A', 'B', 'C'),
   ('A', 'B', None),
   ('D', 'B', None), ('D', 'E', None), ('D', 'E', 'F')
]

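Any suggestions on how to fix joinfoo() so that it produces the expected result? The solution needs to support a variable number of columns.

Would something like for x, y in zip(joinfoo(items[:-1]), items[-1]): instead of for x, y in zip(items[0], joinfoo(items[1:])): be a step in the right direction...?

Edit: The original list of lists may have wrongly implied a pattern in the hierarchy. There is no defined pattern, and the number of columns is variable too. Perhaps a better test case: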

+--------------+--------------+--------------+
|   L1 - A     |              |              |    = A
|              |    L2 - B    |              |    = A - B
|              |              |    L3 - C    |    = A - B - C
|              |              |    L3 - D    |    = A - B - D
|              |    L2 - E    |              |    = A - E
|              |              |              |    =   
|   L1 - F     |              |              |    = F
|              |    L2 - G    |              |    = F - G
|              |              |    L3 - H    |    = F - G - H
+--------------+--------------+--------------+

labels = [
   ['A', None, None, None, None, None, 'F', None, None],
   [None, 'B', None, None, 'E', None, None, 'G', None],
   [None, None, 'C', 'D', None, None, None, None, 'H']
]

Tags: python, python-2.7

Answer


I had some time on my hands and wondered how I would solve this.

So here is my solution; maybe it will spark some ideas:

labels = """\
+--------------------+--------------------+--------------------+
| L1 - A             |                    |                    |
|                    |  L2 - B            |                    |
|                    |                    |  L3 - C            |
|                    |                    |                    |
| L1 - D             |                    |                    |
|                    |  L2 - E            |                    |
|                    |                    |  L3 - F            |
+--------------------+--------------------+--------------------+
"""

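# Parse every table row into a list of one-letter labels (None for an empty cell).
lines = [[(s.strip()[-1:] if s.strip() else None)
          for s in line[1:-1].split('|')]
         for line in labels.splitlines()[1:-1]]

# Fill the empty cells left of each row's first label from the row above.
for index, row in enumerate(lines):
    if not any(row):
        continue  # a completely blank row stays all-None
    for i, label in enumerate(row):
        if label:
            break  # the first label and everything right of it stay unchanged
        # The first row has no row above it, so there is nothing to inherit.
        row[i] = lines[index - 1][i] if index else None

print([tuple(row) for row in lines])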

# --> [('A', None, None), ('A', 'B', None), ('A', 'B', 'C'), (None, None, None), ('D', None, None), ('D', 'E', None), ('D', 'E', 'F')]
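
The code above starts from the text table. Since the question already holds the data as a column-major list of lists, here is a minimal sketch of the same idea applied directly to that structure. The function name denormalize is made up for illustration, and I am assuming that a blank row should reset the active hierarchy, as in the expected output from the question. It does not depend on a fixed number of columns.

```python
def denormalize(columns):
    """Turn column-major labels into one tuple per row, filling in parent levels.

    `denormalize` is a hypothetical helper name, not something from the question.
    """
    width = len(columns)
    current = [None] * width  # labels currently active at each level
    rows = []
    for row in zip(*columns):  # transpose: one tuple per table row
        if not any(row):
            # Assumption: a blank row clears the whole hierarchy.
            current = [None] * width
        else:
            # Depth of the first label present in this row.
            depth = next(i for i, label in enumerate(row) if label)
            # Keep the ancestors above that depth, take the rest from this row.
            current = current[:depth] + list(row[depth:])
        rows.append(tuple(current))
    return rows


labels = [
    ['A', None, None, None, None, None, 'F', None, None],
    [None, 'B', None, None, 'E', None, None, 'G', None],
    [None, None, 'C', 'D', None, None, None, None, 'H'],
]

for row in denormalize(labels):
    print(row)
```

Because everything from the first label's depth onward is taken from the current row, a new L2 label automatically drops the old L3 label, which is the part the recursive joinfoo() attempt did not handle.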
