Converting a text file of items and sub-items into a dictionary or data structure

Problem description

I have a text file containing the following items and features:

Item name: Item1  
    Feature 1: 64.264  
    Feature 2: 18.071  
    Feature 3: 188.516  
    Feature 4: 0.208  
    Feature 5: 4.711  
    Feature 6: 0.412  
    Feature 7: -14.902  
    Feature 9: -10.435  
    Feature 10: 18.089    
Item name: Item2  
    Feature 1: 69.990  
    Feature 2: 19.312  
    Feature 3: 117.832  
    Feature 4: 0.419  
    Feature 5: 5.224  
    Feature 6: 0.458  
    Feature 7: -20.500  
    Feature 8: -12.933  
    Feature 9: 15.646  
    Feature 10: 1.751  
Item name: Item3  
    Feature 1: 66.125  
    Feature 2: 23.067  
    Feature 3: 133.110  
    Feature 4: 0.328  
    Feature 5: 2.854  
    Feature 6: 0.249  
    Feature 7: -37.271  
    Feature 8: -10.310  
    Feature 9: 13.784  
    Feature 10: 3.067

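I want to use Python to turn this text file into a data structure in which the item name is column 0 and Feature 1 through Feature 10 are columns 1 through 10. Any help would be appreciated.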

Tags: python, pandas

Solution


I would use plain Python to build a dictionary of dictionaries, then hand it to pandas:

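In [10]: import pandas as pd

In [11]: a = {}

In [12]: with open('file.txt') as f:
    ...:     for line in f:
    ...:         if line.startswith(" "):
    ...:             # indented line: a feature of the current item (value kept as a string)
    ...:             k, v = line.split(':')
    ...:             a[current][k.strip()] = v.strip()
    ...:         else:
    ...:             # "Item name: <name>" line: start a new item
    ...:             current = line.split(':')[1].strip()
    ...:             a[current] = {}
    ...: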

In [13]: pd.DataFrame.from_dict(a)
Out[13]:
              Item1    Item2    Item3
Feature 1    64.264   69.990   66.125
Feature 10   18.089    1.751    3.067
Feature 2    18.071   19.312   23.067
Feature 3   188.516  117.832  133.110
Feature 4     0.208    0.419    0.328
Feature 5     4.711    5.224    2.854
Feature 6     0.412    0.458    0.249
Feature 7   -14.902  -20.500  -37.271
Feature 8       NaN  -12.933  -10.310
Feature 9   -10.435   15.646   13.784

In [14]: pd.DataFrame.from_dict(a, orient='index')
Out[14]:
      Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7 Feature 9 Feature 10 Feature 8
Item1    64.264    18.071   188.516     0.208     4.711     0.412   -14.902   -10.435     18.089       NaN
Item2    69.990    19.312   117.832     0.419     5.224     0.458   -20.500    15.646      1.751   -12.933
Item3    66.125    23.067   133.110     0.328     2.854     0.249   -37.271    13.784      3.067   -10.310
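
The layout asked for in the question (item name as column 0, features in numeric order) can be sketched as follows. This is only one possible approach, not part of the original answer: the names `parse_items`, `feature_number`, and `sample` are hypothetical, the in-memory `sample` stands in for `file.txt`, and the sketch assumes every feature line is indented and every value parses as a float.

```python
import io
import re

import pandas as pd


def parse_items(lines):
    """Parse 'Item name: X' headers followed by indented 'Feature N: value' lines."""
    records = {}
    current = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing on them
        key, _, value = line.partition(":")
        if line[0].isspace():
            # Indented line: a feature of the most recent item.
            records[current][key.strip()] = float(value)
        else:
            # Unindented line: "Item name: <name>" starts a new item.
            current = value.strip()
            records[current] = {}
    return records


def feature_number(name):
    """Sort key: the integer in the label, so 'Feature 10' sorts after 'Feature 9'."""
    return int(re.search(r"\d+", name).group())


# Hypothetical stand-in for open('file.txt'), using a few rows from the question.
sample = io.StringIO(
    "Item name: Item1\n"
    "    Feature 1: 64.264\n"
    "    Feature 2: 18.071\n"
    "    Feature 10: 18.089\n"
    "Item name: Item2\n"
    "    Feature 1: 69.990\n"
    "    Feature 2: 19.312\n"
    "    Feature 8: -12.933\n"
    "    Feature 10: 1.751\n"
)

df = pd.DataFrame.from_dict(parse_items(sample), orient="index")
df = df[sorted(df.columns, key=feature_number)]   # order columns by feature number
df = df.rename_axis("Item name").reset_index()    # item name becomes column 0
```

Converting the values with `float` makes the columns numeric rather than strings, and a feature missing from an item (such as Feature 8 for Item1) still comes out as NaN.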
