How to convert a dictionary whose values are lists of lists into a DataFrame in Python?

Problem description

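I have a dictionary like the one below. The keys are start positions, and each value is a list of entries, where each entry holds several other values.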

dict1 = {28878779: 
[[0.63078648931418,'BRCA','Primary Blood Derived Cancer','chr16'],
  [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.4291909025802871, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.7571498628201009, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.20053355013001398, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.47222708511173905, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.5421979810611359, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.517080694962231, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.354578922865826, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
  [0.47933127476003706, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
116276795: 
[[0.0295335249313507,'BRCA','Primary Blood Derived Cancer','chr12'],
  [0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0230930552162406, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0226794373583645, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0465238706721383, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0308525159082739, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
  [0.0280263565564701, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']]
...}

I want to convert the dictionary into a DataFrame like the one below, with one row per entry: the key followed by that entry's values.

Start       Beta_value       Cancer            Stage             Chromosome
28878779  0.63078648931418   BRCA  Primary Blood Derived Cancer    chr16
28878779  0.913319324289701  BRCA  Primary Blood Derived Cancer    chr16
.
.
116276795 0.029533524931350  BRCA  Primary Blood Derived Cancer    chr12
116276795 0.0225709542480921 BRCA  Primary Blood Derived Cancer    chr12
.
.

I tried this:

dlist = [[key,value[i][0],value[i][1],value[i][2],value[i][3]]
for key,value in dict1.items()
for i in value]


beta = pd.DataFrame(dlist, columns=['Start','Beta_value','Cancer','Stage','Chromosome'])

It raises a TypeError:

   TypeError: list indices must be integers or slices, not list
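The error comes from `for i in value`: iterating over a list yields its elements, not their positions, so `i` is already an entry such as `[0.63..., 'BRCA', ...]`, and `value[i]` then tries to use a list as an index. A minimal reproduction, using two shortened entries for illustration (the helper `reproduce_error` is hypothetical, written only for this demonstration):

```python
# Shortened sample entries (illustrative values only)
value = [
    [0.63, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    [0.91, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
]


def reproduce_error(entries):
    """Index the list with its own elements, as the attempt above does."""
    for i in entries:
        # i is an entry (a list), not an integer position
        try:
            entries[i][0]
        except TypeError as exc:
            return str(exc)
    return None


print(reproduce_error(value))  # prints: list indices must be integers or slices, not list
```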

What should I do?

Tags: python, list, dictionary, dataframe

Solution


The variable i is already one entry (a list), not an integer position, so index i itself:

dlist = [[key,i[0],i[1],i[2],i[3]] for key,value in dict1.items() for i in value]
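A self-contained version of this fix, assuming pandas is installed and using a small subset of the sample values from the question so it runs on its own:

```python
import pandas as pd

# Subset of the question's data, in the same shape as dict1
dict1 = {
    28878779: [
        [0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
        [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    ],
    116276795: [
        [0.0295335249313507, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
    ],
}

# One row per entry: the key followed by the entry's four fields
dlist = [[key, i[0], i[1], i[2], i[3]] for key, value in dict1.items() for i in value]

beta = pd.DataFrame(dlist, columns=['Start', 'Beta_value', 'Cancer', 'Stage', 'Chromosome'])
print(beta)
```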

Or prepend the key to each entry list:

dlist = [[key] + i for key,value in dict1.items() for i in value] 
#alternative 
#dlist = [(key, *i) for key,value in dict1.items() for i in value]    

beta = pd.DataFrame(dlist, columns=['Start','Beta_value','Cancer','Stage','Chromosome'])
print (beta)
        Start  Beta_value Cancer                         Stage Chromosome
0    28878779    0.630786   BRCA  Primary Blood Derived Cancer      chr16
1    28878779    0.913319   BRCA  Primary Blood Derived Cancer      chr16
2    28878779    0.429191   BRCA  Primary Blood Derived Cancer      chr16
3    28878779    0.757150   BRCA  Primary Blood Derived Cancer      chr16
4    28878779    0.200534   BRCA  Primary Blood Derived Cancer      chr16
5    28878779    0.472227   BRCA  Primary Blood Derived Cancer      chr16
6    28878779    0.542198   BRCA  Primary Blood Derived Cancer      chr16
7    28878779    0.517081   BRCA  Primary Blood Derived Cancer      chr16
8    28878779    0.354579   BRCA  Primary Blood Derived Cancer      chr16
9    28878779    0.479331   BRCA  Primary Blood Derived Cancer      chr16
10  116276795    0.029534   BRCA  Primary Blood Derived Cancer      chr12
11  116276795    0.022571   BRCA  Primary Blood Derived Cancer      chr12
12  116276795    0.023093   BRCA  Primary Blood Derived Cancer      chr12
13  116276795    0.022679   BRCA  Primary Blood Derived Cancer      chr12
14  116276795    0.046524   BRCA  Primary Blood Derived Cancer      chr12
15  116276795    0.030853   BRCA  Primary Blood Derived Cancer      chr12
16  116276795    0.028026   BRCA  Primary Blood Derived Cancer      chr12
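A different technique, not part of the original answer: build one small DataFrame per key and let `pd.concat` turn the dictionary keys into an outer index level, which is then moved into a `Start` column. A sketch assuming pandas is installed, again using a subset of the question's values:

```python
import pandas as pd

columns = ['Beta_value', 'Cancer', 'Stage', 'Chromosome']

# Subset of the question's data, in the same shape as dict1
dict1 = {
    28878779: [
        [0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
        [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    ],
    116276795: [
        [0.0295335249313507, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
    ],
}

# One frame per key; concat uses the dict keys as the outer index level
beta = (
    pd.concat(
        {key: pd.DataFrame(value, columns=columns) for key, value in dict1.items()},
        names=['Start', None],
    )
    .reset_index(level=0)    # move the 'Start' level into a regular first column
    .reset_index(drop=True)  # replace the per-key row numbers with 0..n-1
)
print(beta)
```

The list comprehension in the answer is shorter; the `concat` form may read more naturally if each key's entries are already being handled as separate frames.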
