Pandas: sparse dataframe to dict without NaN values

Problem description

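I have a large sparse dataframe sdf that consists mostly of NaN. When I call sdf.to_dict(), it outputs the dense version of the matrix, with every null value filled in. How can I omit these NaN entries so that only the entries with actual values end up in the dictionary?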

For example, sdf is:

          2018-02-02  2018-02-03
23:58:36         NaN         NaN
23:58:37         1.0         NaN
23:58:40         NaN         NaN
23:58:41         NaN         NaN
23:58:42         NaN         NaN
23:58:43         NaN         NaN
23:58:48         NaN         NaN
23:58:49         NaN         NaN
23:58:50         NaN         NaN
23:58:52         NaN         1.0
23:58:59         NaN         NaN
23:59:00         NaN         NaN
23:59:01         NaN         NaN
23:59:05         NaN         NaN
23:59:07         NaN         NaN

sdf.to_dict() gives:

{'2018-02-02': {'23:58:36': nan, '23:58:37': 1.0, '23:58:40':
  nan, '23:58:41': nan, '23:58:42': nan, '23:58:43': nan,
  '23:58:48': nan, '23:58:49': nan, '23:58:50': nan, '23:58:52':
  nan, '23:58:59': nan, '23:59:00': nan, '23:59:01': nan,
  '23:59:05': nan, '23:59:07': nan}, '2018-02-03': {'23:58:36':
  nan, '23:58:37': nan, '23:58:40': nan, '23:58:41': nan,
  '23:58:42': nan, '23:58:43': nan, '23:58:48': nan, '23:58:49':
  nan, '23:58:50': nan, '23:58:52': 1.0, '23:58:59': nan,
  '23:59:00': nan, '23:59:01': nan, '23:59:05': nan, '23:59:07':
  nan}}

This happens even though sdf is a sparse dataframe.


Sorry for the ambiguity. I want to keep all non-NaN entries. The desired output is:

{'2018-02-02': {'23:58:37': 1.0}, '2018-02-03': {'23:58:52': 1.0}}
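If post-processing is acceptable, the desired output can also be reached by filtering the dense dict that to_dict() already returns. The sketch below is plain Python, and the helper name drop_nan is hypothetical. Note that it does not save memory, because the dense dict is still built first.

```python
import math


def drop_nan(nested: dict) -> dict:
    """Remove NaN leaves from a {column: {index: value}} dict."""

    def is_nan(v):
        # numpy.float64 subclasses float, so NaN values coming from pandas are caught too
        return isinstance(v, float) and math.isnan(v)

    return {
        outer: {inner: v for inner, v in values.items() if not is_nan(v)}
        for outer, values in nested.items()
    }


# A cut-down version of the dense dict shown in the question
dense = {
    "2018-02-02": {"23:58:36": float("nan"), "23:58:37": 1.0},
    "2018-02-03": {"23:58:36": float("nan"), "23:58:52": 1.0},
}
print(drop_nan(dense))
# {'2018-02-02': {'23:58:37': 1.0}, '2018-02-03': {'23:58:52': 1.0}}
```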

Tags: python pandas

Solution


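Use stack to turn the frame into a Series keyed by (row, column), keep only the non-NaN values, and collect them into a nested dict: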

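from collections import defaultdict

# stack() yields a Series indexed by (row label, column label);
# .dropna() discards NaN cells even on pandas versions whose stack keeps them
d = defaultdict(dict)
for (row, col), v in sdf.stack().dropna().items():
    d[col][row] = v

d1 = dict(d)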
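A per-column dict comprehension reaches the same result without stack. This is a minimal sketch: the helper name to_dict_without_nan is hypothetical, and the example uses a plain float frame. Series.dropna is also defined for columns with a SparseDtype, so the approach should carry over to sdf, but that case is not exercised here.

```python
import numpy as np
import pandas as pd


def to_dict_without_nan(df: pd.DataFrame) -> dict:
    """Return {column: {index: value}} keeping only non-NaN cells."""
    # Drop missing entries column by column, then convert each column to a dict
    return {col: series.dropna().to_dict() for col, series in df.items()}


# A cut-down version of the example frame from the question
sdf = pd.DataFrame(
    {"2018-02-02": [np.nan, 1.0, np.nan], "2018-02-03": [np.nan, np.nan, 1.0]},
    index=["23:58:36", "23:58:37", "23:58:52"],
)
result = to_dict_without_nan(sdf)
print(result)
```

Compared with the stack loop, this version does not depend on whether stack drops NaN by default, which differs between the legacy and newer stack implementations.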

If the input is a Series with a DatetimeIndex:

print (s)
2018-02-02 23:58:37    1.0
2018-02-03 23:58:52    1.0
dtype: float64

from collections import defaultdict

# split each Timestamp into a date key and a time key, skipping NaN values
d = defaultdict(dict)
for k, v in s.dropna().items():
    d[k.strftime('%Y-%m-%d')][k.strftime('%H:%M:%S')] = v

d1 = dict(d)
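For the DatetimeIndex case, one alternative is to group by calendar day instead of formatting every timestamp in a Python loop. This is a minimal sketch that assumes s is a float Series with a DatetimeIndex, as in the printed example. The helper name series_to_nested_dict is hypothetical.

```python
import numpy as np
import pandas as pd


def series_to_nested_dict(s: pd.Series) -> dict:
    """Build {date string: {time string: value}} from a DatetimeIndex Series, skipping NaN."""
    s = s.dropna()
    nested = {}
    # Group the remaining values by their formatted calendar day
    for day, part in s.groupby(s.index.strftime("%Y-%m-%d")):
        times = part.index.strftime("%H:%M:%S")
        # tolist() converts numpy scalars to plain Python floats
        nested[day] = dict(zip(times, part.to_numpy().tolist()))
    return nested


s = pd.Series(
    [1.0, np.nan, 1.0],
    index=pd.to_datetime(
        ["2018-02-02 23:58:37", "2018-02-02 23:58:40", "2018-02-03 23:58:52"]
    ),
)
d1 = series_to_nested_dict(s)
print(d1)
```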
