Getting JSON from a stacked pandas DataFrame

Problem description

I have the following sample DataFrame:

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}

df = pd.DataFrame(data=d)
df

Output:

    key count   text
0   foo    12   hello
1   foo     3   i
2   foo     5   am
3   foo     5   a
4   bar     3   piece
5   bar     1   of
6   bar     4   text
7   bar     1   have
8   crow    7   a
9   crow    3   nice
10  crow    8   day
11  crow    2   friends

I stack the DataFrame with df.set_index("key").stack()

to get:

key        
foo   count         12
      text       hello
      count          3
      text           i
      count          5
      text          am
      count          5
      text           a
bar   count          3
      text       piece
      count          1
      text          of
      count          4
      text        text
      count          1
      text        have
crow  count          7
      text           a
      count          3
      text        nice
      count          8
      text         day
      count          2
      text     friends
dtype: object

I am now trying to write the stacked df out as a JSON file, but when I call to_json(), I get this error:

ValueError: Series index must be unique for orient='index'

The expected output would group text and count by key, like this:

[
  {
    "key": "19",
    "values": [
        {
            text: 'hello',
            count: 12
        },
        {
            content: 'i',
            count: 3
        },
        {
            content: 'am',
            count: 5
        },
        ...
    ]
]

Tags: python, json, pandas, dataframe

Solution


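As noted in the comments, your expected output is not a valid JSON string. Each object needs something like "some_key": [...] alongside "key": "bar".

For example, with groupby: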

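import json
# Group the non-key columns by key; each group becomes a list of row records
json_str = json.dumps([ {'key':k, 'values':d.to_dict('records')}
                       for k,d in df.drop('key',axis=1).groupby(df['key'])
                      ], indent=2)
print(json_str)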

Output:

[
  {
    "key": "bar",
    "values": [
      {
        "count": 3,
        "text": "piece"
      },
      {
        "count": 1,
        "text": "of"
      },
      {
        "count": 4,
        "text": "text"
      },
      {
        "count": 1,
        "text": "have"
      }
    ]
  },
  {
    "key": "crow",
    "values": [
      {
        "count": 7,
        "text": "a"
      },
      {
        "count": 3,
        "text": "nice"
      },
      {
        "count": 8,
        "text": "day"
      },
      {
        "count": 2,
        "text": "friends"
      }
    ]
  },
  {
    "key": "foo",
    "values": [
      {
        "count": 12,
        "text": "hello"
      },
      {
        "count": 3,
        "text": "i"
      },
      {
        "count": 5,
        "text": "am"
      },
      {
        "count": 5,
        "text": "a"
      }
    ]
  }
]
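Note that groupby sorts the keys by default, which is why bar comes before foo in the output above. If the original order matters and the result should go to a file, as the question asks, a minimal sketch could look like the following. It is a variation on the answer, not part of it: it assumes sort=False is acceptable, uses a shortened copy of the sample data, and writes to a hypothetical file name (grouped.json in the system temp directory).

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Shortened copy of the sample data from the question
df = pd.DataFrame({
    "key": ["foo", "foo", "bar", "bar"],
    "count": [12, 3, 3, 1],
    "text": ["hello", "i", "piece", "of"],
})

# sort=False keeps the groups in order of first appearance (foo, then bar)
records = [
    {"key": k, "values": g.drop(columns="key").to_dict("records")}
    for k, g in df.groupby("key", sort=False)
]

# Hypothetical output location, chosen only for this illustration
out_path = Path(tempfile.gettempdir()) / "grouped.json"
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)
```

Using json.dump on an open file handle avoids building the whole string first; json.dumps, as in the answer, works equally well if the string itself is needed.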
