Converting JSON to a pandas DataFrame in Python (JSON from yahoo_financials)

Problem description

Can anyone help me with this JSON format? (updated DataFrame)

JSON:

{'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025},  {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},  {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025},  {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025},  {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033},  {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033},  {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033},  {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033},  {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200,   'formatted_date': '2019-06-03',   'amount': 0.3},  {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},  {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4},  {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1},  {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}

For example, I get this from:

yahoo_financials.get_daily_dividend_data('2019-1-1', '2020-12-1')

I tried to convert it to a DataFrame like this:


    data2 = {"data": {'VIG.VI': [{'date'......................................
    s = pd.DataFrame(data2)
    pd.concat([s.drop('data', axis=1), pd.DataFrame(s.data.tolist(), index=s.index)], axis=1)

In this case I get a result like this: 0 [{'date': 1433314500, 'formatted_date': '2015-... [{'date': 1430290500, 'formatted_date': '2015-...

If we use only one date and remove the [], everything works perfectly.

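I also tried the code from that thread. It works fine if every variable inside the [] has the same format, but with data like the example above I get the error "arrays must all be same length".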
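The error mentioned above can be reproduced in isolation. The following is a minimal sketch, assuming the failing code ended up passing lists of different lengths to `pd.DataFrame`; the shortened `amounts` dictionary is made up for illustration and is not the real payload. The exact wording of the error message depends on the pandas version, so it is only printed here.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the per-ticker data.
# The two lists deliberately have different lengths.
amounts = {
    "PSG.MC": [0.032025, 0.032025, 0.033],
    "ACX.MC": [0.3, 0.2],
}

try:
    pd.DataFrame(amounts)  # plain lists of unequal length are rejected
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Wrapping each list in a Series lets pandas align on the index
# and pad the shorter column with NaN instead of raising.
padded = pd.DataFrame({ticker: pd.Series(values) for ticker, values in amounts.items()})
print(padded)
```

This suggests why tickers with different numbers of dividend dates need a step that pads the shorter columns rather than a direct `pd.DataFrame` call on the raw lists.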

Does anyone know how to convert this type of JSON to a DataFrame?

Tags: python, json, pandas, dataframe, converters

Solution


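You can convert each list of dictionaries into a dictionary of lists, then convert the resulting dictionary into a DataFrame with MultiIndex columns: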

import pandas as pd
from collections import defaultdict

data2 = {"data": {'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025},  {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},  {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025},  {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025},  {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033},  {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033},  {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033},  {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033},  {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200,   'formatted_date': '2019-06-03',   'amount': 0.3},  {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},  {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4},  {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1},  {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}}

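# Turn each ticker's list of dicts into a dict of lists, e.g.
# [{'date': 1, 'amount': 2}, ...] -> {'date': [1, ...], 'amount': [2, ...]}
data = {}

for key, values in data2['data'].items():
    res = defaultdict(list)
    for sub in values:
        for k, v in sub.items():
            res[k].append(v)
    data[key] = dict(res)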

def reform_dict(data):
    """Flatten {outer: {inner: values}} into {(outer, inner): values}."""
    reformed_dict = {}

    for outerKey, innerDict in data.items():
        for innerKey, values in innerDict.items():
            reformed_dict[(outerKey, innerKey)] = values

    return reformed_dict

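# Build one frame per ticker and place them side by side;
# tickers with fewer rows are padded with NaN.
df = pd.concat([pd.DataFrame(reform_dict({key: value})) for key, value in data.items()], axis=1)
print(df)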

       PSG.MC                                 ACX.MC                      
         date formatted_date    amount          date formatted_date amount
0  1547452800     2019-01-14  0.032025  1.559545e+09     2019-06-03    0.3
1  1554361200     2019-04-04  0.032025  1.562137e+09     2019-07-03    0.2
2  1562310000     2019-07-05  0.032025  1.591254e+09     2020-06-04    0.4
3  1570690800     2019-10-10  0.032025  1.594019e+09     2020-07-06    0.1
4  1578902400     2020-01-13  0.033000  1.606810e+09     2020-12-01    0.1
5  1588057200     2020-04-28  0.033000           NaN            NaN    NaN
6  1595228400     2020-07-20  0.033000           NaN            NaN    NaN
7  1601362800     2020-09-29  0.033000           NaN            NaN    NaN
8  1603436400     2020-10-23  0.033000           NaN            NaN    NaN
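As an aside, a similar result can probably be reached more directly with `pd.concat` and its `keys` argument. The sketch below is an alternative, not the method from the answer above. It assumes the payload is already the inner `{ticker: [records]}` dictionary, and it uses a shortened copy of the data; the names `payload`, `wide_df` and `long_df` are made up for illustration.

```python
import pandas as pd

# Shortened copy of the question's data (first rows of each ticker only).
payload = {
    "PSG.MC": [
        {"date": 1547452800, "formatted_date": "2019-01-14", "amount": 0.032025},
        {"date": 1554361200, "formatted_date": "2019-04-04", "amount": 0.032025},
        {"date": 1562310000, "formatted_date": "2019-07-05", "amount": 0.032025},
    ],
    "ACX.MC": [
        {"date": 1559545200, "formatted_date": "2019-06-03", "amount": 0.3},
        {"date": 1562137200, "formatted_date": "2019-07-03", "amount": 0.2},
    ],
}

tickers = list(payload)
# pd.DataFrame accepts a list of dicts directly: one row per dict.
frames = [pd.DataFrame(payload[ticker]) for ticker in tickers]

# Wide layout: MultiIndex columns (ticker, field), shorter tickers padded with NaN.
wide_df = pd.concat(frames, keys=tickers, axis=1)

# Long layout: one row per dividend, with the ticker as an ordinary column.
long_df = pd.concat(frames, keys=tickers, names=["ticker", "row"]).reset_index(level="ticker")
long_df["formatted_date"] = pd.to_datetime(long_df["formatted_date"])

print(wide_df)
print(long_df)
```

The long layout avoids the NaN padding entirely, which may be easier to filter or group by ticker; the wide layout matches the shape of the output shown above.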
