Converting JSON to a Pandas DataFrame

Problem description

I have JSON like the following, and I want to convert it to a pandas DataFrame with specific column names.

{
    "data": [
        {
            "id": 1,
            "name": "3Way Result",
            "suspended": false,
            "bookmaker": {
                "data": [
                    {
                        "id": 27802,
                        "name": "Ladbrokes",
                        "odds": {
                            "data": [
                                {
                                    "label": "1",
                                    "value": "1.61",
                                    "probability": "62.11%",
                                    "dp3": "1.610",
                                    "american": -164,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "X",
                                    "value": "3.90",
                                    "probability": "25.64%",
                                    "dp3": "3.900",
                                    "american": 290,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "2",
                                    "value": "5.20",
                                    "probability": "19.23%",
                                    "dp3": "5.200",
                                    "american": 420,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                }
                            ]
                        }
                    },
                    {
                        "id": 70,
                        "name": "Pncl",
                        "odds": {
                            "data": [
                                {
                                    "label": "1",
                                    "value": "1.65",
                                    "probability": "60.61%",
                                    "dp3": "1.645",
                                    "american": -154,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "X",
                                    "value": "4.20",
                                    "probability": "23.81%",
                                    "dp3": "4.200",
                                    "american": 320,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "2",
                                    "value": "5.43",
                                    "probability": "18.42%",
                                    "dp3": "5.430",
                                    "american": 443,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    ],
    "meta": {
        "plans": [
            {
                "name": "Football Free Plan",
                "features": "Standard",
                "request_limit": "180,60",
                "sport": "Soccer"
            }
        ],
        "sports": [
            {
                "id": 1,
                "name": "Soccer",
                "current": true
            }
        ]
    }
}

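Every column name consists of the label value plus the bookmaker's name. I want to take the value in label, combine it with name, and use the result as the column name. The corresponding value should then be stored in the DataFrame as a float.

Here is the expected output: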

   1_LadBrokes  X_LadBrokes  2_LadBrokes       last_update_LadBrokes  1_Pncl  X_Pncl  2_Pncl            last_update_Pncl
0         1.61          3.9          5.2  2021-10-01 16:41:27.000000    1.65     4.2    5.43  2021-10-01 16:59:18.000000

Tags: python, json, pandas, dataframe

Solution

You can achieve this with json_normalize + apply:

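import pandas as pd


def set_values(x):
    # After explode(), each row holds a single odds dict in "odds.data".
    data = x["odds.data"]
    label = data.get("label")
    value = data.get("value")  # still a string such as "1.61", not a float
    last_update_date = data.get("last_update").get("date")
    name = x["name"]
    # Add the per-bookmaker columns, e.g. "1_Ladbrokes" and "last_update_Ladbrokes".
    x[f"{label}_{name}"] = value
    x[f"last_update_{name}"] = last_update_date
    return x


# `data` is the parsed JSON from the question (a Python dict).
df = (
    pd.json_normalize(data["data"], record_path=["bookmaker", "data"])
    .explode("odds.data")  # one row per odds entry
    .apply(set_values, axis=1)
    .drop(["odds.data", "id", "name"], axis=1)
    .ffill()  # fill the gaps so that the first row carries every column
    .bfill()
    .head(1)  # keep a single row
)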

In [39]: df
Out[39]: 
  1_Ladbrokes 1_Pncl 2_Ladbrokes 2_Pncl X_Ladbrokes X_Pncl       last_update_Ladbrokes            last_update_Pncl
0        1.61   1.65        5.20   5.43        3.90   4.20  2021-10-01 16:41:27.000000  2021-10-01 16:59:18.000000
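
The output above keeps the odds as strings and sorts the columns alphabetically, whereas the question asks for float values in bookmaker order. As a possible alternative (a minimal sketch, not the answer's method), the nested structure can be walked with plain loops to build one row directly. The helper name `odds_to_row` and the trimmed `sample` dict are made up for illustration, and the sketch assumes that one `last_update` timestamp per bookmaker is enough.

```python
import pandas as pd


def odds_to_row(payload):
    """Flatten data -> bookmaker -> odds into a single-row DataFrame."""
    row = {}
    for market in payload["data"]:
        for bookmaker in market["bookmaker"]["data"]:
            name = bookmaker["name"]
            last_update = None
            for odd in bookmaker["odds"]["data"]:
                # Column name is "<label>_<bookmaker name>"; the value is cast to float.
                row[f"{odd['label']}_{name}"] = float(odd["value"])
                last_update = odd["last_update"]["date"]
            # Assumption: the date of the last odds entry stands for the bookmaker.
            if last_update is not None:
                row[f"last_update_{name}"] = last_update
    # Dict insertion order is preserved, so columns follow the JSON order.
    return pd.DataFrame([row])


def _odd(label, value, date):
    # Hypothetical helper that builds a trimmed odds entry for the sample.
    return {"label": label, "value": value, "last_update": {"date": date}}


# Trimmed sample with the same nesting as the JSON in the question.
sample = {
    "data": [
        {
            "bookmaker": {
                "data": [
                    {
                        "name": "Ladbrokes",
                        "odds": {"data": [
                            _odd("1", "1.61", "2021-10-01 16:41:27.000000"),
                            _odd("X", "3.90", "2021-10-01 16:41:27.000000"),
                            _odd("2", "5.20", "2021-10-01 16:41:27.000000"),
                        ]},
                    },
                    {
                        "name": "Pncl",
                        "odds": {"data": [
                            _odd("1", "1.65", "2021-10-01 16:59:18.000000"),
                            _odd("X", "4.20", "2021-10-01 16:59:18.000000"),
                            _odd("2", "5.43", "2021-10-01 16:59:18.000000"),
                        ]},
                    },
                ]
            }
        }
    ]
}

df = odds_to_row(sample)
```

With the full JSON from the question, `odds_to_row(data)` should behave the same way, since only the keys `name`, `label`, `value` and `last_update` are read.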
