首页 > 解决方案 > 从 MongoDB 到 Pandas 数据框

问题描述

我的 MongoDB 数据库中有一个集合,其中每条记录代表一条边(我正在构建的应用程序中的一条道路)。每个记录具有以下形式,其中第一个idid边缘的:

{  
   "_id":{  
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{  
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane":[  
         {  
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
         },
         {  
            "@id":":3330548807_1_1",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
         }
      ]
   }
}

我想对这些数据进行一些分析,我想将记录转换为 pandas 中的数据框。所需的数据框骨架将如下所示:

数据框的理想骨架

我试过用pandas.io.json.json_normalize(d)但无法获得我想要的输出。

正如我们所看到的,我有一组车道,最多可以有两条车道。它也可以只包含一个车道。所以,我想将车道分成两行数据框。

有人可以向我建议一个解决方案吗?

标签: pythonjsonmongodbpandasdataframe

解决方案


如果您的数据像您的一样嵌套,则必须先将其转换为平面形状,然后才能创建数据框。

import pandas

json = [
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane": [
         {
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
         },
         {
            "@id":":3330548807_1_1",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
         }
      ]
   }
},
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane":{
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
      }
   }
},
]

def ensure_list(obj):
    if isinstance(obj, list):
        return obj
    else:
        return [obj]

json_transformed = [
    {
        # edge attributes
        'edge_id': record['edge']['@id'],
        # lane attributes
        'lane_id': lane['@id'],
        # ...
    }
    for record in json
    for lane in ensure_list(record['edge']['lane'])
]

df = pandas.DataFrame(json_transformed)

推荐阅读