Counting values from JSON in pandas while keeping the dates sorted

Problem description

    {"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 7708, "username": "im_theantitrump"}
    {"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 25772, "username": "bishyoucray2"}

The JSON file looks like this. When I run the following commands:

    df = pd.read_json('filename.json', lines=True)
    df['date'].value_counts() 

the result looks like this:

    ,date
    2016-11-17,3403
    2016-11-04,2605
    2016-12-09,2285
    2016-11-24,1934
    2016-12-19,1874
    2016-12-07,1864
    2016-11-28,1825
    2016-11-29,1715
    2016-11-27,1688
    2016-12-15,1683
    2016-12-06,1680

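Is there an alternative that keeps the dates sorted in ascending order (they are already in ascending order in the JSON file)? And could "count" be added as a second header of the data frame so that it lines up with its column? Here is an example of what I am after.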

    date    count
    10/31/2016  97
    11/1/2016   3360
    11/2/2016   5719
    11/3/2016   1206
    11/4/2016   3888
    11/5/2016   1176
    11/6/2016   1598
    11/7/2016   4542
    11/8/2016   1750
    11/9/2016   1224
    11/10/2016  1489
    11/11/2016  3138
    11/12/2016  1449
    11/13/2016  1299
    11/14/2016  1136
    11/15/2016  1525

Tags: python, json, pandas, dataframe

Solution


To save the output of .value_counts in sorted order, you can do the following:

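    # Count rows per date and name the resulting column "count"
    out = df["date"].value_counts().to_frame(name="count")
    # Sort by the date index so the rows are in ascending order
    out = out.sort_index()
    out.to_csv("data.csv", index_label="date")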

This saves data.csv:

    date,count
    2016-11-30,2
    2016-12-30,2
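
One caveat, shown as a rough sketch rather than as part of the original answer: sort_index() only gives chronological order if the index holds real dates or ISO-style strings such as 2016-11-30. If the dates were stored as M/D/YYYY text, as in the desired output in the question, sorting would compare them as text. The small data frame below is made up for illustration, and the format string is an assumption about how such dates would be written.

```python
import pandas as pd

# Hypothetical sample: dates stored as M/D/YYYY strings, in no particular order.
df = pd.DataFrame({"date": ["11/10/2016", "11/2/2016", "10/31/2016", "11/2/2016"]})

# As plain strings, sort_index() compares text, so "11/10/2016" sorts before "11/2/2016".
as_text = df["date"].value_counts().sort_index()

# Parsing to datetime first makes sort_index() order the rows chronologically.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y")
as_dates = parsed.value_counts().sort_index()

# Turn the result into a two-column frame with "date" and "count" headers.
out = as_dates.rename_axis("date").reset_index(name="count")
print(out)
```

Parsing before counting keeps the sort independent of how the dates happen to be written in the file.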

JSON data used:

    {"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 7708, "username": "im_theantitrump"}
    {"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}
    {"id": 814984316195311616, "date": "2016-11-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}
    {"id": 814984316195311616, "date": "2016-11-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}
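
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch. It is a possible alternative, not the method from the answer: it uses groupby(...).size() instead of value_counts, which also returns the dates in ascending order because groupby sorts its keys by default. The records are simplified stand-ins modeled on the sample data (shortened ids, fewer fields), and the names raw, counts and csv_text are only illustrative.

```python
import io
import pandas as pd

# Simplified records modeled on the sample data (ids shortened for illustration).
raw = "\n".join([
    '{"id": 1, "date": "2016-12-30", "username": "im_theantitrump"}',
    '{"id": 2, "date": "2016-12-30", "username": "bishyoucray2"}',
    '{"id": 3, "date": "2016-11-30", "username": "bishyoucray2"}',
    '{"id": 4, "date": "2016-11-30", "username": "bishyoucray2"}',
])

# Read the JSON Lines text from memory instead of from 'filename.json'.
df = pd.read_json(io.StringIO(raw), lines=True)
# Make sure the column holds datetimes; harmless if read_json already parsed it.
df["date"] = pd.to_datetime(df["date"])

# groupby sorts the group keys in ascending order by default.
counts = df.groupby("date").size().reset_index(name="count")

# Write to an in-memory buffer; pass a file path instead to save to disk.
buf = io.StringIO()
counts.to_csv(buf, index=False, date_format="%Y-%m-%d")
csv_text = buf.getvalue()
print(csv_text)
```

Because the result is already a data frame with date and count columns, index=False is enough to get the two-column CSV layout.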
