Pandas Dataframe to JSON: returns a single line for 1 million records

Problem description

I need to do some processing on my JSON data, but it turns out that my JSON is formatted so that it contains only one line. In the terminal, wc -l file.json returns 0.

The file was created by converting a pandas DataFrame to JSON.

Here is a sample of file.json:

[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]

Tags: json, data-science, jq, data-analysis, data-cleaning

Solution


JSON does not require any whitespace, so it is perfectly valid to store a large JSON document without a single line break. Since wc -l counts newline characters, it reports 0 for such a file.
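The point about whitespace can be checked with a minimal Python sketch that uses only the standard library. The two records are trimmed versions of the sample in the question, and the variable names are illustrative. Counting newline characters mirrors what wc -l does, while parsing shows that both records are still there.

```python
import json

# Two records taken from the sample in the question (fields trimmed).
records = [
    {"id": 683156, "hotel_id": 220216, "hotel_name": "Beacon Hill Hotel"},
    {"id": 692745, "hotel_id": 113317, "hotel_name": "Casablanca Hotel Times Square"},
]

# Compact serialization: no whitespace between tokens and no trailing newline.
compact = json.dumps(records, separators=(",", ":"))

# wc -l counts newline characters, so this is the number it would report.
newline_count = compact.count("\n")

# Parsing recovers every record regardless of how the text is laid out.
record_count = len(json.loads(compact))

print(newline_count, record_count)
```

The line count describes the file layout only; the number of records has to come from parsing the JSON.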

If you want to pretty-print your JSON in the shell, use a tool like jq.

cat example.json
> [{"id":683156,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel"},{"id":692745,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square"}]

cat example.json | jq
> [
>   {
>     "id": 683156,
>     "hotel_id": 220216,
>     "hotel_name": "Beacon Hill Hotel"
>   },
>   {
>     "id": 692745,
>     "hotel_id": 113317,
>     "hotel_name": "Casablanca Hotel Times Square"
>   }
> ]
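If jq is not available, the same kind of pretty-printing can be sketched in Python with the standard json module. This is an alternative to the jq command above, not part of the original answer; the inline string stands in for the contents of example.json.

```python
import json

# Stand-in for the contents of example.json shown above.
compact = (
    '[{"id":683156,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel"},'
    '{"id":692745,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square"}]'
)

# Parse the single-line document, then re-serialize with two-space indentation.
pretty = json.dumps(json.loads(compact), indent=2)

print(pretty)
```

For a file on disk, `python -m json.tool example.json` does the same job from the shell, with its own default indentation.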

To report the length of an array, use jq length:

cat example.json | jq length
> 2
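If line-based tools such as wc -l are part of the workflow, one option is to rewrite the array as one record per line (often called JSON Lines). The sketch below is a suggestion beyond the original answer and uses only the standard library; `to_json_lines` is a hypothetical helper name, and the inline string stands in for the real file.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line (hypothetical helper)."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


# Stand-in for a single-line JSON array like the one in the question.
array_text = '[{"id":683156,"hotel_id":220216},{"id":692745,"hotel_id":113317}]'

records = json.loads(array_text)

# One record per line, so the newline count now equals the record count.
lines_text = to_json_lines(records)
line_count = lines_text.count("\n")

print(len(records), line_count)
```

When the file comes from pandas, `DataFrame.to_json(path, orient="records", lines=True)` writes this one-record-per-line layout directly.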
