Faulty Pandas dataframe read_json sorting on python3.5.9

Problem description

A DataFrame with more than 10 rows comes back in the wrong row order on Python 3.5.9 after being converted to JSON and back to a pandas.DataFrame.

from pandas import DataFrame, read_json

columns = ['a', 'b', 'c']
data = [[1*i, 2*i, 3*i] for i in range(11)]
df = DataFrame(columns=columns, data=data)
print(df)
#      a   b   c
# 0    0   0   0
# 1    1   2   3
# 2    2   4   6
# 3    3   6   9
# 4    4   8  12
# 5    5  10  15
# 6    6  12  18
# 7    7  14  21
# 8    8  16  24
# 9    9  18  27
# 10  10  20  30

new_df = read_json(df.to_json())
print(new_df)
#      a   b   c
# 0    0   0   0
# 1    1   2   3
# 10  10  20  30   # this should be the last line
# 2    2   4   6
# 3    3   6   9
# 4    4   8  12
# 5    5  10  15
# 6    6  12  18
# 7    7  14  21
# 8    8  16  24
# 9    9  18  27

So the DataFrame created with read_json seems to sort its index labels as strings (1, 10, 2, 3, ...) instead of as ints (1, 2, 3, ...).
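The observed order matches what a plain string sort of the index labels would produce. With the default orient, to_json writes the row labels as JSON object keys, and JSON object keys are always strings. The following is a minimal sketch of that effect in plain Python only; the variable names are illustrative, and it does not claim to show what pandas does internally.

```python
# Row labels as they would appear as JSON object keys (always strings).
labels = [str(i) for i in range(11)]

# Lexicographic (string) ordering places '10' directly after '1'.
as_strings = sorted(labels)

# Numeric ordering restores the expected sequence.
as_ints = sorted(labels, key=int)

print(as_strings)
print(as_ints)
```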

Behaviour generated with Python 3.5.9 (default, Jan 4 2020, 04:09:01) (docker image python:3.5-stretch)

Everything seems to be working fine on my local machine (Python 3.8.1 (default, Dec 21 2019, 20:57:38)).

pandas==0.25.3 was used on both instances.

Is there a way to fix this without upgrading Python?

Tags: python, pandas, dataframe

Solution


Use sort_values to sort the DataFrame on column a, or sort_index to sort it by its index, as follows:

new_df = read_json(df.to_json())

#sort column
print(new_df.sort_values('a'))

#sort index
print(new_df.sort_index())

#output
     a   b   c
0    0   0   0
1    1   2   3
2    2   4   6
3    3   6   9
4    4   8  12
5    5  10  15
6    6  12  18
7    7  14  21
8    8  16  24
9    9  18  27
10  10  20  30
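Another option, if the JSON layout is under your control, might be to avoid object keys for the row labels altogether. The sketch below assumes that orient='split' is acceptable for your use case. It has not been verified on Python 3.5.9 with pandas 0.25.3, so treat it as something to try rather than a confirmed fix. Wrapping the string in StringIO is only there because newer pandas versions warn when read_json is given a literal JSON string.

```python
from io import StringIO

from pandas import DataFrame, read_json

columns = ['a', 'b', 'c']
data = [[1 * i, 2 * i, 3 * i] for i in range(11)]
df = DataFrame(columns=columns, data=data)

# orient='split' stores index, columns and data as JSON arrays,
# so the row order does not rely on the ordering of object keys.
payload = df.to_json(orient='split')

# Read it back with the same orient; StringIO makes it a file-like object.
new_df = read_json(StringIO(payload), orient='split')
print(new_df)
```

Because arrays keep their element order in JSON, the rows should come back in the order they were written, without a separate sort step.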