Formatting large floating-point numbers in a DataFrame

Problem description

I need help: I can't get my seaborn plot to display well.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

dataset = pd.read_csv('sales.csv', header=0,sep =',',
                  usecols = [1,2,3,4])
#remove NaN
dataset.dropna(inplace = True)
df = pd.DataFrame(data=dataset)
sns.regplot(data=df, x='TV', y='sales')
plt.show()

A sample of sales.csv:

id,TV,radio,newspaper,sales
1,230.10000000,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
5,180.8,10.8,58.4,12.9
6,8.7,48.9,75,7.2
7,57.5,32.8,23.5,11.8
8,120.2,19.6,11.6,13.2
9,8.6,2.1,1,4.8
10,199.8,2.6,21.2,10.6
11,66.1,5.8,24.2,8.6
12,214.7,24,4,17.4
13,23.8,35.1,65.9,9.2
14,97.5,7.6,7.2,9.7
15,1,32.9,46,19
16,195.4,47.7,52.9,22.4
17,67.8,36.6,114,12.5
18,281.4,39.6,55.8,24.4
19,69.2,20.5,18.3,11.3
20,147.3,23.9,19.1,14.6
21,218.4,27.7,53.4,18
22,237.4,5.1,23.5,12.5
23,13.2,15.9,49.6,5.6
24,228.3,16.9,26.2,15.5
25,62.3,12.6,18.3,9.7
26,262.9,3.5,19.5,12
27,142.9,29.3,12.6,15
28,240.1,16.7,22.9,15.9
29,248.8,27.1,22.9,18.9
30,70.6,16,40.8,10.5
31,292.9,28.3,43.2,21.4
32,112.9,17.4,38.6,11.9
33,97.2,1.5,30,9.6
34,1e12,20,0.3,17.4

Tags: python, pandas, seaborn

Solution


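The main problem is that the dataset uses the value 1e12 to represent NA. Because those values are many orders of magnitude larger than the real TV values, they stretch the x-axis and squash every other point together, so they should be replaced or removed. The simplest way to convert '1e12' to NA is to pass the na_values='1e12' parameter to pd.read_csv().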
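As a minimal sketch of the na_values approach, the snippet below reads a few rows copied from the sample data through io.StringIO instead of a real file (the in-memory CSV and the name csv_text are assumptions made only to keep the example self-contained):

```python
import io

import pandas as pd

# A few rows copied from the sample data, including two 1e12 sentinel values.
csv_text = """id,TV,radio,newspaper,sales
1,230.10000000,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
34,1e12,20,0.3,17.4
"""

# na_values='1e12' tells read_csv to treat the token 1e12 as a missing value.
dataset = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3, 4],
                      na_values='1e12')

print(dataset['TV'].isna().sum())  # how many sentinel values became NaN
print(dataset.dropna().shape)      # shape after dropping the affected rows
```

Only the rows that held the sentinel are dropped; the remaining rows keep their original values.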

Alternatively, dataset.replace(1e12, pd.NA, inplace=True) can be used to convert them after loading.
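A small sketch of the replace-after-loading route, using a hand-built DataFrame in place of the CSV. Note one deliberate change from the text: this example swaps in np.nan for pd.NA, on the assumption that keeping the column as plain float64 is preferable for plotting.

```python
import numpy as np
import pandas as pd

# Stand-in for the loaded data: two real rows and two 1e12 sentinel rows.
dataset = pd.DataFrame({
    'TV': [230.1, 1e12, 17.2, 1e12],
    'sales': [22.1, 10.4, 9.3, 17.4],
})

# Replace the sentinel with NaN, then drop the rows that contain it.
# np.nan is used here (an assumption of this sketch) so 'TV' stays float64.
cleaned = dataset.replace(1e12, np.nan).dropna()

print(cleaned)
print(cleaned['TV'].dtype)
```

Returning a new DataFrame instead of using inplace=True keeps the original data available for comparison.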

Note that dataset is already a DataFrame, so the call df = pd.DataFrame(data=dataset) is unnecessary.
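A quick check of that point, again reading from an in-memory string rather than a file (the two-row CSV is made up for illustration):

```python
import io

import pandas as pd

# read_csv already returns a DataFrame, so no extra wrapping is needed.
dataset = pd.read_csv(io.StringIO("TV,sales\n230.1,22.1\n17.2,9.3\n"))

print(isinstance(dataset, pd.DataFrame))
```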

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dataset = pd.read_csv('sales.csv', header=0, sep=',', na_values='1e12',
                      usecols=[1, 2, 3, 4])
# remove NaN
dataset.dropna(inplace=True)
sns.regplot(data=dataset, x='TV', y='sales')
plt.show()


