Jupyter Notebook - Visualization problem, how do I fix it?

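Problem description

I have a data visualization problem in Jupyter Notebook/Lab, more precisely with Seaborn (see the attached screenshot).

I have tried running my script in different IDEs (PyCharm and VCS) as well as in a web browser, and the result is the same.

Can you help me solve this problem?

Best,

Romeo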

# Question 1 : How many people were on the Titanic, and how many of them survived?
import pandas as pd

df = pd.read_csv('titanic.csv')

n_people = len(df)
print('Number of passengers :', n_people)

n_survived = len(df[df['Survived']==1])
print('Number of survivors :', n_survived)

# Question 2 : How many of the survivors were female, and how many of those who died were female?

sur_f = df.loc[(df['Survived'] == 1) & (df['Sex']=='female')]
print('Survived and female :',len(sur_f))

died_f = df.loc[(df['Survived'] == 0) & (df['Sex']=='female')]
print('Died and female :',len(died_f))
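Questions 1 and 2 can also be answered in a single step with pd.crosstab, which counts every Sex/Survived combination at once and, with margins=True, adds row and column totals. A minimal sketch on a small hypothetical DataFrame (the five rows are made up, not taken from titanic.csv):

```python
import pandas as pd

# Hypothetical stand-in for titanic.csv: only the two columns needed here
df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'female'],
    'Survived': [0, 1, 0, 1, 1],
})

# Rows: Sex; columns: Survived (0 = died, 1 = survived); margins adds 'All' totals
counts = pd.crosstab(df['Sex'], df['Survived'], margins=True)
print(counts)

print('Survived and female :', counts.loc['female', 1])
print('Died and female :', counts.loc['female', 0])
print('Number of survivors :', counts.loc['All', 1])
print('Number of passengers :', counts.loc['All', 'All'])
```

The 'All' row and column are the totals that Question 1 asks for, so one table covers both questions.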

# Question 3 : How many children were on the Titanic?

children = df[df['Age']<18]
print('Number of children (under 18) :',len(children))

# Question 4 : How many of the children on the ship died?

died_c = children.loc[(children['Survived']==0)]
print('Number of children that died :',len(died_c))
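One detail worth checking for Questions 3 and 4: df['Age'] < 18 evaluates to False wherever Age is missing (NaN), so passengers with an unknown age are silently counted as non-children. The sketch below, again on hypothetical rows, counts those separately with isna() and computes child deaths with a single combined boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one passenger has an unknown age (NaN)
df = pd.DataFrame({
    'Age': [4.0, 16.0, np.nan, 30.0, 12.0],
    'Survived': [1, 0, 0, 1, 0],
})

is_child = df['Age'] < 18              # NaN < 18 is False, so the unknown age is excluded
n_children = int(is_child.sum())
n_unknown_age = int(df['Age'].isna().sum())
n_children_died = int((is_child & (df['Survived'] == 0)).sum())

print('Number of children (under 18) :', n_children)
print('Passengers with unknown age :', n_unknown_age)
print('Number of children that died :', n_children_died)
```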

# Question 5 : How many people had families with them?

# '|' (or): at least one sibling/spouse OR at least one parent/child aboard
family = df.loc[(df['SibSp'] != 0) | (df['Parch'] != 0)]
print('Number of people who had family (Siblings/Spouses or Parents/children) aboard :',len(family))
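Question 5 asks who had any family aboard, so the two conditions must be combined with | (or); & (and) keeps only passengers travelling with both a sibling/spouse and a parent/child. The difference on a few hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: SibSp = siblings/spouses aboard, Parch = parents/children aboard
df = pd.DataFrame({
    'SibSp': [0, 1, 0, 2, 0],
    'Parch': [0, 0, 2, 1, 0],
})

both = df[(df['SibSp'] != 0) & (df['Parch'] != 0)]        # needs both kinds of relatives
any_family = df[(df['SibSp'] != 0) | (df['Parch'] != 0)]  # at least one relative of either kind

print('With siblings/spouses AND parents/children :', len(both))
print('With any family aboard :', len(any_family))
```

An equivalent test for "any family" is (df['SibSp'] + df['Parch']) > 0, since both columns are non-negative counts.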

# Question 6 : What is the ratio of females to males?

num_female = len(df[df['Sex']=='female'])
num_male = len(df[df['Sex']=='male'])

ratio_female_male = (num_female / num_male) 
ratio_f_t = (num_female/len(df))
ratio_m_t = (num_male/len(df))


print('The female-to-male ratio is :', round(ratio_female_male, 2))
print('The ratio of females to total passengers is :', round(ratio_f_t, 2))
print('The ratio of males to total passengers is :', round(ratio_m_t, 2))
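The same figures can be read off value_counts: the plain call gives the count per sex, and normalize=True returns each sex's share of all passengers directly. A sketch on hypothetical rows (3 females, 2 males):

```python
import pandas as pd

# Hypothetical rows: 3 females and 2 males
df = pd.DataFrame({'Sex': ['female', 'male', 'female', 'male', 'female']})

counts = df['Sex'].value_counts()                 # absolute count per sex
shares = df['Sex'].value_counts(normalize=True)   # fraction of all passengers

ratio_female_male = counts['female'] / counts['male']
print('The female-to-male ratio is :', round(ratio_female_male, 2))
print('Share of females :', round(shares['female'], 2))
print('Share of males :', round(shares['male'], 2))
```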

# Question 7 : What contributed to the survival of those who survived?

# Encode Sex numerically (male -> 0, female -> 1) so it is included in the correlation matrix
df['Sex'] = df.Sex.map(lambda x: 0 if x == 'male' else 1)
# or, with an explicit mapping:
# gen = {'male': 0, 'female': 1}
# df['Sex'] = df.Sex.map(gen)
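In a notebook, re-running the encoding cell is an easy trap: after the first run Sex already holds 0/1, so the lambda maps every row to 1 (and the dictionary version would map every row to NaN). A minimal sketch of a guard, assuming the hypothetical helper name encode_sex, that only encodes while the column is not yet numeric:

```python
import pandas as pd

# Naive version: running the same mapping twice, as when a cell is re-executed
rerun = pd.Series(['male', 'female'])
rerun = rerun.map(lambda x: 0 if x == 'male' else 1)   # -> 0, 1
rerun = rerun.map(lambda x: 0 if x == 'male' else 1)   # 0 and 1 are not 'male', so -> 1, 1

gen = {'male': 0, 'female': 1}

def encode_sex(frame):
    # Hypothetical helper: skip the mapping if Sex is already numeric,
    # so re-running the cell leaves the 0/1 values unchanged
    if not pd.api.types.is_numeric_dtype(frame['Sex']):
        frame['Sex'] = frame['Sex'].map(gen)
    return frame

df = pd.DataFrame({'Sex': ['male', 'female', 'female']})
encode_sex(df)
encode_sex(df)   # second call has no effect
print(rerun.tolist(), df['Sex'].tolist())
```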

import seaborn as sns
import matplotlib.pyplot as plt

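# numeric_only=True skips text columns (e.g. Name, Ticket) that make corr() raise an error in pandas >= 2.0
correlation = df.corr(method='pearson', numeric_only=True)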

plt.figure(figsize=(7,4))
plt.title('Correlation between Features', y=1.05, size = 15)
sns.heatmap(correlation,
            cmap='RdBu_r',
            annot=True,
            linewidths=0.5)
plt.show()

print('The most influential factor is Sex, whose correlation coefficient with Survived is :', round(correlation.loc['Sex', 'Survived'], 2))
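Instead of reading the strongest factor off the heatmap by eye, it can be pulled from the Survived column of the correlation matrix, ranked by absolute value so that a strong negative correlation counts as much as a positive one. The sketch below uses a small hypothetical frame (the values are invented, so the winner here says nothing about the real Titanic data) and numeric_only=True (available since pandas 1.5) to skip the text column:

```python
import pandas as pd

# Hypothetical rows; 'Name' is a text column that corr() must skip
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Survived': [1, 1, 0, 0, 1, 0],
    'Sex': [1, 1, 0, 0, 1, 1],       # already encoded: male -> 0, female -> 1
    'Pclass': [1, 2, 3, 3, 1, 2],
})

correlation = df.corr(method='pearson', numeric_only=True)

# Correlation of every other feature with Survived, ranked by absolute strength
with_survived = correlation['Survived'].drop('Survived')
ranked = with_survived.abs().sort_values(ascending=False)

top = ranked.index[0]
print('Most influential factor :', top, 'r =', round(with_survived[top], 2))
```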

[Screenshot: heatmap output titled "Correlation between Features"]

Tags: python, jupyter-notebook, seaborn, data-visualization
