R Titanic dataset practice: trying to find the survival ratio

Problem description

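I'm practicing with the Titanic dataset. This is my code so far. I'm also sharing the data here via dput, in case there are multiple versions of the Titanic dataset floating around.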

structure(list(Class = c("1st", "2nd", "3rd", "Crew", "1st", 
"2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd", 
"3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", 
"Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew"
), Sex = c("Male", "Male", "Male", "Male", "Female", "Female", 
"Female", "Female", "Male", "Male", "Male", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male", 
"Female", "Female", "Female", "Female", "Male", "Male", "Male", 
"Male", "Female", "Female", "Female", "Female"), Age = c("Child", 
"Child", "Child", "Child", "Child", "Child", "Child", "Child", 
"Adult", "Adult", "Adult", "Adult", "Adult", "Adult", "Adult", 
"Adult", "Child", "Child", "Child", "Child", "Child", "Child", 
"Child", "Child", "Adult", "Adult", "Adult", "Adult", "Adult", 
"Adult", "Adult", "Adult"), Survived = c("No", "No", "No", "No", 
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", 
"No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"), n = c(0, 
0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5, 11, 
13, 0, 1, 13, 14, 0, 57, 14, 75, 192, 140, 80, 76, 20)), row.names = c(NA, 
-32L), class = c("tbl_df", "tbl", "data.frame"))

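library(tibble)
library(dplyr)    # provides %>%, group_by() and mutate() used below
library(ggplot2)

# Titanic ships with R as a 4-dimensional table; convert it to a tibble
Titanic <- as_tibble(Titanic)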

ggplot(Titanic, aes(x = Survived, y = n, fill = Class)) +
  geom_col(position = "dodge")

grouped_t <- Titanic %>%
  group_by(Class, Survived) %>%
  mutate(ratio_survived_each_class = n / sum(n))

ggplot(grouped_t, aes(x = Survived, y = ratio_survived_each_class, fill = Class)) +
  geom_col(position = "dodge")

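I think the first plot is interesting because it shows how survival differed between passenger classes. I'm trying to figure out how to plot the ratio of survived to not survived side by side. I thought I could group by the two variables Class and Survived and then mutate a new column, n / sum(n), but then I get a plot that makes no sense: about 100% of the crew survived and about 88% of the crew did not survive. Shouldn't those add up to 100%?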

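The other thing I'm trying to understand is how to reverse the order of the factor on the x axis in ggplot2. I've been looking through StackOverflow for answers, but nothing has worked for me. Take this scatter plot. I want Crew on the left.

ggplot(Titanic, aes(x = Class, y = n, color = Survived)) + geom_point()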

+ scale_x_reverse()

doesn't work, because Class isn't numeric (I think).

  scale_x_continuous(trans = "reverse")

doesn't work (I don't know why).

  scale_x_discrete(limits = rev(levels(Titanic$Class)))

doesn't work either (I don't know why).

Tags: r, ggplot2, dplyr

Solution
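
On the ratio part: grouping by both Class and Survived makes sum(n) the total of one class/outcome cell, so each value is a row's share of that cell rather than of the whole class, and the bars have no reason to add up to 100%. Here is a minimal sketch of one possible way around it, assuming the data is the tibble you get from as_tibble(Titanic) (columns Class, Sex, Age, Survived, n). The names titanic_tbl, class_totals and ratio are only illustrative.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Assumption: same data as in the question, rebuilt from the built-in table
titanic_tbl <- as_tibble(Titanic)

class_totals <- titanic_tbl %>%
  # collapse Sex and Age so there is one row per Class/Survived pair
  group_by(Class, Survived) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # divide by the total of the whole class, not of the single cell
  group_by(Class) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

# one pair of bars per class; within a class the two ratios sum to 1
ggplot(class_totals, aes(x = Class, y = ratio, fill = Survived)) +
  geom_col(position = "dodge")
```

With the data in the question this puts the crew at roughly 24% survived and 76% not survived.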


For the question about ordering the x axis, you can convert Class to an ordered factor.

Titanic %>% 
  mutate(Class = factor(Class, ordered = TRUE, levels = c("Crew", "1st", "2nd", "3rd"))) %>% 
  ggplot(aes(x = Class, y = n, color = Survived)) + 
  geom_point() 
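
As for why the attempts in the question probably do nothing useful: Class is a discrete variable, so the continuous scales (scale_x_reverse() and scale_x_continuous(trans = "reverse")) do not apply to it. And in the dput output Class is a plain character vector, so levels(Titanic$Class) is NULL and the limits argument ends up unset. Below is a sketch of an alternative that passes the order explicitly to scale_x_discrete() instead of changing the column. The names titanic_tbl and class_order are illustrative, and the particular order is just one choice.

```r
library(tibble)
library(ggplot2)

# Assumption: same data as in the question, rebuilt from the built-in table
titanic_tbl <- as_tibble(Titanic)

# Class is a character column here, so there are no factor levels to reverse
is.null(levels(titanic_tbl$Class))

# spell out the desired left-to-right order instead
class_order <- c("Crew", "3rd", "2nd", "1st")

p <- ggplot(titanic_tbl, aes(x = Class, y = n, color = Survived)) +
  geom_point() +
  scale_x_discrete(limits = class_order)
p
```

Setting limits only changes the axis for this one plot, while converting Class to a factor carries the order into every later plot and summary, so which one fits depends on how often the order is needed.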

