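r - R Titanic dataset practice: trying to find the survival ratio
Problem
I'm practicing with the Titanic dataset. Here is my code so far. I'm also sharing the data with dput, in case there are several versions of the Titanic dataset floating around.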
structure(list(Class = c("1st", "2nd", "3rd", "Crew", "1st",
"2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd",
"3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd",
"Crew", "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew"
), Sex = c("Male", "Male", "Male", "Male", "Female", "Female",
"Female", "Female", "Male", "Male", "Male", "Male", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Female", "Female", "Female", "Female", "Male", "Male", "Male",
"Male", "Female", "Female", "Female", "Female"), Age = c("Child",
"Child", "Child", "Child", "Child", "Child", "Child", "Child",
"Adult", "Adult", "Adult", "Adult", "Adult", "Adult", "Adult",
"Adult", "Child", "Child", "Child", "Child", "Child", "Child",
"Child", "Child", "Adult", "Adult", "Adult", "Adult", "Adult",
"Adult", "Adult", "Adult"), Survived = c("No", "No", "No", "No",
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No",
"No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"), n = c(0,
0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5, 11,
13, 0, 1, 13, 14, 0, 57, 14, 75, 192, 140, 80, 76, 20)), row.names = c(NA,
-32L), class = c("tbl_df", "tbl", "data.frame"))
library(tibble)
library(dplyr)    # needed for %>%, group_by() and mutate()
library(ggplot2)
Titanic <- as_tibble(datasets::Titanic)    # as.tibble() is deprecated; as_tibble() replaces it
ggplot(Titanic, aes(x = Survived, y = n, fill = Class)) +
  geom_col(position = "dodge")
grouped_t <- Titanic %>%
  group_by(Class, Survived) %>%
  mutate(ratio_survived_each_class = n / sum(n))
ggplot(grouped_t, aes(x = Survived, y = ratio_survived_each_class, fill = Class)) +
  geom_col(position = "dodge")
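I think the first plot is interesting because it shows how survival differed between passenger classes. Now I'm trying to figure out how to plot the ratio of survived to not survived side by side. I thought that after grouping by the two variables Class and Survived I could mutate a new column, n / sum(n)... but the resulting plot makes no sense: it shows roughly 100% of the Crew surviving and about 88% of the Crew not surviving. Shouldn't those add up to 100%?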
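A likely reason the ratios don't add up: after `group_by(Class, Survived)`, `sum(n)` is the total of a single Class-by-Survived cell across its four Sex/Age rows, so `n / sum(n)` is each Sex/Age subgroup's share of that cell. The ratios therefore sum to 1 within every Class-by-Survived pair, not within each class. Below is a minimal sketch of one way to get the survived/not-survived split per class. It assumes dplyr 1.0 or later (for the `.groups` argument) and the built-in `datasets::Titanic` table, and the names `survival_by_class` and `ratio` are hypothetical. It collapses Sex and Age with `summarise()` first, then groups by `Class` alone.

```r
library(dplyr)
library(tibble)
library(ggplot2)

survival_by_class <- as_tibble(datasets::Titanic) %>%
  # Collapse Sex and Age: one row per Class x Survived combination
  group_by(Class, Survived) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # Group by Class only, so each ratio is the share of that class
  # that did / did not survive (the two ratios per class sum to 1)
  group_by(Class) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

print(survival_by_class)

# Survived and not-survived bars for each class, side by side
ggplot(survival_by_class, aes(x = Class, y = ratio, fill = Survived)) +
  geom_col(position = "dodge")
```

Putting `Class` on the x axis and `Survived` in `fill` makes each pair of bars one class's complete 100%.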
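The other thing I'm trying to understand is how to reverse the order of the x-axis factor in ggplot2. I've searched StackOverflow for answers, but nothing has worked for me. Take this scatterplot, for example. I want Crew on the left.
ggplot(Titanic, aes(x = Class, y = n, color = Survived)) + geom_point()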
+ scale_x_reverse()
doesn't work, because Class isn't numeric (I think).
scale_x_continuous(trans = "reverse")
doesn't work either (I don't know why).
scale_x_discrete(limits = rev(levels(Titanic$Class)))
doesn't work either (I don't know why).
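A possible explanation for the three attempts: `scale_x_reverse()` and `scale_x_continuous()` are continuous scales, and `Class` is a discrete (character) variable, so neither one applies. The `scale_x_discrete()` attempt fails for a different reason. `as_tibble()` turns the table's dimensions into character columns, and a character vector has no levels, so `levels(Titanic$Class)` is `NULL`, `rev(NULL)` is still `NULL`, and `limits = NULL` just means "use the default order". The sketch below checks this and passes the categories explicitly instead. The variable names `class_levels` and `x_limits` are hypothetical.

```r
library(tibble)
library(ggplot2)

Titanic <- as_tibble(datasets::Titanic)

# Class is a character column, not a factor, so it has no levels
class_levels <- levels(Titanic$Class)
print(is.null(class_levels))

# Supply the categories directly: the order they appear in the data, reversed,
# which puts Crew first (leftmost) on the x axis
x_limits <- rev(unique(Titanic$Class))
print(x_limits)

ggplot(Titanic, aes(x = Class, y = n, color = Survived)) +
  geom_point() +
  scale_x_discrete(limits = x_limits)
```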
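Solution
For the question about ordering the x-axis, you can convert Class to an ordered factor with the levels in the order you want.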
Titanic %>%
mutate(Class = factor(Class, ordered = TRUE, levels = c("Crew", "1st", "2nd", "3rd"))) %>%
ggplot(aes(x = Class, y = n, color = Survived)) +
geom_point()
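To see what the conversion above changes, you can inspect the factor's levels. This is a short check that assumes dplyr and tibble are installed; the name `Titanic_plain` is hypothetical. ggplot2 draws a discrete axis in level order, so `ordered = TRUE` shouldn't be needed for the axis itself. A plain factor with the same `levels` should give the same Crew-first axis.

```r
library(dplyr)
library(tibble)
library(ggplot2)

Titanic <- as_tibble(datasets::Titanic)

# A plain (unordered) factor: the levels vector alone fixes the axis order
Titanic_plain <- Titanic %>%
  mutate(Class = factor(Class, levels = c("Crew", "1st", "2nd", "3rd")))

print(levels(Titanic_plain$Class))
print(is.ordered(Titanic_plain$Class))

ggplot(Titanic_plain, aes(x = Class, y = n, color = Survived)) +
  geom_point()
```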