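r - "Removed N rows containing missing values", but there are no missing or out-of-range values
Problem description
A week ago I posted a similar question, but I had failed to identify the real problem, so that question was far from correct.
Now I know exactly what is happening, but I do not understand why. I have also looked at similar questions about the same warning, but their solutions do not apply to my case.
I am plotting the frequency distribution of variables over the course of a survey's fieldwork, so the plot shows how the proportions of these variables change over time.
I have a variable (Startday) that records the day on which the respondent took the survey; if he/she did not take it, the value is NA. Then I have typical variables such as sex or marital status.
This is the code that draws such a plot: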
df %>%
mutate(date = lubridate::mdy(startday)) %>%
arrange(date) %>%
mutate(Rs = cumsum(sf_sex %in% c("Male", "Female")),
female_Rs = cumsum(sf_sex == "Female")) %>%
group_by(date) %>%
slice(n()) %>%
select(date, Rs, female_Rs) %>%
mutate(female_prop = female_Rs/Rs) %>%
ggplot(aes(x = date, y = female_prop)) +
geom_point() +
geom_line()
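This is what I get.
Exactly what I want. The problem appears when I use marital status as the variable (and this variable has the same nature as the other one: a dummy stored as character). This is what I get with the following code: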
df %>%
mutate(date = lubridate::mdy(startday)) %>%
arrange(date) %>%
mutate(Rs = cumsum(Maritaldummy %in% c("Not married", "Married")),
Married_Rs = cumsum(Maritaldummy == "Married")) %>%
group_by(date) %>%
slice(n()) %>%
select(date, Rs, Married_Rs) %>%
mutate(Married_prop = Married_Rs/Rs) %>%
ggplot(aes(x = date, y = Married_prop)) +
geom_point() +
geom_line()
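Warning messages: 1: Removed 34 rows containing missing values (geom_point). 2: Removed 34 rows containing missing values (geom_path).
As you can see, the observations stop around June 5.
Things to consider:
- The values are not out of range, because I tried changing the limits of the plot with ylim() and xlim()
- There are no missing values
The strange part is that this code works for experimental groups 2 and 3 (n = 350 each) but not for experimental group 1 (n = 2050). I do believe the error must come from here, because when I randomly sample fewer than 1300 observations from group 1... it works! This is the same code run on group 2.
I am giving you a reproducible example, but I am afraid the error only shows up with the full sample. Maybe you can still see what is wrong with it?
Thank you very much for your attention, time, and help.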
df <- structure(list(startday = c("06/02/2019", "05/22/2019", "05/28/2019",
"05/26/2019", "06/03/2019", "06/10/2019", "05/22/2019", "05/30/2019",
"05/31/2019", "06/18/2019", "05/22/2019", "05/25/2019", "05/25/2019",
"05/22/2019", "06/14/2019", "06/14/2019", "05/20/2019", "05/27/2019",
"05/20/2019", "05/21/2019", "05/20/2019", "05/20/2019", "06/09/2019",
"06/12/2019", "05/24/2019", "05/20/2019", "05/20/2019", "05/28/2019",
"06/09/2019", "05/20/2019", "06/21/2019", "06/03/2019", "06/07/2019",
"05/26/2019", "05/28/2019", "06/03/2019", "06/06/2019", "06/05/2019",
"05/27/2019", "06/10/2019", "05/20/2019", "06/05/2019", "05/20/2019",
"06/04/2019", "05/23/2019", "05/20/2019", "06/11/2019", "05/28/2019",
"06/09/2019", "06/15/2019", "05/25/2019", "06/14/2019", "05/20/2019",
"06/05/2019", "06/04/2019", "06/10/2019", "06/16/2019", "06/05/2019",
"06/29/2019", "05/30/2019", "06/03/2019", "06/09/2019", "05/20/2019",
"05/25/2019", "06/16/2019", "06/14/2019", "05/21/2019", "05/28/2019",
"06/09/2019", "06/07/2019", "05/25/2019", "05/20/2019", "05/27/2019",
"05/20/2019", "05/21/2019", "05/20/2019", "06/17/2019", "06/26/2019",
"06/07/2019", "05/22/2019", "06/19/2019", "06/04/2019", "05/21/2019",
"05/21/2019", "05/21/2019", "06/14/2019", "05/25/2019", "06/19/2019",
"05/20/2019", "06/03/2019", "05/20/2019", "06/04/2019", "05/20/2019",
"05/27/2019", "05/22/2019", "05/20/2019", "06/02/2019", "05/21/2019",
"05/23/2019", "06/03/2019", "06/14/2019", "06/14/2019", "06/07/2019",
"05/20/2019", "05/23/2019", "06/24/2019", "06/03/2019", "05/20/2019",
"06/06/2019", "06/15/2019", "06/06/2019", "05/27/2019", "05/24/2019",
"05/22/2019", "05/20/2019", "05/30/2019", "06/23/2019", "05/21/2019",
"05/20/2019", "06/16/2019", "05/20/2019", "05/24/2019", "05/21/2019",
"05/21/2019", "06/20/2019", "05/20/2019", "05/22/2019", "06/06/2019",
"05/20/2019", "05/21/2019", "06/15/2019", "05/27/2019", "05/26/2019",
"06/06/2019", "05/20/2019", "06/05/2019", "06/02/2019", "06/20/2019",
"05/22/2019", "05/20/2019", "06/03/2019", "05/20/2019", "06/03/2019",
"05/20/2019", "06/03/2019", "05/22/2019", "05/20/2019", "05/22/2019",
"05/22/2019", "05/20/2019", "05/20/2019", "05/23/2019", "05/23/2019",
"05/23/2019", "06/05/2019", "06/08/2019", "06/03/2019", "05/24/2019",
"06/05/2019", "06/02/2019", "05/20/2019", "05/29/2019", "06/04/2019",
"05/21/2019", "06/08/2019", "06/12/2019", "05/30/2019", "06/05/2019",
"06/12/2019", "05/20/2019", "05/20/2019", "06/26/2019", "05/20/2019",
"06/04/2019", "05/20/2019", "06/06/2019", "05/24/2019", "05/24/2019",
"06/06/2019", "06/22/2019", "05/26/2019", "05/29/2019", "05/27/2019",
"05/20/2019", "05/23/2019", "05/21/2019", "05/22/2019", "05/22/2019",
"06/11/2019", "06/05/2019", "06/05/2019", "05/28/2019", "05/23/2019",
"06/13/2019", "05/20/2019", "06/07/2019", "05/28/2019", "06/12/2019",
"06/28/2019", "06/15/2019"), sf_sex = c("Female", "Male", "Male",
"Male", "Male", "Female", "Female", "Female", "Female", "Female",
"Female", "Male", "Female", "Male", "Female", "Female", "Female",
"Male", "Female", "Female", "Male", "Male", "Female", "Male",
"Male", "Female", "Male", "Female", "Female", "Male", "Male",
"Male", "Female", "Female", "Male", "Male", "Female", "Male",
"Female", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Male", "Male", "Male", "Female", "Male", "Female",
"Male", "Male", "Male", "Female", "Female", "Female", "Female",
"Male", "Female", "Male", "Male", "Female", "Female", "Male",
"Male", "Male", "Male", "Female", "Male", "Male", "Female", "Female",
"Male", "Male", "Male", "Male", "Female", "Female", "Male", "Male",
"Female", "Male", "Male", "Male", "Female", "Female", "Female",
"Female", "Male", "Female", "Female", "Female", "Male", "Female",
"Female", "Female", "Male", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Female", "Male", "Male",
"Female", "Male", "Female", "Female", "Male", "Female", "Male",
"Male", "Female", "Female", "Female", "Male", "Female", "Female",
"Male", "Female", "Male", "Female", "Female", "Male", "Female",
"Female", "Male", "Female", "Male", "Male", "Female", "Female",
"Female", "Female", "Female", "Male", "Female", "Female", "Female",
"Female", "Female", "Male", "Female", "Male", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Male", "Male",
"Male", "Female", "Female", "Female", "Female", "Female", "Male",
"Male", "Female", "Female", "Female", "Male", "Female", "Male",
"Female", "Male", "Female", "Female", "Male", "Female", "Male",
"Male", "Female", "Male", "Female", "Female", "Male", "Female",
"Female", "Male", "Female", "Female", "Female", "Male", "Male",
"Male", "Female", "Female", "Female", "Female", "Male"), Maritaldummy = c("Not married",
"Married", "Married", "Not married", "Not married", "Married",
"Married", "Married", "Not married", "Not married", "Not married",
"Married", "Married", "Married", "Married", "Married", "Not married",
"Not married", "Not married", "Married", "Not married", "Not married",
"Not married", "Not married", "Not married", "Married", "Married",
"Not married", "Married", "Not married", "Married", "Not married",
"Not married", "Not married", "Not married", "Not married", "Married",
"Not married", "Married", "Married", "Not married", "Not married",
"Married", "Not married", "Married", "Not married", "Not married",
"Not married", "Married", "Married", "Married", "Not married",
"Not married", "Married", "Married", "Not married", "Not married",
"Married", "Married", "Not married", "Married", "Married", "Married",
"Not married", "Married", "Not married", "Not married", "Married",
"Not married", "Married", "Not married", "Not married", "Not married",
"Married", "Not married", "Not married", "Married", "Married",
"Not married", "Married", "Married", "Married", "Married", "Married",
"Married", "Not married", "Married", "Not married", "Not married",
"Not married", "Not married", "Not married", "Married", "Not married",
"Married", "Married", "Not married", "Not married", "Married",
"Not married", "Married", "Married", "Married", "Married", "Not married",
"Married", "Married", "Married", "Not married", "Married", "Not married",
"Not married", "Married", "Not married", "Married", "Not married",
"Not married", "Married", "Not married", "Married", "Not married",
"Married", "Married", "Not married", "Married", "Married", "Married",
"Not married", "Married", "Married", "Married", "Married", "Married",
"Married", "Married", "Married", "Not married", "Not married",
"Not married", "Married", "Married", "Married", "Not married",
"Married", "Not married", "Married", "Not married", "Married",
"Married", "Married", "Married", "Married", "Not married", "Married",
"Not married", "Not married", "Married", "Married", "Married",
"Married", "Married", "Married", "Married", "Married", "Not married",
"Married", "Married", "Married", "Not married", "Not married",
"Married", "Not married", "Married", "Not married", "Married",
"Married", "Not married", "Not married", "Married", "Not married",
"Married", "Not married", "Not married", "Married", "Not married",
"Not married", "Married", "Married", "Married", "Not married",
"Not married", "Not married", "Married", "Married", "Married",
"Married", "Not married", "Not married", "Married", "Not married")), row.names = c("3564", "2999", "20144", "17281", "11917",
"14549", "5116", "10553", "23108", "19521", "277", "24312", "5449",
"19006", "9171", "21265", "20494", "11961", "15556", "12237",
"10959", "23460", "14050", "13996", "16222", "21852", "5593",
"18871", "18770", "776", "24913", "7813", "25079", "1063", "22878",
"13638", "19169", "7226", "14895", "8088", "19789", "22835",
"14196", "13816", "7124", "10394", "8290", "16807", "732", "3130",
"16033", "14958", "7500", "15039", "1538", "12532", "2890", "18907",
"21581", "3120", "20198", "22943", "8468", "3128", "24153", "22911",
"6225", "8489", "13040", "17506", "14855", "1500", "11955", "24484",
"17625", "19888", "10351", "19210", "22946", "14699", "1959",
"6770", "23286", "11842", "12811", "22197", "5899", "10138",
"20505", "16090", "17835", "20512", "12271", "9152", "12767",
"25244", "16865", "6970", "10036", "22531", "12329", "15366",
"2", "9440", "2100", "23166", "11421", "18912", "4441", "25202",
"20599", "411", "12584", "1586", "4543", "1307", "10044", "25033",
"5005", "25122", "16236", "9653", "16194", "14393", "7512", "10059",
"12010", "1619", "3136", "24088", "14641", "19564", "9568", "18815",
"21079", "22010", "9553", "20380", "20416", "15745", "7000",
"7735", "24924", "15286", "20403", "4680", "13714", "13302",
"12508", "17514", "4480", "7446", "3723", "24069", "25317", "14607",
"12274", "21715", "8983", "23488", "9228", "7265", "18192", "16475",
"11760", "15530", "18177", "11535", "18839", "17908", "9789",
"18045", "1025", "21645", "11853", "22453", "18052", "22763",
"9", "12286", "15329", "3306", "13215", "16533", "18385", "23784",
"10131", "4894", "14154", "3365", "8648", "17325", "21219", "16689",
"9969", "10621", "24206", "19621", "8440", "19889"), class = "data.frame")
Solution
We can reproduce the error if we change any one value in the column to NA.
library(dplyr)
library(ggplot2)
df$Maritaldummy[195] <- NA
df %>%
mutate(date = lubridate::mdy(startday)) %>%
arrange(date) %>%
mutate(Rs = cumsum(Maritaldummy %in% c("Not married", "Married")),
Married_Rs = cumsum(Maritaldummy == "Married")) %>%
group_by(date) %>%
slice(n()) %>%
select(date, Rs, Married_Rs) %>%
mutate(Married_prop = Married_Rs/Rs) %>%
ggplot(aes(x = date, y = Married_prop)) +
geom_point() +
geom_line()
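which returns
Warning messages: 1: Removed 38 rows containing missing values (geom_point). 2: Removed 38 rows containing missing values (geom_path).
Because one or more values are NA, cumsum returns NA at that position and for every value after it. A simple fix is to use %in%, which returns FALSE for NA, instead of ==, which returns NA.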
df %>%
mutate(date = lubridate::mdy(startday)) %>%
arrange(date) %>%
mutate(Rs = cumsum(Maritaldummy %in% c("Not married", "Married")),
Married_Rs = cumsum(Maritaldummy %in% "Married")) %>%
group_by(date) %>%
slice(n()) %>%
select(date, Rs, Married_Rs) %>%
mutate(Married_prop = Married_Rs/Rs) %>%
ggplot(aes(x = date, y = Married_prop)) +
geom_point() +
geom_line()
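To see the difference in isolation, here is a minimal sketch on a made-up vector (the name status and its values are hypothetical stand-ins for Maritaldummy, not taken from the real data). It also shows one way to count missing values directly, which may help check whether the full sample contains any.

```r
# Hypothetical toy vector standing in for Maritaldummy, with one missing value
status <- c("Married", "Not married", NA, "Married", "Married")

# `==` yields NA for the missing element, and cumsum() then returns NA
# at that position and for every position after it
eq_counts <- cumsum(status == "Married")
print(eq_counts)   # 1 1 NA NA NA

# `%in%` treats NA as "no match" (FALSE), so the running count stays defined
in_counts <- cumsum(status %in% "Married")
print(in_counts)   # 1 1 1 2 3

# Count the missing values to confirm whether the column has any
n_missing <- sum(is.na(status))
print(n_missing)   # 1
```

Note that with %in% a missing answer is silently counted as "not Married" in the numerator, while Rs already excludes it from the denominator, so the proportion is computed over valid responses only.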