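r - GGPLOT: How do I plot a loess curve for a specified subset of my data points?
Problem description
(I'm self-taught in R and use this forum often, but this is my first post. Feedback is appreciated.)
This should have a relatively simple solution, but I can't find it, and it makes me want to throw my computer out the window. To get to the point, I have a simple dataset: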
mydata <- structure(list(Date = c("2020-06-22", "2020-06-22", "2020-06-23",
"2020-06-23", "2020-06-24", "2020-06-24", "2020-06-25", "2020-06-25",
"2020-06-26", "2020-06-26", "2020-06-29", "2020-06-29", "2020-06-30",
"2020-06-30", "2020-07-01", "2020-07-01", "2020-07-02", "2020-07-02",
"2020-07-06", "2020-07-06", "2020-07-06", "2020-07-06", "2020-07-07",
"2020-07-07", "2020-07-08", "2020-07-08", "2020-07-08", "2020-07-09",
"2020-07-09", "2020-07-09"), Location = c("Haskell", "Bustamante",
"Haskell", "Bustamante", "Haskell", "Bustamante", "Bustamante",
"Haskell", "Bustamante", "Haskell", "Bustamante", "Haskell",
"Bustamante", "Haskell", "Bustamante", "Haskell", "Bustamante",
"Haskell", "Bustamante", "Haskell", "Bustamante", "Haskell",
"Bustamante", "Haskell", "Bustamante", "Haskell", "Tap Water",
"Bustamante", "Haskell", "Tap Water"), UVT = c(72.2, 65.6, 70,
61.8, 71.5, 63.9, 63.9, 71.5, 68.1, 71.5, 68.9, 71.3, 71.3, 72.4,
68.9, 67.3, 49.4, 49, 39.3, 42.3, 64.2, 70.9, 33.3, 49.3, 46,
48.8, 88.7, 66, 70.5, 84.7), Source = c("Shawn", "Shawn", "Jesus",
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus",
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus",
"Jesus", "Jesus", "Jesus", "Shawn", "Shawn", "Jesus", "Jesus",
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus")), row.names = c(NA,
-30L), class = "data.frame")
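First, I tried plotting the data grouped by location, but I'm guessing that because the "Tap Water" group has only 2 data points, it doesn't meet loess's degrees-of-freedom requirement: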
#Import Packages
library(tidyverse)
#Import Data
mydata <- read.csv("L:\\2019\\19W06195 - EPW HRS and RRB WWTPs Disinfection Study\\Design\\Design Criteria\\R\\UVT Graphs\\UVTdata.csv")
#Plot
p <- ggplot(data=mydata, aes(x=as.Date(mydata[,1], "%Y-%m-%d"), y=mydata[,3], color=mydata[,2])) + geom_point() + geom_smooth(method = "loess", se = FALSE)
p + scale_x_date(date_breaks = "days" , date_labels = "%b-%d")
Here are the warnings I get:
Warning messages:
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
span too small. fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
at 18451
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
radius 2.5e-005
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
all data on boundary of neighborhood. make span bigger
5: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
pseudoinverse used at 18451
6: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
neighborhood radius 0.005
7: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
reciprocal condition number 1
8: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
at 18452
9: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
radius 2.5e-005
10: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
all data on boundary of neighborhood. make span bigger
11: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
There are other near singularities as well. 2.5e-005
12: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
zero-width neighborhood. make span bigger
13: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... :
zero-width neighborhood. make span bigger
14: Computation failed in `stat_smooth()`:
NA/NaN/Inf in foreign function call (arg 5)
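The "fewer data values than degrees of freedom" warning is consistent with one group being too small for a local quadratic fit. A minimal sketch for checking group sizes follows. The `location` vector is a hypothetical stand-in for the Location column (14, 14 and 2 rows, matching the data above), and the cutoff of 5 is an assumption rather than a loess rule:

```r
# Hypothetical stand-in for the Location column: 14, 14 and 2 rows,
# matching the group sizes in the data above.
location <- c(rep("Bustamante", 14), rep("Haskell", 14), rep("Tap Water", 2))

# Rows per group; a local quadratic fit (loess default degree = 2)
# cannot be estimated from only two points.
counts <- table(location)
print(counts)

# Groups that are probably too small for loess.
# The cutoff of 5 is an assumption, not a documented threshold.
small_groups <- names(counts)[counts <= 5]
print(small_groups)
```

With the real data, `table(mydata$Location)` gives the same kind of summary.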
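Note that running the same code with "method=lm" instead of "method=loess" works perfectly, but it doesn't show the trend I want.
To work around this, I tried adding a condition that falls back to linear regression for data subsets with too few data points: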
sProduct <- unique(mydata[,2])
p <- ggplot(mydata, aes(as.Date(mydata[,1], "%Y-%m-%d"), mydata[,3], color = mydata[,2])) + geom_point()
for (i in sProduct){
sMethod <- ifelse(sum(mydata[,2] == i) <= 5, "lm", "loess")
p <- p + geom_smooth(data = subset(mydata, mydata[,2] == i), method = sMethod, se = FALSE)
}
p
Despite these efforts, I now run into an aesthetics error:
Error: Aesthetics must be either length 1 or the same as the data (14): x, y and colour
Run `rlang::last_error()` to see where the error occurred.
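I think this is due to a mismatch in the number of data points between the data subsets in geom_point and geom_smooth, but I'm not sure. I also tried subsetting the data to exclude "Tap Water" from geom_smooth, since I'm generally not interested in the trend there anyway: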
p <- ggplot(data=mydata, aes(x=as.Date(mydata[,1], "%Y-%m-%d"), y=mydata[,3], color=mydata[,2])) + geom_point() + geom_smooth(data=subset(mydata, Location=="Bustamante" | Location=="Haskell"), method = "loess", se = FALSE)
p + scale_x_date(date_breaks = "days" , date_labels = "%b-%d")
This produces the same error. Any help here would be greatly appreciated! Thanks!
Solution
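One likely cause of the aesthetics error is that `aes()` is given `mydata[,1]`, `mydata[,3]` and `mydata[,2]` directly. Those vectors always have 30 elements, so they no longer match a layer whose `data` is a 14-row subset. Mapping by bare column name lets each layer evaluate the aesthetics against its own data. The sketch below assumes that reading. The `toy` data frame is a hypothetical stand-in with the same column names as `mydata` and made-up UVT values, and the smoother simply skips the "Tap Water" group:

```r
library(ggplot2)

# Hypothetical stand-in for mydata: same column names, made-up UVT values.
set.seed(1)
toy <- data.frame(
  Date = rep(as.Date("2020-06-22") + 0:9, times = 2),
  Location = rep(c("Bustamante", "Haskell"), each = 10),
  UVT = round(runif(20, 40, 75), 1)
)
toy <- rbind(toy, data.frame(
  Date = as.Date(c("2020-06-30", "2020-07-01")),
  Location = "Tap Water",
  UVT = c(88.7, 84.7)
))

# Bare column names in aes() are looked up in each layer's own data,
# so the subset passed to geom_smooth() stays consistent with its mappings.
p <- ggplot(toy, aes(x = Date, y = UVT, colour = Location)) +
  geom_point() +
  geom_smooth(
    data = subset(toy, Location != "Tap Water"),  # drop the 2-point group
    method = "loess", formula = y ~ x, se = FALSE
  ) +
  scale_x_date(date_breaks = "1 day", date_labels = "%b-%d")
print(p)
```

For the real data, the Date column would first need converting, for example `mydata$Date <- as.Date(mydata$Date)`, before the same call is used with `mydata` in place of `toy`. The same column-name mapping should also let the loop version work, with "lm" chosen for small groups and "loess" for the rest.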