r - Create a new column with conditions based on repeated values and where a break occurs
Question
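My data consists of about 40 animals (id) located by telemetry, and I have defined 3 areas: AR is the breeding area, AM is the migration area, and AA is the feeding area. The first location of every animal is in AR. Sometimes an animal is still in the breeding period (in AR) but goes out to AM a few times and then comes back to AR. Migration only starts once the animal is in AM and stays there until it reaches the feeding area AA. So the animals start in AR, then begin migrating through AM, and finally arrive at the feeding area AA.
I am trying to create a new column based on some conditions that I do not yet know how to implement. For example, I have this data frame: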
id area
2304 AR
2304 AR
2304 AR
2304 AM #this AM for example, can repeat until 20 times and then came back to AR
2304 AM
2304 AR
2304 AR
2304 AR
2304 AM
2304 AM
2304 AM
2304 AM
2304 ...
2304 AM
2304 AM
2304 AM
2304 AA
2304 AA
2304 ...
2304 AA
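So, when there are x rows of AR, followed by anywhere from 1 to 20 rows of AM, and then AR again, I want the new column to contain AR. From the moment there are x rows of AM and only AM, with no return to AR, I want the new column to contain AM. AA does not matter: AA always stays AA.
I am expecting this: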
id area fixed_area
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AR #this AM for example, can repeat until 20 times and then came back to AR
2304 AM AR
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AM
2304 AM AM
2304 AM AM
2304 AM AM
2304 ... ...
2304 AM AM
2304 AM AM
2304 AM AM
2304 AA AA
2304 AA AA
2304 ... ...
2304 AA AA
I tried the following, but AA is missing from the result. Maybe the problem is that this separation needs to be done for each animal (id):
> table(df$area)
AA AM AR
31460 39101 28820
> class(df$area)
[1] "character"
> idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
> df$fixed_area <- with(df, replace(area, idx < max(idx[area == 'AM']), 'AR'))
> table(df$fixed_area)
AM AR
145 99236
>
Below is the data frame from dput. My data frame has more than 90,000 rows, so I only copied the head:
> dput(head(df))
structure(list(DeployID = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), Start = structure(c(1477323868,
1477323946, 1477324002, 1477324044, 1477324260, 1477324480), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), End = structure(c(1477323944, 1477324000,
1477324042, 1477324170, 1477324458, 1477324542), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), What = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Dive", "Message", "Surface"), class = "factor"),
Shape = structure(c(2L, 4L, 3L, 2L, 2L, 2L), .Label = c("",
"Square", "U", "V"), class = "factor"), DepthMean = c(14.5,
16.5, 13, 14.5, 11, 12.5), DurationMean = c(76, 54, 40, 126,
198, 62), DepthMin = c(14.5, 16.5, 13, 14.5, 11, 12.5), DepthMax = c(14.5,
16.5, 13, 14.5, 11, 12.5), depth_range = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("shallow", "deep"), class = c("ordered",
"factor")), MidTime = structure(c(1477323906, 1477323973,
1477324022, 1477324107, 1477324359, 1477324511), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), year = c(2016, 2016, 2016, 2016,
2016, 2016), id = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), segmentid = c("111868_16",
"111868_16", "111868_16", "111868_16", "111868_16", "111868_16"
), mu.x = c(-4446545.25191192, -4446557.10576816, -4446565.77504969,
-4446580.81370994, -4446625.40007808, -4446652.29459533),
mu.y = c(-2305423.86124176, -2305461.88537725, -2305489.69364377,
-2305537.93137917, -2305680.93056743, -2305767.17264774),
lon = c(-39.9439956132156, -39.944102098218, -39.944179975699,
-39.9443150702825, -39.9447155964422, -39.9449571940013),
lat = c(-20.3985940756941, -20.3989161274532, -20.3991516537744,
-20.3995602097098, -20.4007713539709, -20.4015017842338),
lq_closest_filt = c(7L, 7L, 7L, 7L, 7L, 7L), dt_closest_filt = c(0.0516666666666667,
0.0702777777777778, 0.0838888888888889, 0.1075, 0.1775, 0.219722222222222
), dist_closest_filt = c(0.103680210832692, 0.141026573116106,
0.168339162761167, 0.215717097671267, 0.356168027785347,
0.440874049523752), rel.angle = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), speed = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), depth_bin = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("(0,50]", "(50,100]", "(100,150]",
"(150,200]", "(200,250]", "(250,300]", "(300,350]", "(350,400]",
"(400,450]", "(450,500]", "(500,550]", "(550,600]", "(600,650]",
"(650,700]"), class = "factor"), bat = structure(list(depth = c(-59L,
-59L, -59L, -59L, -59L, -59L)), row.names = c(NA, 6L), class = "data.frame"),
area = c("AR", "AR", "AR", "AR", "AR", "AR")), row.names = c(NA,
6L), class = "data.frame")
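Does anyone know how to solve this? Thanks!
Answer
It sounds like you need a couple of rules to decide which AM rows become AR:
- the number of consecutive AM rows is < 20
- the following destination is not AA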
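One approach is to add columns related to these two rules using rle. One column holds the lengths, that is, the number of consecutive values in each repeated run. The other column holds the "next" area, which tells you whether the destination is a return to the breeding area or a continuation to the feeding area.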
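To make the two helper columns concrete, here is a small sketch on a made-up vector (the values in area are invented for illustration and are not taken from the question's data). It shows what rle returns and how the run-level values are expanded back to one value per row:

```r
# Toy vector of areas for a single animal (made-up values for illustration)
area <- c("AR", "AR", "AM", "AM", "AR", "AM", "AM", "AM", "AA", "AA")

runs <- rle(area)
runs$values   # one label per run of identical values
runs$lengths  # how many rows each run covers

# Expand the run-level information back to one value per row
count     <- rep(runs$lengths, runs$lengths)            # length of the row's own run
next_area <- rep(c(runs$values[-1], NA), runs$lengths)  # label of the following run

data.frame(area, count, next_area)
```

The last run has no following run, so its next_area is NA. Any condition built on that column has to decide how to treat the missing value.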
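Finally, you can use a conditional statement to change rows from AM to AR when all of the following hold:
- the current area is AM
- the next area is not AA
- the number of repeated values is less than 20
Here is the code: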
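# Run-length encode the area column
df_rle <- rle(df$area)
# For every row, attach the area of the following run and the length of its own run
df2 <- cbind(df, next_area = with(df_rle, rep(c(values[-1], NA), lengths)),
             count = with(df_rle, rep(lengths, lengths)))
# Relabel short AM runs that are not followed by AA; the is.na() guard keeps
# a final AM run (which has no next area) from being turned into NA
df2$area <- ifelse(with(df2, area == "AM" & !is.na(next_area) & next_area != "AA" & count < 20),
                   "AR", df2$area)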
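The question suspects that the relabelling has to happen separately for each animal, because a run-length encoding of the whole column can run across the boundary between two ids. A minimal sketch of one way to do that in base R follows. The names fix_area, max_run and toy are hypothetical and only serve the illustration, the toy data is made up, and the threshold is lowered to 3 to keep the example short. The sketch assumes the only labels are AR, AM and AA, so it tests for "next area is AR" rather than "next area is not AA".

```r
# Hypothetical helper: relabel short AM runs that are followed by a return to AR.
# max_run = 20 mirrors the threshold in the rules above.
fix_area <- function(area, max_run = 20) {
  runs      <- rle(area)
  count     <- rep(runs$lengths, runs$lengths)
  next_area <- rep(c(runs$values[-1], NA), runs$lengths)
  # %in% returns FALSE for a missing next_area, so the last run is left unchanged
  to_ar <- area == "AM" & next_area %in% "AR" & count < max_run
  replace(area, to_ar, "AR")
}

# Made-up data: two animals, each going AR -> short AM trip -> AR -> AM -> AA
toy <- data.frame(
  id   = rep(c("a", "b"), each = 7),
  area = c("AR", "AM", "AM", "AR", "AM", "AM", "AA",
           "AR", "AR", "AM", "AR", "AM", "AA", "AA"),
  stringsAsFactors = FALSE
)

# ave() applies the helper within each id and returns the result in row order
toy$fixed_area <- ave(toy$area, toy$id,
                      FUN = function(a) fix_area(a, max_run = 3))
toy
```

Writing the result to a new fixed_area column keeps the original area values available for checking. On the real data the call would presumably use df$area and df$id with the default max_run, assuming the rows of each animal are already sorted by time.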