r - 计算复杂 table_1 中的持续时间
问题描述
我有一个数据框
df <- data.frame("name" = c("jack", "william", "david", "john"),
"01-Jan-19" = c(NA,"A", NA,"A"),
"01-Feb-19" = c("A","A",NA,"A"),
"01-Mar-19" = c("S","A","A","A"),
"01-Apr-19" = c("A","A","A","S"),
"01-May-19" = c(NA,"A","A","A"),
"01-Jun-19" = c("A","S","A","S"),
"01-Jul-19" = c("A","S","A","S"),
"01-Aug-19" = c(NA,"S","A","A"),
"01-Sep-19" = c(NA,"S","A","S"),
"01-Oct-19" = c("S","S","A","S"),
"01-Nov-19" = c("S","S",NA,"S"),
"01-Dec-19" = c("S","S","S",NA),
"01-Jan-20" = c("S","M","A","M"),
"01-Feb-20" = c("M","M","M","M"))
为了计算每个人从第一个 A 到最后一个 A 之间的持续时间,我可以通过以下方式实现
duration <- df %>%
tidyr::pivot_longer(cols = -name, names_to = 'person', values_drop_na = TRUE) %>%
dplyr::mutate(person = dmy(sub('X', '', person))) %>%
group_by(name) %>%
dplyr::summarise(avg_duration = person[max(which(value == 'A'))] - person[min(which(value == 'A'))])
请问如何为以下目的修改代码?两个A之间的周期,我怎样才能减去具有其他值的周期(任何不是A,例如S,NA)?太感谢了。
解决方案
在这种情况下,我很难理解句号的含义。从您的示例看来,您在一段时间内需要两个值。这也是要减去的时期的情况吗?例如,如果一个人有系列:A、S、A,那么他们与 A 有 0、1 或 2 个句点吗?那么A,S,A,A,S,A呢?
相反,如果每个日期代表一个期间,即在下一个日期之前有效,则每个人值组合的总持续时间可以计算如下:
duration <- df %>%
tidyr::pivot_longer(cols = -name, names_to = 'date') %>%
dplyr::mutate(date = lubridate::dmy(sub('X', '', date))) %>%
group_by(name) %>%
dplyr::arrange(name, date) %>%
dplyr::mutate(duration = c(diff(date), 0)) %>%
dplyr::group_by(name, value) %>%
dplyr::summarise(summed_duration = sum(duration))
# A tibble: 15 x 3
# Groups: name [4]
name value summed_duration
<chr> <chr> <drtn>
1 david A 276 days
2 david M 0 days
3 david S 31 days
4 david NA 89 days
5 jack A 119 days
6 jack M 0 days
7 jack S 154 days
8 jack NA 123 days
9 john A 152 days
10 john M 31 days
11 john S 182 days
12 john NA 31 days
13 william A 151 days
14 william M 31 days
15 william S 214 days
根据评论编辑
df %>%
tidyr::pivot_longer(cols = -name, names_to = 'date') %>%
dplyr::mutate(date = lubridate::dmy(sub('X', '', date))) %>%
dplyr::group_by(name) %>%
dplyr::arrange(name, date) %>%
dplyr::mutate(duration = c(diff(date), 0)) %>%
dplyr::group_by(name, value) %>%
dplyr::filter(dplyr::row_number() < dplyr::last(dplyr::row_number()) | value != 'A') %>%
dplyr::summarise(summed_duration = sum(duration)) %>%
dplyr::filter(value == 'A')
# A tibble: 4 x 3
# Groups: name [4]
name value summed_duration
<chr> <chr> <drtn>
1 david A 245 days
2 jack A 88 days
3 john A 121 days
4 william A 120 days
推荐阅读
- devops - Apache Superset 的警报和报告中没有完整的屏幕截图
- r - 报告满足条件的最后一列
- image - 在 TinyMCE 中上传图像的问题
- git - Git Rebase 在同一主题分支上多次
- machine-learning - 当现实世界中的正例数量较少时平衡分类数据集
- mysql - Mysql唯一键导致死锁
- angularjs - 单元测试用例:没有调用具有功能参数的函数
- asp.net-core - usermanager updateasync 返回内部 500 错误
- postgresql - Hibernate Criteriabuilder Query with part of a complex id
- csv - 如何在 Neo4j 中下载到 CSV