r - Check whether a value exists within specific groups of rows in a data frame
Problem description
I have this data frame (df):
structure(list(from = c("(192) 242-2345", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "(832) 345-3168",
NA, NA), to = c("(900) 301-3451", NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "(900) 234-1231",
NA, NA), action_result = c("Voicemail", "No Answer", "No Answer",
"No Answer", "No Answer", "No Answer", "No Answer", "No Answer",
"No Answer", "IP Phone Offline", "No Answer", "No Answer", "Busy",
"Busy", "No Answer", "No Answer", "No Answer", "No Answer", "No Answer",
"No Answer", "No Answer", "Busy", "IP Phone Offline", "Busy",
"No Answer", "No Answer", "No Answer", "No Answer", "No Answer",
"IP Phone Offline", "IP Phone Offline", "No Answer", "No Answer",
"IP Phone Offline", "No Answer", "No Answer", "Busy", "Missed",
"Hang Up", "Hang Up")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -40L))
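The first row holds the phone numbers that placed and received the call. The rows after it have NA in those columns. So rows 1-37 are treated as one group, and rows 38 to 40 as a second group. I want to check whether each group contains the value Call Connected in the action_result column.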
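I have tried group_by on the from and to values, but the full dataset I am working with has duplicate from and to pairs, so that does not work. I would like a dplyr solution that checks whether the first 37 rows contain Call Connected and outputs a data frame with the columns from, to, and CallConnected, where CallConnected is 1 for yes and 0 for no.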
So, looking at df, the resulting dataset would have 2 rows:
from | to | CallConnected
--------------------------------------------------
(192) 242-2345|(900) 301-3451| 0
(832) 345-3168|(900) 234-1231| 0
Solution
Here is a solution using the tidyverse package; alternatively, you can load just the dplyr and tidyr packages.
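The idea is to fill the NA values in the from and to columns with the nearest preceding non-NA value. After that, use action_result == "CallConnected" to flag the rows that match "CallConnected", group by from and to, and use summarize with sum to count the matching records in each group. Note that the question writes the value as Call Connected (with a space), so adjust the string to the exact spelling used in your data.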
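library(tidyverse)

df2 <- df %>%
  # Carry each call's phone numbers down into its NA rows
  fill(from) %>%
  fill(to) %>%
  # TRUE where the row's result is a connected call
  mutate(CallConnected = action_result == "CallConnected") %>%
  group_by(from, to) %>%
  # Count the matching rows per from/to pair
  summarize(CallConnected = sum(CallConnected)) %>%
  ungroup()

df2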
# # A tibble: 2 x 3
# from to CallConnected
# <chr> <chr> <int>
# 1 (192) 242-2345 (900) 301-3451 0
# 2 (832) 345-3168 (900) 234-1231 0
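The question asks for 1 for yes and 0 for no, whereas sum returns the number of matching rows, which can exceed 1. A minimal sketch of a 0/1 variant follows, assuming a small made-up data frame (toy and toy_flag are hypothetical names, and the "CallConnected" spelling simply follows the code above):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two calls; the second call has two matching rows
toy <- tibble(
  from = c("A", NA, NA, "B", NA, NA),
  to = c("X", NA, NA, "Y", NA, NA),
  action_result = c("No Answer", "Busy", "Hang Up",
                    "CallConnected", "CallConnected", "Hang Up")
)

toy_flag <- toy %>%
  # Carry the phone numbers down into the NA rows
  fill(from, to) %>%
  group_by(from, to) %>%
  # any() collapses the group to one TRUE/FALSE; as.integer() turns it into 1/0
  summarize(
    CallConnected = as.integer(any(action_result == "CallConnected")),
    .groups = "drop"
  )

toy_flag
```

With sum, the second call in this toy data would be reported as 2; any keeps the column a plain indicator.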
Update
If duplicates are a problem, we can use rleid from the data.table package to create an ID after the fill calls. Here is an example.
library(tidyverse)
library(data.table)

# Create an example with duplication
df_dup <- bind_rows(df, df %>% slice(1:5))

df_dup2 <- df_dup %>%
  fill(from) %>%
  fill(to) %>%
  # New ID whenever the from/to pair changes from one row to the next
  mutate(ID = rleid(from, to)) %>%
  mutate(CallConnected = action_result == "CallConnected") %>%
  group_by(ID, from, to) %>%
  summarize(CallConnected = sum(CallConnected)) %>%
  ungroup()

df_dup2
# # A tibble: 3 x 4
# ID from to CallConnected
# <int> <chr> <chr> <int>
# 1 1 (192) 242-2345 (900) 301-3451 0
# 2 2 (832) 345-3168 (900) 234-1231 0
# 3 3 (192) 242-2345 (900) 301-3451 0
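Because rleid numbers runs of identical values, two back-to-back calls with the same from and to pair would likely end up sharing one ID after fill. A possible dplyr-only alternative, sketched here under the assumption that every call starts with a non-NA from value, is to build the ID with cumsum before filling. The names toy_calls, toy_summary, and call_id are hypothetical, and the "CallConnected" spelling follows the code above:

```r
library(dplyr)

# Hypothetical toy data: the same from/to pair opens two consecutive calls
toy_calls <- tibble(
  from = c("A", NA, "A", NA, NA),
  to = c("X", NA, "X", NA, NA),
  action_result = c("No Answer", "Busy", "CallConnected", "Hang Up", "Hang Up")
)

toy_summary <- toy_calls %>%
  # Each non-NA `from` starts a new call, so the running count is a call ID
  mutate(call_id = cumsum(!is.na(from))) %>%
  group_by(call_id) %>%
  summarize(
    from = first(from),  # the numbers sit on the first row of each call
    to = first(to),
    CallConnected = as.integer(any(action_result == "CallConnected")),
    .groups = "drop"
  )

toy_summary
```

This keeps the two calls as separate rows even though their phone numbers are identical, and it avoids the data.table dependency.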