r - Concatenate and count the number of elements in each row?
问题描述
df <- structure(list(ID = c("1", "2", "3", "4", "5", "6"), `ID without mask` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `Other Years` = c("2011", "2015", "2015", "2006, 2006, 2005, 2005, 2007",
"2014, 2011", "2007"), `Cut off Year` = c("2011", "2015", "2015", "2005",
"2011", "2007"), `2005` = c(NA, NA, NA, "30", "18", NA), `2006` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `2007` = c("15", NA, "18", NA, "30, 18", NA), `2008` = c("16",
NA, NA, "30, 27", "18, 30", NA), `2009` = c("15", NA, NA, "20",
"30, 18", NA), `2010` = c(NA, NA, NA, "30, 20", NA, NA), `2011` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `2012` = c(NA, NA, NA, "20, 30", NA, "26"), `2013` = c("15",
NA, "19", NA, NA, NA), `2014` = c(NA, NA, "18", NA, NA, NA),
`2015` = c(NA, NA, "18", NA, "18", NA), `2016` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), .Names = c("ID", "ID without mask",
"Other Years", "Cut off Year", "2005", "2006", "2007", "2008",
"2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016"
), row.names = c(NA, 6L), class = "data.frame")
Given the above data frame. I would like R to concatenate and count every element (for the years) in a row together and then output it in a new column.
Based on the "Cut off Year" column, I want R to concatenate the elements before the Cut off Year together in one column and the elements after the cut off year together in a second column (cut off year included).
So for the first row with a cut off year of 2011, the years 2007, 2008, and 2009 has 15, 16, 15 respectively, so that is 3 in total, so R should output the number 3 in the new column. After 2011, only 2013 has an element with a entry, hence the "after" column will only have the number 1.
Elements such as "30, 27" count as two, etc.
Here's the desired output:
structure(list(ID = c("1", "2", "3", "4", "5", "6"), `ID without mask` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `Other Years` = c("2011", "2015", "2015", "2006, 2006, 2005, 2005, 2007",
"2014, 2011", "2007"), `Cut off Year` = c("2011", "2015", "2015", "2005",
"2011", "2007"), `2005` = c(NA, NA, NA, "30", "18", NA), `2006` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `2007` = c("15", NA, "18", NA, "30, 18", NA), `2008` = c("16",
NA, NA, "30, 27", "18, 30", NA), `2009` = c("15", NA, NA, "20",
"30, 18", NA), `2010` = c(NA, NA, NA, "30, 20", NA, NA), `2011` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `2012` = c(NA, NA, NA, "20, 30", NA, "26"), `2013` = c("15",
NA, "19", NA, NA, NA), `2014` = c(NA, NA, "18", NA, NA, NA),
`2015` = c(NA, NA, "18", NA, "18", NA), `2016` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), `Before cut` = c("3", "0", "3", "0", "7", "0"), `After cut` = c("1", "0", "1", "8", "1", "1")), .Names = c("ID", "Collab Years Patents",
"Collab Years Publications", "Cut off Year", "2005", "2006", "2007", "2008",
"2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "Before cut", "After cut"
), row.names = c(NA, 6L), class = "data.frame")
解决方案
我们转换为一个不错的长格式,计算值,然后加入原始格式:
library(tidyr)
library(dplyr)
library(stringr)
gather(df, key = "Year", value = "value", `2005`:`2016`) %>%
mutate(val_count = str_count(value, pattern = ",") + 1) %>%
group_by(ID) %>%
summarize(Before = sum(val_count * (Year < `Cut off Year`), na.rm = TRUE),
After = sum(val_count * (Year >= `Cut off Year`), na.rm = TRUE)) %>%
right_join(df) %>%
select(1:3)
# Joining, by = "ID"
# A tibble: 6 x 3
ID Before After
<chr> <dbl> <dbl>
1 1 3 1
2 2 0 0
3 3 3 1
4 4 0 8
5 5 7 1
6 6 0 1
我select(1:3)
过去只显示结果的相关部分 - 省略该行以获取所有其他列。为了获得值的数量,我们将逗号的数量加 1。
推荐阅读
- javascript - 路由器出口外的角子路由元素
- mysql - 如何插入从节点快递到 MySQL db 的多个列
- sapb1 - 如何将销售订单数据插入到用户定义的表中
- mockito - Mockito.spy 执行不同的代码库
- excel - Excel - 生成显示满足多种形式标准的单元格的列表
- javascript - 为什么我的子菜单与另一个子元素重叠?
- python - 有没有更简单的方法来制作 4 行 15 个精灵?
- python - 在 PySpark 中 Group By 之后从可用行创建 JSON 结构
- reactjs - 如何在使用下一个路由器时解决未处理的运行时错误
- node.js - Firestore Timestamp 变量抛出 TypeError: startDate.toMillis is not a function