Collapsing rows containing NA in R (non-numeric data)

Question

I have a data set that looks like this:

+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 |    ID3    | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A   |   1 | August    | q3      | NA      | NA      | NA      | NA      | NA      |
| A   |   1 | August    | NA      | q1      | NA      | NA      | NA      | NA      |
| A   |   1 | August    | NA      | NA      | q2      | NA      | q2      | NA      |
| B   |   2 | September | q2      | NA      | NA      | NA      | NA      | NA      |
| B   |   2 | September | NA      | q4      | NA      | NA      | NA      | NA      |
| B   |   2 | September | NA      | NA      | q1      | NA      | NA      | NA      |
| B   |   2 | September | NA      | NA      | NA      | q1      | NA      | NA      |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+

I would like to collapse it into this:

+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 |    ID3    | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A   |   1 | August    | q3      | q1      | q2      | NA      | q2      | NA      |
| B   |   2 | September | q2      | q4      | q1      | q1      | NA      | NA      |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+

I tried the following:

temp = aggregate(.~ ID1 + ID2 + ID3, data = temp, FUN = na.omit, na.action = 'na.pass' )

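But the output is not correct (I suspect this approach would work if the Source fields were numeric). Any idea how to accomplish my goal?

Tags: r, dataframe, data.table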

Solution


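na.omit works with character values as well. On its own, though, it returns a vector whose length differs from group to group (possibly zero), while indexing with [1] always yields exactly one value per group and column (NA when the group has no non-NA entry). So you can take the first non-NA value like this: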

result <- aggregate(.~ ID1 + ID2 + ID3, temp, function(x) na.omit(x)[1], 
                    na.action = 'na.pass')
result

#  ID1 ID2       ID3 Source1 Source2 Source3 Source4 Source5 Source6
#1   A   1    August      q3      q1      q2    <NA>      q2    <NA>
#2   B   2 September      q2      q4      q1      q1    <NA>    <NA>
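To try this out, the sample data has to exist as a data frame first. The following is a minimal sketch that rebuilds the table from the question, assuming the Source columns are character vectors and ID2 is numeric (the question does not state the column types), and then applies the same aggregate call.

```r
# Rebuild the sample data from the question.
# Assumption: Source1..Source6 are character columns, ID2 is numeric.
temp <- data.frame(
  ID1 = c("A", "A", "A", "B", "B", "B", "B"),
  ID2 = c(1, 1, 1, 2, 2, 2, 2),
  ID3 = c(rep("August", 3), rep("September", 4)),
  Source1 = c("q3", NA, NA, "q2", NA, NA, NA),
  Source2 = c(NA, "q1", NA, NA, "q4", NA, NA),
  Source3 = c(NA, NA, "q2", NA, NA, "q1", NA),
  Source4 = c(NA, NA, NA, NA, NA, NA, "q1"),
  Source5 = c(NA, NA, "q2", NA, NA, NA, NA),
  Source6 = NA_character_,  # recycled: every row is NA
  stringsAsFactors = FALSE
)

# Keep the first non-NA value per group and column;
# na.pass stops aggregate from dropping rows that contain NA.
result <- aggregate(. ~ ID1 + ID2 + ID3, temp,
                    function(x) na.omit(x)[1],
                    na.action = 'na.pass')
print(result)
```

If a group held several different non-NA values in the same column, only the first would survive, so this fits data like the sample, where each group has at most one distinct value per column.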

Or with dplyr:

library(dplyr)

temp %>%
  group_by(ID1, ID2, ID3) %>%
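  # everything() selects all non-grouping columns; calling across() without
  # .cols is deprecated as of dplyr 1.1.0
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")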
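
Since the question is also tagged data.table, the same idea can probably be written there too. This is a sketch and not part of the original answer; it assumes the data.table package is installed, and the names dt and collapsed are made up for illustration.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
# Assumption: the Source columns are character, ID2 is numeric.
dt <- data.table(
  ID1 = c("A", "A", "A", "B", "B", "B", "B"),
  ID2 = c(1, 1, 1, 2, 2, 2, 2),
  ID3 = c(rep("August", 3), rep("September", 4)),
  Source1 = c("q3", NA, NA, "q2", NA, NA, NA),
  Source2 = c(NA, "q1", NA, NA, "q4", NA, NA),
  Source3 = c(NA, NA, "q2", NA, NA, "q1", NA),
  Source4 = c(NA, NA, NA, NA, NA, NA, "q1"),
  Source5 = c(NA, NA, "q2", NA, NA, NA, NA),
  Source6 = NA_character_
)

# .SD holds the non-grouping columns of each group;
# keep the first non-NA value of every column.
collapsed <- dt[, lapply(.SD, function(x) na.omit(x)[1]),
                by = .(ID1, ID2, ID3)]
print(collapsed)
```

For an existing data frame, setDT(temp) converts it in place before the same grouped call.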
