R - Create a matrix with values based on the relationship between values in multiple columns of another data frame

Problem description

I have some data:

df <- data.frame(first   = c('response','new','NA','early','archive','archive','early','dormant','dormant','response'),
                 second  = c('response','NA','new','response','response','NA','response','new','dormant','dormant'),
                 third   = c('dormant','response','early','response','NA','archive','response','archive','new','new'),
                 fourth  = c('dormant','NA','archive','early','new','archive','NA','new','early','response'),
                 fifth   = c('archive','archive','NA','new','new','response','dormant','new','new','dormant'),
                 sixth   = c('response','response','new','archive','NA','early','new','dormant','NA','dormant'),
                 seventh = c('new','NA','archive','new','dormant','dormant','NA','NA','NA','new'))

It looks like this:

      first   second    third   fourth    fifth    sixth seventh
1  response response  dormant  dormant  archive response     new
2       new       NA response       NA  archive response      NA
3        NA      new    early  archive       NA      new archive
4     early response response    early      new  archive     new
5   archive response       NA      new      new       NA dormant
6   archive       NA  archive  archive response    early dormant
7     early response response       NA  dormant      new      NA
8   dormant      new  archive      new      new  dormant      NA
9   dormant  dormant      new    early      new       NA      NA
10 response  dormant      new response  dormant  dormant     new
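The quotes around 'NA' in the data.frame() call matter: those cells hold the character string "NA", not R's missing value (a genuine NA in a character column would print as <NA>). A quick check, reusing the question's data, shows that the data contains no real missing values:

```r
# Same data as in the question; 'NA' is quoted, so it is the string "NA"
df <- data.frame(first   = c('response','new','NA','early','archive','archive','early','dormant','dormant','response'),
                 second  = c('response','NA','new','response','response','NA','response','new','dormant','dormant'),
                 third   = c('dormant','response','early','response','NA','archive','response','archive','new','new'),
                 fourth  = c('dormant','NA','archive','early','new','archive','NA','new','early','response'),
                 fifth   = c('archive','archive','NA','new','new','response','dormant','new','new','dormant'),
                 sixth   = c('response','response','new','archive','NA','early','new','dormant','NA','dormant'),
                 seventh = c('new','NA','archive','new','dormant','dormant','NA','NA','NA','new'))

sum(is.na(df))   # 0: no genuine missing values
sum(df == "NA")  # 13: cells holding the literal string "NA"
```

For the rule below this makes no difference, since "NA" is simply one more value other than "response", "new" or "early".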

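Subject to certain conditions, I need to return a matrix of 1s and 0s based on the relationship between each column and the column before it.

The matrix must contain 1 when a column contains any of "response", "new" or "early" and the previous column contains anything other than "response", "new" or "early"; otherwise it must contain 0.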
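A minimal sketch of this rule for a single pair of cells, using a hypothetical helper is_switch() (not part of the question) built on base R's %in%. Because %in% returns FALSE rather than NA for a missing value, a genuine NA in either cell counts as "anything other than" the three target values:

```r
# The three values that trigger a 1
targets <- c("response", "new", "early")

# Hypothetical helper: TRUE when the current cell is a target value
# and the previous cell is not (vectorised over both arguments)
is_switch <- function(cur, prev, good = targets) {
  (cur %in% good) & !(prev %in% good)
}

is_switch("response", "archive")  # TRUE
is_switch("new", "early")         # FALSE: previous value is also a target
is_switch("early", NA)            # TRUE: a missing previous value is not a target
```

Applying this to every pair of adjacent columns, and filling the first column with 0, gives the requested matrix.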

I expect something like this:

   first second third fourth fifth sixth seventh
1      0      0     0      0     0     1       0
2      0      0     1      0     0     1       0
3      0      1     0      0     0     1       0
4      0      0     0      0     0     0       1
5      0      1     0      1     0     0       0
6      0      0     0      0     1     0       0
7      0      0     0      0     0     1       0
8      0      1     0      1     0     0       0
9      0      0     1      0     0     0       0
10     0      0     1      0     0     0       1

I expect the first column to contain only 0s, because there is no previous column to compare it with.

Any help would be greatly appreciated.

Tags: r

Solution


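# Swap genuine NA values for a placeholder so that == and != never return NA
df2 <- replace(df, is.na(df), "NA_chr")
targets <- c("response", "new", "early")
# TRUE where columns 2..7 hold one of the target values
cur_hit <- Reduce("|", lapply(targets, function(x) df2[, -1] == x))
# TRUE where the previous column (1..6) holds none of the target values
prev_miss <- Reduce("&", lapply(targets, function(x) df2[, -ncol(df2)] != x))
# Multiplying the two logical matrices gives 1 only where both conditions hold
m <- cur_hit * prev_miss
# Prepend a column of 0s named "first", which has no previous column
m <- cbind(first = 0, m)
m
#       first second third fourth fifth sixth seventh
#  [1,]     0      0     0      0     0     1       0
#  [2,]     0      0     1      0     0     1       0
#  [3,]     0      1     0      0     0     1       0
#  [4,]     0      0     0      0     0     0       1
#  [5,]     0      1     0      1     0     0       0
#  [6,]     0      0     0      0     1     0       0
#  [7,]     0      0     0      0     0     1       0
#  [8,]     0      1     0      1     0     0       0
#  [9,]     0      0     1      0     0     0       0
# [10,]     0      0     1      0     0     0       1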
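The replace() step in this answer exists because == and != return NA, not FALSE, when a cell is a genuine missing value. With the question's data it changes nothing, since the 'NA' entries there are strings. The sketch below assumes, for illustration, that those cells were real missing values, and builds the same matrix with %in%, which returns FALSE for a missing cell, so no placeholder is needed. The names df_na, hit, flags and m2 are illustrative and do not come from the original answer.

```r
df <- data.frame(first   = c('response','new','NA','early','archive','archive','early','dormant','dormant','response'),
                 second  = c('response','NA','new','response','response','NA','response','new','dormant','dormant'),
                 third   = c('dormant','response','early','response','NA','archive','response','archive','new','new'),
                 fourth  = c('dormant','NA','archive','early','new','archive','NA','new','early','response'),
                 fifth   = c('archive','archive','NA','new','new','response','dormant','new','new','dormant'),
                 sixth   = c('response','response','new','archive','NA','early','new','dormant','NA','dormant'),
                 seventh = c('new','NA','archive','new','dormant','dormant','NA','NA','NA','new'))

# Assumption for illustration: turn the literal "NA" strings into genuine NAs
df_na <- df
df_na[df_na == "NA"] <- NA

targets <- c("response", "new", "early")

# Logical matrix (one column per data-frame column): TRUE where a cell holds
# a target value; %in% gives FALSE, never NA, for missing cells
hit <- sapply(df_na, `%in%`, targets)

# Current column (2..n) is a target AND previous column (1..n-1) is not
flags <- hit[, -1] & !hit[, -ncol(hit)]

# "first" has no previous column, so it is all 0; cbind() turns TRUE/FALSE into 1/0
m2 <- cbind(first = 0, flags)
m2
```

This should reproduce the expected matrix from the question. The design choice is mainly about missing values: %in% folds them into "not a target value" automatically, whereas the ==/!= approach needs the placeholder to avoid NA results.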
