Spread with multiple columns in R

Problem description

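I have data in long format for 6 genes at 3 time points, and I am trying to spread it into 6 columns, one per gene. I always get this error: 'Do you need to create unique ID with tibble::rowid_to_column()? Call rlang::last_error() to see a backtrace'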

fgcrkmtptlog



     timepoint   gene treatment     value       tpt6
   1    24  crk10   treated 1.7883197   24 treated
   2    24  crk10   treated 1.0605152   24 treated
   3    24  crk10   treated 1.0050634   24 treated
   4    24  crk10   treated 1.8876708   24 treated
   5    24  crk10   treated 1.4960427   24 treated
   6    48  crk10   treated 2.4190837   48 treated
   7    48  crk10   treated 2.9805329   48 treated
   8    48  crk10   treated 3.4241471   48 treated
   9    48  crk10   treated 2.3705634   48 treated
   10   48  crk10   treated 2.0378527   48 treated
   11   72  crk10   treated 2.5438502   72 treated
   12   72  crk10   treated 3.7291318   72 treated
   13   72  crk10   treated 2.8419034   72 treated
   14   72  crk10   treated 3.3363484   72 treated
   15   72  crk10   treated 3.2231344   72 treated
   16   24  crk18   treated 2.0620297   24 treated
   17   24  crk18   treated 1.5837581   24 treated
   18   24  crk18   treated 2.1590703   24 treated
   19   24  crk18   treated 2.1706227   24 treated
   20   24  crk18   treated 2.4964019   24 treated
   21   48  crk18   treated 2.6026845   48 treated
   22   48  crk18   treated 2.7898342   48 treated
   23   48  crk18   treated 2.6719992   48 treated
   24   48  crk18   treated 2.7574874   48 treated
   25   48  crk18   treated 3.4852919   48 treated
   26   72  crk18   treated 3.1710652   72 treated
   27   72  crk18   treated 3.3720779   72 treated
   28   72  crk18   treated 1.8194282   72 treated
   29   72  crk18   treated 2.8221811   72 treated
   30   72  crk18   treated 2.8395098   72 treated
   31   24  crk23   treated 0.9164792   24 treated
   32   24  crk23   treated 0.9580680   24 treated
   33   24  crk23   treated 0.5976315   24 treated
   34   24  crk23   treated 1.0597296   24 treated
   35   24  crk23   treated 1.0389352   24 treated
   36   48  crk23   treated 2.1156238   48 treated
   37   48  crk23   treated 2.8226339   48 treated
   38   48  crk23   treated 3.4533979   48 treated
   39   48  crk23   treated 2.7486982   48 treated
   40   48  crk23   treated 2.0324462   48 treated
   41   72  crk23   treated 3.1622761   72 treated
   42   72  crk23   treated 1.7135985   72 treated
   43   72  crk23   treated 2.7186619   72 treated
   44   72  crk23   treated 2.7810451   72 treated
   45   72  crk23   treated 1.4502025   72 treated
   46   24  crk24   treated 0.5338245   24 treated
   47   24  crk24   treated 0.4759149   24 treated
   48   24  crk24   treated 1.1967879   24 treated
   49   24  crk24   treated 1.0627795   24 treated
   50   24  crk24   treated 1.1429535   24 treated
   51   48  crk24   treated 1.4532524   48 treated
   52   48  crk24   treated 2.2573031   48 treated
   53   48  crk24   treated 2.3474122   48 treated
   54   48  crk24   treated 2.2203353   48 treated
   55   48  crk24   treated 2.4594710   48 treated
   56   72  crk24   treated 2.3058234   72 treated
   57   72  crk24   treated 2.4236584   72 treated
   58   72  crk24   treated 2.5484249   72 treated
   59   72  crk24   treated 2.6685704   72 treated
   60   72  crk24   treated 2.0967240   72 treated
   61   24  crk40   treated 1.0119949   24 treated
   62   24  crk40   treated 1.0813096   24 treated
   63   24  crk40   treated 1.7328680   24 treated
   64   24  crk40   treated 1.9962639   24 treated
   65   24  crk40   treated 2.3567004   24 treated
   66   48  crk40   treated 3.5558450   48 treated
   67   48  crk40   treated 2.6131649   48 treated
   68   48  crk40   treated 2.5299872   48 treated
   69   48  crk40   treated 3.4911513   48 treated
   70   48  crk40   treated 3.3247960   48 treated
   71   72  crk40   treated 4.8381673   72 treated
   72   72  crk40   treated 4.9352079   72 treated
   73   72  crk40   treated 4.4292105   72 treated
   74   72  crk40   treated 3.8631403   72 treated
   75   72  crk40   treated 4.0052355   72 treated
   76   24  crk47   treated 0.1378544   24 treated
   77   24  crk47   treated 1.9212654   24 treated
   78   24  crk47   treated 2.3856740   24 treated
   79   24  crk47   treated 1.6301435   24 treated
   80   24  crk47   treated 1.6994583   24 treated
   81   48  crk47   treated 2.8292882   48 treated
   82   48  crk47   treated 2.9817805   48 treated
   83   48  crk47   treated 2.9055344   48 treated
   84   48  crk47   treated 2.9817805   48 treated
   85   48  crk47   treated 3.0199036   48 treated
   86   72  crk47   treated 2.7876993   72 treated
   87   72  crk47   treated 2.9055344   72 treated
   88   72  crk47   treated 3.6472018   72 treated
   89   72  crk47   treated 2.5866866   72 treated
   90   72  crk47   treated 2.6698643   72 treated

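I am trying to convert this to a wide format with the genes and timepoint as columns, that is, six gene columns covering the three time points.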


   fgcrkmtptlog %>% 
     group_by(timepoint) %>% 
     spread(gene, value)

[Image in the original post: the desired wide layout]

I want the data in the layout shown in that image.

After using

fgcrkmtptlog %>% 
  rowid_to_column() %>%
  spread(gene, value) 

the data frame shows a lot of NAs:

  rowid timepoint treatment       tpt6    crk10 crk18 crk23 crk24 crk40 crk47
1     1        24   treated 24 treated 1.788320    NA    NA    NA    NA    NA
2     2        24   treated 24 treated 1.060515    NA    NA    NA    NA    NA
3     3        24   treated 24 treated 1.005063    NA    NA    NA    NA    NA
4     4        24   treated 24 treated 1.887671    NA    NA    NA    NA    NA
5     5        24   treated 24 treated 1.496043    NA    NA    NA    NA    NA
6     6        48   treated 48 treated 2.419084    NA    NA    NA    NA    NA

Tags: r

Solution


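spread needs a unique row identifier, otherwise it cannot work: each output row has to be identified by a unique combination of the columns that are not being spread. If the column(s) acting as the id contain duplicates, you need to create a new unique row ID.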

The error message you posted says exactly this, so add the following to your code:

library(tidyr)   # spread()
library(tibble)  # rowid_to_column()

fgcrkmtptlog %>%
  # group_by(timepoint) %>%  # removed: group_by should be unnecessary here
  rowid_to_column() %>%
  spread(gene, value)

This will resolve your current error.
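
As a minimal sketch of why the ID matters: the toy data frame below is a hypothetical stand-in that reuses four values from the question, not the full fgcrkmtptlog. The first call is expected to fail on the duplicated keys, and the second should go through once every row has its own ID.

```r
library(tidyr)   # spread(), %>%
library(tibble)  # rowid_to_column()

# Hypothetical stand-in: two genes, two measurements each, one time point
toy <- data.frame(
  timepoint = c(24, 24, 24, 24),
  gene      = c("crk10", "crk10", "crk18", "crk18"),
  value     = c(1.7883197, 1.0605152, 2.0620297, 1.5837581),
  stringsAsFactors = FALSE
)

# Without an ID, both crk10 rows compete for the same output cell
# (timepoint 24, column crk10), so spread() should refuse.
res <- tryCatch(spread(toy, gene, value), error = function(e) e)
print(inherits(res, "error"))

# With a row ID, every input row becomes its own output row,
# at the cost of NA in the columns of the other genes.
with_id <- toy %>%
  rowid_to_column() %>%
  spread(gene, value)
print(with_id)
```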

Edit:

Depending on your data, spread may introduce NAs. Here is an example:

# Produce sample data
df <- structure(list(Year = c("2014", "2014", "2014", "2014", "2015", 
"2015", "2015", "2015", "2016"), Month = c("01", "06", "07", 
"12", "01", "06", "07", "12", "01"), Day = c("01", "01", "01", 
"01", "01", "01", "01", "01", "01"), test = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    Halfyear = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L
    ), .Label = c("2014 First Half", "2015 First Half", "2016 First Half"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))

# Your code, applied to the sample data
df %>%
  rowid_to_column() %>%
  spread(Month, test)

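If you test this, you will see that spread correctly introduces NAs, because some Months have no test value. spread creates one column for every month present in my data, so it has to show NA wherever a combination of month and test did not previously exist.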

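Before spreading, you had a sparse dataset that listed only the values that actually exist; spreading completes the dataset in order to make it wide.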
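
If the NAs are not what you want, one possible approach is to number the rows within each gene/timepoint group instead of across the whole table. This assumes the measurements within each gene and timepoint are replicates that can be lined up by their order. The column name replicate and the toy data frame (values copied from the question) are illustrative assumptions, not part of the original code.

```r
library(dplyr)   # group_by(), mutate(), row_number(), ungroup()
library(tidyr)   # spread()

# Hypothetical stand-in for fgcrkmtptlog: two genes, two time points,
# two measurements per gene and time point
toy <- data.frame(
  timepoint = rep(c(24, 24, 48, 48), times = 2),
  gene      = rep(c("crk10", "crk18"), each = 4),
  treatment = "treated",
  value     = c(1.7883197, 1.0605152, 2.4190837, 2.9805329,
                2.0620297, 1.5837581, 2.6026845, 2.7898342),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  group_by(gene, timepoint) %>%
  mutate(replicate = row_number()) %>%  # assumed replicate index within each group
  ungroup() %>%
  spread(gene, value)                   # one column per gene

print(wide)
```

Applied to the full data in the same way, this should give 15 rows (3 time points with 5 replicates each) and one column per gene. In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the last step could also be written as pivot_wider(names_from = gene, values_from = value).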

