Sorting a factor in R with sort() or order()

Question

I am trying to sort my data frame by one of its columns. The structure of the data frame is:

'data.frame':    9194 obs. of  7 variables:
 $ taxonomy_y: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ otu1id    : Factor w/ 51 levels "_1","_10","_102",..: 12 12 12 12 12 12 12 12 12 12 ...
 $ taxonomy_x: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 45 50 42 24 17 14 2 7 39 44 ...
 $ otu2id    : Factor w/ 51 levels "_1","_10","_102",..: 23 41 26 51 2 10 25 35 42 5 ...
 $ otu2      : chr  "333" "241" "14" "56" ...
 $ otu1      : chr  "16" "119" "90" "16" ...
 $ CONTROL1  : num  0.0897 0.0864 0.2444 0.1818 0.5976 ...

The data frame looks like this:

     taxonomy_y otu1id  taxonomy_x   otu2id otu2 otu1   
 1  Alistipes    _14    Roseburia      _29  333  16   
 2  Alistipes    _14    Turicibacter   _63  241  119 
 3  Alistipes    _14    Parasutterella _37  14   90 
 4  Alistipes    _14    Dorea          _98  56   16 
 5  Alistipes    _14    Clostridium    _10  178  16 
 6  Alistipes    _14    Clostridium S  _12  155  16 

I tried the sort() and order() functions on the otu1id column, but the result is not sorted the way I expect, as shown below (note the otu1id column):

    taxonomy_y otu1id  taxonomy_x   otu2id otu2 otu1   
 1  Alistipes    _1    Roseburia      _29  333  16   
 2  Alistipes    _1    Turicibacter   _63  241  119 
 3  Alistipes    _1    Parasutterella _37  14   90 
 4  Alistipes    _10   Dorea          _98  56   16 
 5  Alistipes    _10   Clostridium    _10  178  16 
 6  Alistipes    _10   Clostridium    _12  155  16
 7  Alistipes    _100  Clostridium    _12  155  16
 8  Alistipes    _1008 Clostridium    _12  155  16
 9  Alistipes    _2    Clostridium    _12  155  16
 10 Alistipes    _23    Clostridium S  _12  155  16

Why do I get _10 before _2? I need the order _1, _2, _3, _4, ..., _1008. How can I achieve this? I am using Ubuntu.

Tags: r, sorting

Answer


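Because the otu1id column is a factor, you cannot order it directly. sort() and order() arrange a factor by the order of its levels, and by default the levels are sorted as character strings, so "_10" comes before "_2".

For example, look at the levels of this factor: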

factor(as.character(1:10))
# [1] 1  2  3  4  5  6  7  8  9  10
#Levels: 1 10 2 3 4 5 6 7 8 9
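
The same effect can be seen with ids shaped like the ones in the question. This is a small sketch on made-up values (the vector `ids` is an assumption for illustration, not the asker's data); it shows that sort() follows the integer codes behind the factor, which index the string-sorted levels.

```r
# Made-up ids, for illustration only
ids <- factor(c("_1", "_2", "_10", "_100"))

levels(ids)               # levels are sorted as strings
# [1] "_1"   "_10"  "_100" "_2"

as.integer(ids)           # integer codes that sort()/order() compare
# [1] 1 4 2 3

as.character(sort(ids))   # so "_2" ends up last
# [1] "_1"   "_10"  "_100" "_2"
```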

We can strip the leading "_" from each value, convert the rest to numeric, and order on that:

df[order(as.numeric(sub("_", "", df$otu1id))), ]
#OR
#df[order(as.numeric(sub("\\D", "", df$otu1id))), ]

#   taxonomy_y otu1id     taxonomy_x otu2id otu2 otu1
#1   Alistipes     _1      Roseburia    _29  333   16
#2   Alistipes     _1   Turicibacter    _63  241  119
#3   Alistipes     _1 Parasutterella    _37   14   90
#9   Alistipes     _2    Clostridium    _12  155   16
#4   Alistipes    _10          Dorea    _98   56   16
#5   Alistipes    _10    Clostridium    _10  178   16
#6   Alistipes    _10    Clostridium    _12  155   16
#10  Alistipes    _23   ClostridiumS    _12  155   16
#7   Alistipes   _100    Clostridium    _12  155   16
#8   Alistipes  _1008    Clostridium    _12  155   16
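
If the column should remain a factor, another option is to rebuild it with its levels in numeric order, so that sort() and order() then work on it directly. This is a minimal sketch, assuming every id is "_" followed only by digits; the helper name `relevel_numeric` is hypothetical and does not come from any package.

```r
# Hypothetical helper: reorder factor levels by the number after the "_"
relevel_numeric <- function(f) {
  lv  <- levels(f)
  num <- as.numeric(sub("^_", "", lv))   # e.g. "_10" -> 10
  factor(f, levels = lv[order(num)])     # same values, new level order
}

ids  <- factor(c("_10", "_2", "_1008", "_1", "_2"))  # made-up values
ids2 <- relevel_numeric(ids)

levels(ids2)
# [1] "_1"    "_2"    "_10"   "_1008"

as.character(sort(ids2))
# [1] "_1"    "_2"    "_2"    "_10"   "_1008"
```

Applied to the data frame, this would be df$otu1id <- relevel_numeric(df$otu1id) followed by df[order(df$otu1id), ]. The advantage over ordering on a temporary numeric vector is that the numeric order is stored in the factor itself, so later sorts or plots pick it up as well.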

If you convert otu1id to character, you can use mixedorder from the gtools package directly:

df[gtools::mixedorder(as.character(df$otu1id)), ]

Data

df <- structure(list(taxonomy_y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "Alistipes", class = "factor"), otu1id = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), .Label = c("_1", "_10", 
"_100", "_1008", "_2", "_23"), class = "factor"), taxonomy_x = structure(c(5L, 
6L, 4L, 3L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("Clostridium", 
"ClostridiumS", "Dorea", "Parasutterella", "Roseburia", "Turicibacter"
), class = "factor"), otu2id = structure(c(3L, 5L, 4L, 6L, 1L, 
2L, 2L, 2L, 2L, 2L), .Label = c("_10", "_12", "_29", "_37", "_63", 
"_98"), class = "factor"), otu2 = c(333L, 241L, 14L, 56L, 178L, 
155L, 155L, 155L, 155L, 155L), otu1 = c(16L, 119L, 90L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
