How do I count the number of letters in each row using R?

Problem description

I want to count how many times each specific letter occurs in each row.

countedAA <- data.frame (c('A count','C count','D count','E count',
    'F count','G count','H count','I count','K count','L count',
    'M count','N count','P count','Q count','R count','S count',
    'T count','V count','W count','Y count'))

file <- data.frame (c('A','V','S','A','V'),
                    c('S','K','I','C','A'),
                    c('D','G','R','S','W'))

For example, the expected result for the first example:

'A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'
 2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  2,  0,  0
    

This is what I tried:

count_aa1 <- file %>%
            rowSums() ~
                               c(sum('A'),
                                 sum('C'),
                                 sum('D'),
                                 sum('E'),
                                 sum('F'),
                                 sum('G'),
                                 sum('H'),
                                 sum('I'),
                                 sum('K'),
                                 sum('L'),
                                 sum('M'),
                                 sum('N'),
                                 sum('P'),
                                 sum('Q'),
                                 sum('R'),
                                 sum('S'),
                                 sum('T'),
                                 sum('V'),
                                 sum('W'),
                                 sum('Y'))
          view(count_aa1)
          results <- cbind(countedAA, count_aa1) 
          results

I get this error:

Error in as.data.frame.default(x) : 
  cannot coerce class ‘"formula"’ to a data.frame

 

I would appreciate any suggestions!

Tags: r, dataframe, count

Solution


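Your input data has a number of problems, and the exact structure of the data you have is not clear. I have fixed some of them in order to provide an answer here.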

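You can collect all the unique values you want to count by using gsub to remove the extra text from them. Then use table to compute the frequency of each letter.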

unique_values <- gsub('count|\\s', '', countedAA$a)
unique_values
#[1] "A" "C" "D" "E" "F" "G" "H" "I" "K" "L" "M" "N" "P" "Q" "R" "S" "T" "V" "W" "Y"

apply(file, 2, function(x) table(factor(x, levels = unique_values)))

#  a b c
#A 2 1 0
#C 0 1 0
#D 0 0 1
#E 0 0 0
#F 0 0 0
#G 0 0 1
#H 0 0 0
#I 0 1 0
#K 0 1 0
#L 0 0 0
#M 0 0 0
#N 0 0 0
#P 0 0 0
#Q 0 0 0
#R 0 0 1
#S 1 1 1
#T 0 0 0
#V 2 0 0
#W 0 0 1
#Y 0 0 0

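From the way your data is set up, it looks like you want to count the frequencies in each column (not each row), so I used apply with MARGIN = 2. If you want the counts for each row instead, use apply with MARGIN = 1.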
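
For the row-wise case, here is a minimal sketch of the same approach with MARGIN = 1. It assumes the file data frame from the Data section of this answer. The letter set is written out by hand here rather than derived with gsub, and the name row_counts is only illustrative. The result is transposed so that each row of the output lines up with a row of file.

```r
# Sample data copied from the Data section of this answer
file <- data.frame(a = c('A','V','S','A','V'),
                   b = c('S','K','I','C','A'),
                   c = c('D','G','R','S','W'))

# The 20 letters to count, written out directly
# (assumption: the same set that unique_values holds above)
unique_values <- c('A','C','D','E','F','G','H','I','K','L',
                   'M','N','P','Q','R','S','T','V','W','Y')

# MARGIN = 1 applies the function to each row of `file`;
# apply() returns one column of counts per input row
row_counts <- apply(file, 1, function(x) table(factor(x, levels = unique_values)))

# Transpose so that each row of the result corresponds to a row of `file`
row_counts <- t(row_counts)
row_counts
```

Because every row of file holds three letters, each row of row_counts should sum to 3, which is a quick sanity check.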

Data

countedAA <- data.frame (a = c('A count','C count','D count','E count','F count',
'G count','H count','I count','K count','L count','M count','N count','P count','Q count',
'R count','S  count','T count','V count','W count','Y count'))

file <- data.frame(a = c('A','V','S','A','V'),
                   b = c('S','K','I','C','A'),
                   c =  c('D','G','R','S','W'))
