Why does R read 0 as NA?

Problem description

I am creating some association rules for my dataset, CatalogSales.csv.

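I used the same code on another dataset without any problem. The only difference I can find with this dataset is that R treats the 0 values as NA. I can find plenty of tutorials on how to turn NA values into 0, but I can't work out why these 0s are being read as NA or what the best way to fix them is.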

Here is the full code. Note that in the section that creates the bar plot, I drop the columns that contain NA.

> cross.sell <- read.csv("CatalogCrossSell.csv")
> head(cross.sell)
  Customer.Number Clothing.Division Housewares.Division
1           11569                 0                   1
2           13714                 0                   1
3           46391                 0                   1
4           67264                 0                   0
5           67363                 0                   0
6           72553                 0                   1
  Health.Products.Division Automotive.Division Personal.Electronics.Division
1                        1                   1                             1
2                        1                   1                             1
3                        1                   1                             1
4                        1                   1                             1
5                        1                   0                             1
6                        1                   1                             1
  Computers.Division Garden.Division Novelty.Gift.Division Jewelry.Division  X
1                  0               0                     1                0 NA
2                  0               1                     1                1 NA
3                  0               1                     1                1 NA
4                  0               1                     1                0 NA
5                  0               1                     1                0 NA
6                  0               1                     1                1 NA
  X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9
1  NA  NA  NA  NA  NA  NA  NA  NA  NA
2  NA  NA  NA  NA  NA  NA  NA  NA  NA
3  NA  NA  NA  NA  NA  NA  NA  NA  NA
4  NA  NA  NA  NA  NA  NA  NA  NA  NA
5  NA  NA  NA  NA  NA  NA  NA  NA  NA
6  NA  NA  NA  NA  NA  NA  NA  NA  NA
> dim(cross.sell)
[1] 60004    20
> nrow(cross.sell)
[1] 60004
> is.data.frame(cross.sell)
[1] TRUE
> table(is.na(cross.sell))

  FALSE    TRUE 
  49980 1150100
> 
> 
> #View individual item names
> t(t(names(cross.sell)))
      [,1]                           
 [1,] "Customer.Number"              
 [2,] "Clothing.Division"            
 [3,] "Housewares.Division"          
 [4,] "Health.Products.Division"     
 [5,] "Automotive.Division"          
 [6,] "Personal.Electronics.Division"
 [7,] "Computers.Division"           
 [8,] "Garden.Division"              
 [9,] "Novelty.Gift.Division"        
[10,] "Jewelry.Division"             
[11,] "X"                            
[12,] "X.1"                          
[13,] "X.2"                          
[14,] "X.3"                          
[15,] "X.4"                          
[16,] "X.5"                          
[17,] "X.6"                          
[18,] "X.7"                          
[19,] "X.8"                          
[20,] "X.9"                          
> 
> #Create bar plot to view distribution of sales and remove NA values
> cross.data <- as.matrix(subset(cross.sell[,-c(1, 11:20)]))
> barplot(cross.data, main = "Individual Item Sales")
> dim(cross.data)
[1] 60004     9
> 
> #Check for NA values and class
> table(is.na(cross.data))

 FALSE   TRUE 
 44982 495054 
> sapply(cross.data, class)
   [1] "integer" "integer" "integer" "integer" "integer" "integer" "integer"
   [8] "integer" "integer" "integer" "integer" "integer" "integer" "integer"
 [ reached getOption("max.print") -- omitted 539036 entries ]
> 
> #Create transaction database
> cross.trans <- as(cross.data, "transactions")
Error in if (any(from != 0 & from != 1)) warning("matrix contains values other than 0 and 1! Setting all entries != 0 to 1.") : 
  missing value where TRUE/FALSE needed
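
The error message quotes the check that failed: an `if` whose condition came out as NA. A minimal sketch of that behaviour in base R, using a made-up 2x2 matrix rather than the real data (the names `from`, `check` and `result` are only for illustration):

```r
# Toy 0/1 matrix with one missing cell (hypothetical data, not from the CSV).
from <- matrix(c(0L, 1L, NA, 1L), nrow = 2)

# Same test as in the error message. No cell is TRUE and one is NA,
# so any() cannot decide and returns NA.
check <- any(from != 0 & from != 1)
print(check)

# if (NA) is what raises "missing value where TRUE/FALSE needed".
result <- tryCatch(
  if (check) "non-binary values found" else "only 0 and 1",
  error = function(e) conditionMessage(e)
)
print(result)
```

So the conversion fails as soon as the matrix contains any NA at all, whatever the other cells hold.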

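Any guidance would be greatly appreciated. Since I have already dropped columns 11:20, I shouldn't have any NA values, yet I clearly do. When I go super old school and add up the counts of NA = FALSE and NA = TRUE, I get the same number of values as the total in my dataset, so I know this is because R is reading 0 as NA, but I don't know why.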
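
One way to test that assumption is to count the missing cells per row. If zeros were being misread, rows would be partly NA. If instead the file is padded with blank rows, each row would be either complete or entirely NA. The sketch below uses a small made-up matrix (`toy` is hypothetical, not the real `cross.data`) and then repeats the arithmetic on the counts printed above:

```r
# Hypothetical stand-in for cross.data: two real rows followed by three blank rows.
toy <- matrix(
  c(0L, 1L, 1L,
    1L, 0L, 1L,
    NA, NA, NA,
    NA, NA, NA,
    NA, NA, NA),
  ncol = 3, byrow = TRUE
)

# Number of missing cells in each row.
na_per_row <- rowSums(is.na(toy))

# Here every row has either 0 or 3 missing cells, never something in between.
print(table(na_per_row))

# Same arithmetic on the reported counts: 44982 non-NA cells across 9 columns.
complete_rows <- 44982 / 9
print(complete_rows)
```

44982 divides evenly by 9, which is what you would expect if 4998 rows are fully populated and the remaining rows of the 60004 are empty. Running `table(rowSums(is.na(cross.data)))` on the real data would confirm or rule this out. The `head()` output also prints the zeros as 0, which suggests they are being read correctly.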

Here are the first few lines, as requested:

> readLines("CatalogCrossSell.csv", n=4)
[1] "Customer Number,Clothing Division,Housewares Division,Health Products Division,Automotive Division,Personal Electronics Division,Computers Division,Garden Division,Novelty Gift Division,Jewelry Division,,,,,,,,,,"
[2] "000011569,0,1,1,1,1,0,0,1,0,,,,,,,,,,"                                                                                                                                                                               
[3] "000013714,0,1,1,1,1,0,1,1,1,,,,,,,,,,"                                                                                                                                                                               
[4] "000046391,0,1,1,1,1,0,1,1,1,,,,,,,,,,"
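
Every line ends in a run of commas, that is, empty fields with no header. A minimal sketch of how `read.csv` handles that, using an abbreviated made-up version of the lines above (three named columns and two trailing commas; `csv_text` and `raw` are illustrative names):

```r
# Lines shaped like the readLines() output, shortened for the example
# (hypothetical, abbreviated data).
csv_text <- "Customer Number,Clothing Division,Housewares Division,,
000011569,0,1,,
000013714,0,1,,"

raw <- read.csv(text = csv_text)

# The unnamed trailing fields become columns called X, X.1, ...
print(names(raw))

# Those columns hold nothing but NA; the named columns hold none.
all_na <- colSums(is.na(raw)) == nrow(raw)
print(all_na)
```

This matches the `X` to `X.9` columns in the `head(cross.sell)` output. Trailing empty fields like these are often left behind by a spreadsheet export, and the same export may have padded the file with blank rows too.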

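I already tried removing the other 10 columns when creating the cross.data subset (in the bar plot section), but I still get the error. For clarity, that part of the code is: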

> cross.data <- as.matrix(subset(cross.sell[,-c(1, 11:20)]))

> head(cross.data)
  Clothing.Division Housewares.Division Health.Products.Division
1                 0                   1                        1
2                 0                   1                        1
3                 0                   1                        1
4                 0                   0                        1
5                 0                   0                        1
6                 0                   1                        1
  Automotive.Division Personal.Electronics.Division Computers.Division
1                   1                             1                  0
2                   1                             1                  0
3                   1                             1                  0
4                   1                             1                  0
5                   0                             1                  0
6                   1                             1                  0
  Garden.Division Novelty.Gift.Division Jewelry.Division
1               0                     1                0
2               1                     1                1
3               1                     1                1
4               1                     1                0
5               1                     1                0
6               1                     1                1

Thanks for your help, and sorry for the late reply. I have a lot going on right now, and this is the first chance I've had to check anything.

Tags: rna

Solution
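
No answer text survives on this page, so what follows is only a possible approach, assuming the NA values come from blank padding rows and columns in the CSV and not from the zeros. It is a sketch in base R on a made-up data frame (`toy`, `clean` and `item_matrix` are hypothetical names):

```r
# Hypothetical stand-in for cross.sell: one ID column, two item columns,
# one empty column, and two blank padding rows.
toy <- data.frame(
  Customer.Number   = c(11569L, 13714L, NA, NA),
  Clothing.Division = c(0L, 0L, NA, NA),
  Garden.Division   = c(0L, 1L, NA, NA),
  X                 = c(NA, NA, NA, NA)
)

# Drop columns that are entirely NA, then rows that are entirely NA.
keep_cols <- colSums(!is.na(toy)) > 0
keep_rows <- rowSums(!is.na(toy)) > 0
clean <- toy[keep_rows, keep_cols, drop = FALSE]

# Build the 0/1 item matrix without the ID column.
item_matrix <- as.matrix(clean[, -1, drop = FALSE])
print(item_matrix)
print(anyNA(item_matrix))
```

With no NA left in the matrix, the check quoted in the error message should evaluate to TRUE or FALSE, so `as(item_matrix, "transactions")` would be expected to get past it. That last step needs the arules package and is not run here.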

