Remove duplicate country names within each row in R

Problem description

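I have a dataset with a column like the sample shown below.

I need to remove the duplicate country names within each row (main request).

Then I need to create one column per country (supplementary request).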

data<-read.table(text="
LocationCountry
United States, Belgium, France, Ireland, Netherlands, Netherlands, Netherlands, Sweden
Spain, Spain, Spain, Spain
Korea, Republic of
Korea, Republic of
Austria, Austria, Austria
United States, United States, United States, United States, United States, United States
Italy, Italy
Korea, Republic of, Korea, Republic of, Korea, Republic of, Korea, Republic of, Korea, Republic of, Korea, Republic of, Korea, Republic of, Korea, Republic of
India, Iran, Islamic Republic of
Spain, Spain, Spain, Spain
Korea, Republic of
Turkey, Turkey", header=T, sep="\n")

Any suggestions would be greatly appreciated.

Tags: r, duplicates

Solution


In base R, we can use strsplit to split each row into a list of tokens, keep the unique elements of each, and paste them back into a single string with toString.

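# Split on commas, keep the unique tokens, and rejoin them with ", ".
# as.character() guards against the column being a factor (the read.table default before R 4.0).
data$LocationCountry <- sapply(strsplit(as.character(data$LocationCountry), ",\\s*"),
                               function(x) toString(unique(x)))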

Output:

data
#                                                LocationCountry
#1  United States, Belgium, France, Ireland, Netherlands, Sweden
#2                                                         Spain
#3                                            Korea, Republic of
#4                                            Korea, Republic of
#5                                                       Austria
#6                                                 United States
#7                                                         Italy
#8                                            Korea, Republic of
#9                              India, Iran, Islamic Republic of
#10                                                        Spain
#11                                           Korea, Republic of
#12                                                       Turkey
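
One caveat with the split above: it breaks on every comma, so "Korea, Republic of" is handled as two tokens ("Korea" and "Republic of"). That happens to rejoin correctly for this data, but a row holding two different countries that share a qualifier would lose the second qualifier. The following is a minimal sketch of one way to keep such names whole. The helper split_countries and its qualifiers vector are hypothetical, not part of the original answer, and the list of qualifiers is an assumption that would need to match your own data.

```r
# Hypothetical helper: split a comma-separated string into country names,
# re-attaching known qualifier tokens (e.g. "Republic of") to the token
# that precedes them. The `qualifiers` default is an assumption.
split_countries <- function(x,
                            qualifiers = c("Republic of", "Islamic Republic of")) {
  tokens <- strsplit(x, ",\\s*")[[1]]
  out <- character(0)
  for (tok in tokens) {
    if (tok %in% qualifiers && length(out) > 0) {
      # Glue the qualifier back onto the previous name
      out[length(out)] <- paste(out[length(out)], tok, sep = ", ")
    } else {
      out <- c(out, tok)
    }
  }
  out
}

x <- c("Korea, Republic of, Korea, Republic of",
       "India, Iran, Islamic Republic of",
       "Spain, Spain")

# Deduplicate whole names; "; " is used as the separator so that names
# containing a comma stay unambiguous.
country_list <- lapply(x, function(s) unique(split_countries(s)))
deduped <- vapply(country_list, paste, character(1), collapse = "; ")
deduped
# "Korea, Republic of"  "India; Iran, Islamic Republic of"  "Spain"
```

Joining with "; " instead of ", " is a design choice made here so the deduplicated column can be split again later without the same ambiguity.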

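For the supplementary part, if we need a binary column for each element of "LocationCountry", take the updated "LocationCountry" column (which now holds only unique names), split it again, and apply mtabulate from the qdapTools package. Note that the split is on every comma, so a name such as "Korea, Republic of" yields two separate columns ("Korea" and "Republic of"), as the output below shows.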

library(qdapTools)
cbind(data, mtabulate(strsplit(data$LocationCountry, ",\\s+")))

Output:

                                             LocationCountry Austria Belgium France India Iran Ireland Islamic Republic of Italy
1  United States, Belgium, France, Ireland, Netherlands, Sweden       0       1      1     0    0       1                   0     0
2                                                         Spain       0       0      0     0    0       0                   0     0
3                                            Korea, Republic of       0       0      0     0    0       0                   0     0
4                                            Korea, Republic of       0       0      0     0    0       0                   0     0
5                                                       Austria       1       0      0     0    0       0                   0     0
6                                                 United States       0       0      0     0    0       0                   0     0
7                                                         Italy       0       0      0     0    0       0                   0     1
8                                            Korea, Republic of       0       0      0     0    0       0                   0     0
9                              India, Iran, Islamic Republic of       0       0      0     1    1       0                   1     0
10                                                        Spain       0       0      0     0    0       0                   0     0
11                                           Korea, Republic of       0       0      0     0    0       0                   0     0
12                                                       Turkey       0       0      0     0    0       0                   0     0
   Korea Netherlands Republic of Spain Sweden Turkey United States
1      0           1           0     0      1      0             1
2      0           0           0     1      0      0             0
3      1           0           1     0      0      0             0
4      1           0           1     0      0      0             0
5      0           0           0     0      0      0             0
6      0           0           0     0      0      0             1
7      0           0           0     0      0      0             0
8      1           0           1     0      0      0             0
9      0           0           0     0      0      0             0
10     0           0           0     1      0      0             0
11     1           0           1     0      0      0             0
12     0           0           0     0      0      1             0
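
If installing qdapTools is not an option, the same kind of indicator table can be built in base R. This is only a sketch on a small made-up data frame, not the answer's method. The object names (parts, countries, indicators, result) are illustrative, and it splits on every comma just as the code above does.

```r
# Small illustrative data frame (made up for this sketch)
data <- data.frame(
  LocationCountry = c("United States, Belgium, France",
                      "Spain",
                      "Belgium, Spain"),
  stringsAsFactors = FALSE
)

# Split each row into its tokens and collect every distinct token
parts <- strsplit(data$LocationCountry, ",\\s*")
countries <- sort(unique(unlist(parts)))

# For each row, mark with 1/0 whether each country appears in it;
# rbind stacks the per-row vectors into a rows-by-countries matrix
indicators <- do.call(rbind, lapply(parts, function(p) {
  as.integer(countries %in% p)
}))
colnames(indicators) <- countries

# cbind on a data frame keeps column names with spaces as they are
result <- cbind(data, indicators)
result
```

Each row of result then has a 1 in the column of every country listed in that row and a 0 elsewhere, which mirrors the layout mtabulate produces.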
