Same calculation on different datasets

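Problem description

I am new to R and trying to solve the following problem. I have 30 datasets to which I need to apply the same calculation. The datasets contain names, and for each dataset I have to find the names that appear in all of its columns. Every dataset has 4 columns. For simplicity, assume I have the following 3 datasets: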

df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Lisa", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

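My idea is to first extract every unique name in each dataset (the names differ between datasets and sometimes occur more than once within one), and then check whether each unique name is contained in every column of that dataset. So I have combined all datasets into a list of datasets using: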

df_list<-list(df1,df2,df3)

Then I extracted the unique names in each dataset using:

unique_list <- lapply(df_list,  function(x) {
  as.vector(unique(unlist(x)))
})

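This is where I am stuck. I don't know how to compare the list of unique names with every column of each dataset. What I do for each dataset separately is the following: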

u <- as.vector(unique(unlist(df1)))
n <- ifelse(u %in% df1$x1 & u %in% df1$x2 & u %in% df1$x3 &
              u %in% df1$x4, 1, 0)
Names_1 <- cbind(u, n) # values with a 1 are the names included in all columns of the dataset

Is there a good way to do the above calculation for all datasets at once?

Thanks a lot in advance!

Tags: r, dataset, lapply

Solution

Try it this way

library(tidyverse)
library(janitor)
df1<- data.frame(x1=c("Ben","Alex","Tim", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Alex"), 
                 x3=c("Tomas","Alex","Ben", "Paul", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df2<- data.frame(x1=c("Alex","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df3<- data.frame(x1=c("Lisa","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

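# Put all 12 columns side by side; clean_names() tidies the de-duplicated column names
df <- bind_cols(df1, df2, df3) %>% clean_names()

# Every distinct name that occurs anywhere in the combined data
uniq_name <- df %>% 
  pivot_longer(everything(), names_to = NULL) %>% 
  distinct() %>% 
  pull()

# For each name, check that it occurs at least once in every column,
# then keep only the names for which that holds
map(uniq_name, ~ colSums(df == .x) >= 1) %>% 
  map_lgl(all) %>% 
  as_tibble() %>% 
  add_column(uniq_name) %>% 
  filter(value)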

# A tibble: 1 x 2
  value uniq_name
  <lgl> <chr>    
1 TRUE  Ben 
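
The code above looks for names that occur in every column of all three data frames combined. The question asks for the result per dataset, so here is a minimal base R sketch of that variant, reusing the example data from this answer. The names `common_names` and `flag_tables` are made up for illustration, and `Reduce(intersect, ...)` is a swapped-in technique, not part of the original answer.

```r
# Example data copied from the answer above (five rows per column)
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

# A named list, so the results can be looked up by dataset name
df_list <- list(df1 = df1, df2 = df2, df3 = df3)

# Per dataset: intersect the columns one after another, so a name
# survives only if it occurs in every column of that dataset.
# as.character() guards against factor columns in R versions before 4.0.
common_names <- lapply(df_list, function(d) {
  Reduce(intersect, lapply(d, as.character))
})

# Per dataset: the 0/1 table from the question (1 = name is in all columns)
flag_tables <- lapply(df_list, function(d) {
  cols <- lapply(d, as.character)
  u <- unique(unlist(cols, use.names = FALSE))
  in_all <- Reduce(`&`, lapply(cols, function(col) u %in% col))
  data.frame(name = u, in_all_columns = as.integer(in_all))
})

print(common_names)
```

With this data, `common_names` holds Ben and Alex for `df1` and only Ben for `df2` and `df3`. The same `lapply()` call should carry over to a list of 30 data frames unchanged, assuming each one contains only the name columns.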
