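r - Same calculation for different datasets
Problem description
I am a beginner in R and am trying to solve the following problem. I have 30 datasets to which I need to apply the same calculation. The datasets contain names, and I have to find the names that appear in all columns of each dataset. All datasets have 4 columns. For simplicity, suppose I have the following 3 datasets: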
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Lisa", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
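My idea is to first extract every unique name in each dataset (the names differ and sometimes occur several times within a dataset) and then check whether these unique names are contained in every column of that dataset. So I combined all datasets into a list of datasets using: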
df_list<-list(df1,df2,df3)
Then I extracted the unique names in each dataset using:
unique_list <- lapply(df_list, function(x) {
as.vector(unique(unlist(x)))
})
This is where I am stuck. I don't know how to compare the list of unique names with each column of each dataset. For a single dataset, I do it as follows:
u <- as.vector(unique(unlist(df1)))
n <- ifelse(u %in% df1$x1 & u %in% df1$x2 & u %in% df1$x3 &
            u %in% df1$x4, 1, 0)
Names_1 <- cbind(u, n) # values with a 1 are the names included in all columns of the dataset
Is there a good way to do the above calculation for all datasets at once?
Thanks a lot in advance!
Solution
Try it this way:
library(tidyverse)
library(janitor)
df1<- data.frame(x1=c("Ben","Alex","Tim", "Lisa", "MJ"),
x2=c("Ben","Paul","Tim", "Linda", "Alex"),
x3=c("Tomas","Alex","Ben", "Paul", "MJ"),
x4=c("Ben","Alex","Tim", "Lisa", "MJ"))
df2<- data.frame(x1=c("Alex","Tyler","Ben", "Lisa", "MJ"),
x2=c("Ben","Paul","Tim", "Linda", "Tyler"),
x3=c("Tyler","Alex","Ben", "Tyler", "MJ"),
x4=c("Ben","Alex","Tim", "Lisa", "MJ"))
df3<- data.frame(x1=c("Lisa","Tyler","Ben", "Lisa", "MJ"),
x2=c("Ben","Paul","Tim", "Linda", "Tyler"),
x3=c("Tyler","Alex","Ben", "Tyler", "MJ"),
x4=c("Ben","Alex","Tim", "Lisa", "MJ"))
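# Bind the three data frames side by side (12 columns in total);
# clean_names() tidies the column names that bind_cols() repairs
df <- bind_cols(df1, df2, df3) %>% clean_names()
# Collect every distinct name that occurs anywhere in the combined data
uniq_name <- df %>%
  pivot_longer(everything(), names_to = NULL) %>%
  distinct() %>%
  pull()
# For each name, check that it occurs at least once in every column,
# then keep only the names for which that holds
map(uniq_name, ~ colSums(df == .x) >= 1) %>%
  map_lgl(all) %>%
  as_tibble() %>%
  add_column(uniq_name) %>%
  filter(value)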
# A tibble: 1 x 2
value uniq_name
<lgl> <chr>
1 TRUE Ben
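Note that the code above binds all three data frames together, so Ben is the name found in all 12 columns, while the question asks for the names shared by the 4 columns of each dataset separately. A minimal base R sketch of that per-dataset variant follows. It assumes the equal-length example data from this answer; `common_names` and `flag_tables` are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Example data: the same three data frames used in the answer above.
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

# A named list keeps track of which result belongs to which dataset.
df_list <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: names that occur in every column of one data frame.
# as.character() guards against factor columns (the default before R 4.0).
common_names <- function(df) {
  Reduce(intersect, lapply(df, function(col) unique(as.character(col))))
}

# One character vector of shared names per dataset.
common_by_dataset <- lapply(df_list, common_names)

# Optional: a 0/1 flag table per dataset, in the shape of Names_1
# from the question (u = unique name, n = 1 if it is in all columns).
flag_tables <- lapply(df_list, function(df) {
  u <- unique(as.character(unlist(df)))
  data.frame(u = u, n = as.integer(u %in% common_names(df)))
})
```

With this example data, `common_by_dataset` contains Ben and Alex for df1 and only Ben for df2 and df3. For 30 datasets, the same two `lapply()` calls work unchanged once all data frames are in `df_list`.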