r - 比较两个相似元数据的名称和选项
问题描述
我有两个类似的元数据大约。1000 条记录和 500 多列。我想检查两个数据框之间的一致性。现在我想创建一个新的数据框,它将在第一行中显示 df1 的所有列名,并与第二行中的 df2 列名相同,并且分别在第 3 列和第 4 列中显示它们的选项。然后改变新列以显示TRUE
或FALSE
列名及其选项是否匹配。
基本上我必须检查 df1 中的列名是否与 df 2 完全一致,以及 df1 的所有列中的选项是否与 df2 完全匹配
df1 <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921",NA,"KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
Name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","garg,prabal","ali,sanu","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","pawar,pooja","sharma,nikita"),
gender =c("m","f","m","f","m","m","m","m","m","m","m","f","m","f","m","m"))
df2 <- data.frame(ID =c("DEV2962","KTN2152","ANA2719","ITs2624","DEV2698","HRT2921",NA,"KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
city=c("del","MUm"," nav","pun","bang","chen"," ddgy ","vish","del","mum","bang","vish","bhol","nhus","huay","gurg"),
Name= c("dev","singh,rahul","abbas,salman","lal,ram","singh,nkunj","huna,ghalak","khan,fhalt","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","pawar,pooja","sharma,nikita"),
gender =c("m","f","m","f","m","m","m","male","m","male","m","f","m","f","m","m"))
df <- select(df1,matches("^[A-Z]"))
comp <- do.call(rbind, lapply(df[, 4:ncol(df)], function(option) as.data.frame(table(option))))
comp$variable <- gsub("[.](.*)","", rownames(comp))
rownames(comp) <- NULL
comp <- comp[, c(3,1,2)]
comp <- comp[order(-comp$Freq), ]
我无法附加该文件,但我已在所需输出下方显示了图片。
解决方案
一个类似于 Tarjae 的 tidyverse 选项。
library(dplyr)
library(tidyr)
one <- pivot_longer(df1, everything()) %>%
rename(names_df1 = name, options_df1 = value)
two <- pivot_longer(df2, everything()) %>%
rename(names_df2 = name, options_df2 = value)
one %>%
bind_cols(two) %>%
mutate(names_matching = names_df1 == names_df2,
options_matching = options_df1 == options_df2) %>%
arrange(names_df1, names_df2)
# # A tibble: 64 x 6
# names_df1 options_df1 names_df2 options_df2 names_matching options_matching
# <chr> <chr> <chr> <chr> <lgl> <lgl>
# 1 city del city "del" TRUE TRUE
# 2 city mum city "MUm" TRUE FALSE
# 3 city nav city " nav" TRUE FALSE
# 4 city pun city "pun" TRUE TRUE
# 5 city bang city "bang" TRUE TRUE
# 6 city chen city "chen" TRUE TRUE
# 7 city triv city " ddgy " TRUE FALSE
# 8 city vish city "vish" TRUE TRUE
# 9 city del city "del" TRUE TRUE
推荐阅读
- vue.js - 动态地将新字段添加到 b-table [Vue.js]
- crop - 使用TextureView播放视频,如何使用Matrix裁剪指定RectF的视频?
- mysql - 如何删除标识符中的前导零,例如“1.2.03.4”
- php - 不断运行 Php 脚本,如果作为 Windows 服务连接到互联网,则将数据插入 xampp 服务器
- c++ - 无法加载犰狳立方体
使用 RcppArmadillo 时 - python - 如何从字符串变量中删除值
- c - 在C中取N个带空格的输入字符串
- django - 如何在 miniconda 中安装 django?
- html - 如何删除不需要的空格
- matlab - 向量中fft点和fft之间的差异