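r - Conditional calculation of a variable in R
Problem description
I am trying to create a conditional sum for each row of my data frame, based on two different list objects. The idea is to check the value or values in each row (some of which are list objects), assign 1 or -1 to each value, and find the total.
For example, row 7 of df contains c("cuneytozdemir", "ahmethc", "fatihportakal"), so the side values for that row should be cuneytozdemir: 1, ahmethc: -1, fatihportakal: 1.
I have tried the code below, but I only get results for rows that hold a single value; I get nothing for the rows that hold list objects.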
df_side <- df %>%
  mutate(side = case_when(
    mentions %in% unlist(J_Con) ~ 1,
    mentions %in% unlist(J_Pro) ~ -1))
A sample of the data frame:
df <- structure(list(screen_name = c("N_bingol", "SAliakman", "TC_SelimCandas",
"deniyeciz", "batuozcn", "dst_cemal", "kentineiyibak", "GokhanMHP",
"GokhanMHP", "ramadanmeletli"),
mentions = list(c("turgayguler","mecertas", "yasingorgulu28", "Atasehirakgenc", "SAliakman"),
c("N_bingol", "turgayguler", "mecertas", "yasingorgulu28","Atasehirakgenc", "SAliakman"),
c("MesutKokten", "ahmethc","MHP_Bilgi"), "ahmethc",
c("tevfik_uygur", "turgayguler"),"hikmetgenc",
c("cuneytozdemir", "ahmethc", "fatihportakal"), "ahmethc", "ahmethc",
c("GokhanMHP", "ahmethc"))),
row.names = c("11355","11383", "68592", "228341", "354369", "358484", "359043", "371744","371753", "372216"),
class = "data.frame")
J_Con and J_Pro are defined here:
J_Con <- structure(c(17L, 26L, 31L, 38L, 61L, 10L, 68L, 13L, 24L, 41L,
48L, 56L, 49L, 22L, 25L, 52L, 39L, 5L, 9L, 33L, 69L, 73L, 66L,
28L, 55L, 8L, 1L, 74L, 47L, 70L, 20L, 4L, 32L, 16L, 65L, 6L,
14L, 21L, 63L, 43L, 53L, 57L, 15L, 36L, 7L, 37L, 50L, 54L, 12L),
.Label = c("acikcenk", "ahmetay_", "ahmethc", "akdemircigdem",
"amberinzaman", "arzuyldzz", "asliaydintasbas", "AtillaSertell",
"banuguven", "barisyarkadas", "bekiservet", "cakir_rusen", "candundaradasi",
"cengizalgan", "ceydak", "cuneytozdemir", "degirmencirfan", "denizyildirim79",
"doganburak29", "dunya20101", "efekerem", "emrkongar", "EremSenturk",
"erenerdemnet", "ETemelkuran", "fatihportakal", "fatihtezcan",
"gokhanozbek", "haciykk", "hakanchelik", "Halitisci", "HasanCucuk",
"haykobagdat", "hikmetgenc", "hilal_kaplan", "imambakirukus",
"iremafsin", "ismaildukel", "ismailsaymaz", "kenan_kiran", "KucukkayaIsmail",
"mahmutovur", "MaliGuller", "Malikejder47", "MehToprak", "melihaltinok",
"merdanyanardag", "mkirikkanat", "mustafabalbay", "mustafahos",
"nevzatcicek", "nihatsirdar", "NuranWolf", "ozge_mumcu", "ozgurmumcu",
"RuhatMengi34", "s_hablemitoglu", "samiltayyar27", "Sarikli_Voyvoda",
"sarseven", "SedefKabas", "sevilayyaziyor", "siring", "slymnoz",
"TaylanKulacogIu", "Tuluhantekeli", "turgayguler", "ugurdundarsozcu",
"unsalunlu", "utkucakirozer", "veyisates", "yenisafakwriter",
"yvzah", "YZGLLDGN", "zihnicakir"), class = "factor")
J_Pro <- structure(c(67L, 27L, 11L, 40L, 75L, 34L, 3L, 44L, 64L, 72L,
35L, 58L, 60L, 71L, 18L, 62L, 19L, 46L, 51L, 2L, 23L, 29L, 30L,
42L, 59L, 45L),
.Label = c("acikcenk", "ahmetay_", "ahmethc","akdemircigdem", "amberinzaman",
"arzuyldzz", "asliaydintasbas","AtillaSertell", "banuguven", "barisyarkadas",
"bekiservet","cakir_rusen", "candundaradasi", "cengizalgan", "ceydak", "cuneytozdemir",
"degirmencirfan", "denizyildirim79", "doganburak29", "dunya20101",
"efekerem", "emrkongar", "EremSenturk", "erenerdemnet", "ETemelkuran",
"fatihportakal", "fatihtezcan", "gokhanozbek", "haciykk", "hakanchelik",
"Halitisci", "HasanCucuk", "haykobagdat", "hikmetgenc", "hilal_kaplan",
"imambakirukus", "iremafsin", "ismaildukel", "ismailsaymaz",
"kenan_kiran", "KucukkayaIsmail", "mahmutovur", "MaliGuller",
"Malikejder47", "MehToprak", "melihaltinok", "merdanyanardag",
"mkirikkanat", "mustafabalbay", "mustafahos", "nevzatcicek",
"nihatsirdar", "NuranWolf", "ozge_mumcu", "ozgurmumcu", "RuhatMengi34",
"s_hablemitoglu", "samiltayyar27", "Sarikli_Voyvoda", "sarseven",
"SedefKabas", "sevilayyaziyor", "siring", "slymnoz", "TaylanKulacogIu",
"Tuluhantekeli", "turgayguler", "ugurdundarsozcu", "unsalunlu",
"utkucakirozer", "veyisates", "yenisafakwriter", "yvzah", "YZGLLDGN",
"zihnicakir"), class = "factor")
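Solution
Reading your question and code, it seems to me that you want to do the following. Taking row 7 as the example, you want to check whether each name in that row appears in J_Pro or J_Con. If a name appears in J_Con, assign 1 to it. If a name appears in J_Pro, assign -1 to it. Otherwise, assign 0 to it. I think that is what you were trying to do in your code. Then, for each row, you want to sum the values from the two logical checks.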
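As a side note on why the original attempt only scored the single-name rows: as far as I understand it, `%in%` (via `match()`) converts a list to a character vector with one string per element, so an element holding several names becomes one long deparsed string that matches nothing. The sketch below illustrates this with toy stand-ins for J_Con and J_Pro; the names `con`, `pro`, `whole_list`, `per_name` and `row_total` are made up for the example and are not part of the question.

```r
# Toy stand-ins for J_Con and J_Pro (a small subset of the names above).
con <- c("cuneytozdemir", "fatihportakal")
pro <- c("ahmethc", "turgayguler", "hikmetgenc")

# A list column like df$mentions: one multi-name element, one single name.
mentions <- list(c("cuneytozdemir", "ahmethc", "fatihportakal"), "ahmethc")

# %in% applied to the list itself compares one string per list element,
# so only the single-name element can match.
whole_list <- mentions %in% pro

# Checking name by name inside each element gives +1, -1 or 0 per name.
per_name <- lapply(mentions, function(x) (x %in% con) - (x %in% pro))

# Summing within each element gives the per-row total.
row_total <- vapply(per_name, sum, integer(1))
```

Here `whole_list` is TRUE only for the second element, while `row_total` gives one total per row, which is the quantity the question asks for.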
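Given that, I first created a grouping variable called id and then converted the data to long format. I created two columns for the logical checks, then summed the two columns within each group. Finally, I dropped the check columns. If you need to aggregate the data further by screen_name, you will need one more group_by + summarise.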
library(dplyr)
library(tidyr)
library(purrr)

# Check whether the two sets overlap.
intersect(J_Pro, J_Con)

# One group per original row, one line per mentioned name.
group_by(df, id = 1:n()) %>%
  unnest_longer(mentions) %>%
  mutate(check_con = if_else(mentions %in% J_Con, 1, 0),
         check_pro = if_else(mentions %in% J_Pro, -1, 0)) %>%
  summarize(screen_name = first(screen_name),
            res = sum(check_con) + sum(check_pro)) %>%
  select(-contains("check"))
#       id screen_name      res
#    <int> <chr>          <dbl>
#  1     1 N_bingol          -1
#  2     2 SAliakman         -1
#  3     3 TC_SelimCandas    -1
#  4     4 deniyeciz         -1
#  5     5 batuozcn          -1
#  6     6 dst_cemal         -1
#  7     7 kentineiyibak      1
#  8     8 GokhanMHP         -1
#  9     9 GokhanMHP         -1
# 10    10 ramadanmeletli    -1
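The further aggregation by screen_name mentioned above could look roughly like the following. This is only a sketch on toy data: `con`, `pro`, `toy` and `by_user` are hypothetical names standing in for J_Con, J_Pro, df and the result, and the score is written as a difference of two logical checks, which should give the same totals as adding check_con and check_pro.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for J_Con and J_Pro (a small subset of the names above).
con <- c("cuneytozdemir", "fatihportakal")
pro <- c("ahmethc", "turgayguler", "hikmetgenc")

# Toy stand-in for df: GokhanMHP appears in two rows.
toy <- tibble(
  screen_name = c("kentineiyibak", "GokhanMHP", "GokhanMHP"),
  mentions = list(c("cuneytozdemir", "ahmethc", "fatihportakal"),
                  "ahmethc",
                  "ahmethc")
)

by_user <- toy %>%
  unnest_longer(mentions) %>%                      # one line per mentioned name
  mutate(score = (mentions %in% con) - (mentions %in% pro)) %>%
  group_by(screen_name) %>%                        # group by user, not by row
  summarise(res = sum(score), .groups = "drop")
```

With this toy input the two GokhanMHP rows collapse into a single line whose res is the sum of both rows.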
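Another idea is the following. For each list in mentions, create a character vector with unlist(), run the logical check, and sum the values. This is done twice (once for J_Con and once for J_Pro). The returned value goes into res.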
# Score each row's mentions: +1 per J_Con name, -1 per J_Pro name.
mutate(df,
       res = map(.x = mentions,
                 .f = function(x) {
                   sum(if_else(unlist(x) %in% J_Con, 1, 0)) +
                     sum(if_else(unlist(x) %in% J_Pro, -1, 0))
                 }))
#   screen_name    mentions                                                                   res
#1  N_bingol       turgayguler, mecertas, yasingorgulu28, Atasehirakgenc, SAliakman            -1
#2  SAliakman      N_bingol, turgayguler, mecertas, yasingorgulu28, Atasehirakgenc, SAliakman  -1
#3  TC_SelimCandas MesutKokten, ahmethc, MHP_Bilgi                                             -1
#4  deniyeciz      ahmethc                                                                     -1
#5  batuozcn       tevfik_uygur, turgayguler                                                   -1
#6  dst_cemal      hikmetgenc                                                                  -1
#7  kentineiyibak  cuneytozdemir, ahmethc, fatihportakal                                        1
#8  GokhanMHP      ahmethc                                                                     -1
#9  GokhanMHP      ahmethc                                                                     -1
#10 ramadanmeletli GokhanMHP, ahmethc                                                          -1
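One thing to be aware of with this second approach: map() always returns a list, so res ends up as a list column even though it prints like a number. If a plain numeric column is wanted, map_dbl() from purrr may be a better fit. The sketch below shows the difference on toy data; `con`, `pro`, `mentions`, `score`, `res_list` and `res_num` are names invented for the example.

```r
library(purrr)

# Toy stand-ins for J_Con and J_Pro (a small subset of the names above).
con <- c("cuneytozdemir", "fatihportakal")
pro <- c("ahmethc", "turgayguler", "hikmetgenc")

# Two elements shaped like df$mentions.
mentions <- list(c("cuneytozdemir", "ahmethc", "fatihportakal"), "hikmetgenc")

# Same scoring rule as above: +1 per con name, -1 per pro name.
score <- function(x) sum(x %in% con) - sum(x %in% pro)

# map() returns a list with one element per row.
res_list <- map(mentions, score)

# map_dbl() returns a double vector, which is easier to sort or sum later.
res_num <- map_dbl(mentions, score)
```

Inside mutate() the same swap applies: replacing map with map_dbl should leave the printed values unchanged while making res an ordinary numeric column.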