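r - Instances of interaction between partners
Problem description
Research context: a speaker (author) and a recipient communicate in writing on a discussion topic. The first speaker is the original poster.
The data are as follows: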
df <- structure(list(topic = c(1, 1, 1, 1, 1, 1, 2, 2), thread = c(1,
1, 1, 2, 2, 2, 3, 3), speaker_id = c(111, 111, 111, 222, 222,
222, 111, 222), recipient_id = c(222, 333, 444, 111, 555, 444,
222, 111), dyad = structure(c(1L, 2L, 3L, 1L, 5L, 4L, 1L, 1L), .Label = c("111_222",
"111_333", "111_444", "222_444", "222_555"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L), codepage = 65001L)
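The goal is to create two variables:
- threads_partnered: in how many threads within a discussion topic did the speaker and recipient partner (i.e., form a dyad, or interact directly)?
- threads_present: within the discussion topic, excluding the given thread, in how many threads do the speaker and recipient both appear as recipients without partnering (forming a dyad)?
Based on the sample data, the result would look like this: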
╔═══════╦════════╦═════════╦═══════════╦═════════╦═══════════╦══════════════════════════════════════════╦═════════╦════════════════════════════════════════════╗
║ topic ║ thread ║ speaker ║ recipient ║ dyad ║ threads ║ note ║ threads ║ note ║
║ ║ ║ id ║ id ║ ║ partnered ║ ║ present ║ ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 1 ║ 111 ║ 222 ║ 111_222 ║ 2 ║ 111 and 222 interacted (made a dyad) ║ 0 ║ Outside the given thread (thread #1) of ║
║ ║ ║ ║ ║ ║ ║ in two different threads (thread #1, #2) ║ ║ the given topic (topic #1), 111 and 222 ║
║ ║ ║ ║ ║ ║ ║ within topic 1 ║ ║ are not found together as recipients ║
║ ║ ║ ║ ║ ║ ║ ║ ║ other than being in a dyad ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 1 ║ 111 ║ 333 ║ 111_333 ║ 1 ║ 111 and 333 interacted in ║ 0 ║ ║
║ ║ ║ ║ ║ ║ ║ one thread (thread #1) ║ ║ ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 1 ║ 111 ║ 444 ║ 111_444 ║ 1 ║ 111 and 444 interacted in ║ 1 ║ 111 and 444 are found in thread #2, ║
║ ║ ║ ║ ║ ║ ║ one thread (thread #1) ║ ║ where they did not interact (made a dyad), ║
║ ║ ║ ║ ║ ║ ║ ║ ║ but were only recipients of ║
║ ║ ║ ║ ║ ║ ║ ║ ║ the original speaker (222) ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 2 ║ 222 ║ 111 ║ 111_222 ║ 2 ║ 111 and 222 interacted in two different ║ 0 ║ ║
║ ║ ║ ║ ║ ║ ║ threads within topic 1 ║ ║ ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 2 ║ 222 ║ 555 ║ 222_555 ║ 1 ║ 222 and 555 interacted in one thread ║ 0 ║ ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 1.00 ║ 2 ║ 222 ║ 444 ║ 222_444 ║ 1 ║ 222 and 444 interacted in one thread ║ 1 ║ 222 and 444 are found together ║
║ ║ ║ ║ ║ ║ ║ ║ ║ in thread #1, where they did not ║
║ ║ ║ ║ ║ ║ ║ ║ ║ interact ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 2.00 ║ 3 ║ 111 ║ 222 ║ 111_222 ║ 1 ║ 111 and 222 interacted in one thread ║ 0 ║ ║
║ ║ ║ ║ ║ ║ ║ (thread 3) within topic 2 ║ ║ ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║ 2.00 ║ 3 ║ 222 ║ 111 ║ 111_222 ║ 1 ║ same as above ║ 0 ║ ║
╚═══════╩════════╩═════════╩═══════════╩═════════╩═══════════╩══════════════════════════════════════════╩═════════╩════════════════════════════════════════════╝
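Solution
I'm not entirely sure this meets your needs, but perhaps it will help to some extent.
I created a custom function that takes the speaker, recipient, thread, and topic, and determines threads_present based on your description. It looks at the other threads in the same topic and drops the rows where the speaker and recipient appear as a dyad. A remaining thread qualifies if both the speaker and the recipient appear as a recipient in one of its rows. The qualifying threads are then counted.
The second variable, threads_partnered, is more straightforward and is described in the comments. After you group_by both topic and dyad, you can determine the number of unique threads with n_distinct.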
library(tidyr)
library(dplyr)
library(purrr)

# Count the other threads in the same topic where both people appear as
# recipients, ignoring rows in which the two of them form a dyad.
my_fun <- function(the_speaker, the_recipient, the_thread, the_topic) {
  df %>%
    filter(
      topic == the_topic,
      thread != the_thread,
      dyad != paste(min(the_speaker, the_recipient), max(the_speaker, the_recipient), sep = "_")) %>%
    group_by(thread) %>%
    filter(all(c(the_speaker, the_recipient) %in% recipient_id)) %>%
    ungroup() %>%
    distinct(thread) %>%
    count(name = "threads_present")
}

df %>%
  # Call my_fun once per row; each call returns a one-row tibble
  mutate(threads_present = pmap(
    list(the_speaker = speaker_id, the_recipient = recipient_id, the_thread = thread, the_topic = topic),
    my_fun)
  ) %>%
  unnest(cols = threads_present) %>%
  # Within each topic and dyad, count the distinct threads
  group_by(topic, dyad) %>%
  mutate(threads_partnered = n_distinct(thread))
Output
topic thread speaker_id recipient_id dyad threads_present threads_partnered
<dbl> <dbl> <dbl> <dbl> <fct> <int> <int>
1 1 1 111 222 111_222 0 2
2 1 1 111 333 111_333 0 1
3 1 1 111 444 111_444 1 1
4 1 2 222 111 111_222 0 2
5 1 2 222 555 222_555 0 1
6 1 2 222 444 222_444 1 1
7 2 3 111 222 111_222 0 1
8 2 3 222 111 111_222 0 1
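As a cross-check, the same two counts can be sketched in base R without any packages. This is only an illustration under the same reading of the question, not a replacement for the pipeline above. The helper name count_present and the rebuilt df (with dyad derived via pmin/pmax instead of taken from the dput output) are assumptions made for this example.

```r
# Rebuild the sample data; dyad is derived here (assumption: smaller id first).
df <- data.frame(
  topic        = c(1, 1, 1, 1, 1, 1, 2, 2),
  thread       = c(1, 1, 1, 2, 2, 2, 3, 3),
  speaker_id   = c(111, 111, 111, 222, 222, 222, 111, 222),
  recipient_id = c(222, 333, 444, 111, 555, 444, 222, 111)
)
df$dyad <- paste(pmin(df$speaker_id, df$recipient_id),
                 pmax(df$speaker_id, df$recipient_id), sep = "_")

# threads_partnered: distinct threads sharing the row's topic and dyad.
df$threads_partnered <- vapply(seq_len(nrow(df)), function(i) {
  same <- df$topic == df$topic[i] & df$dyad == df$dyad[i]
  length(unique(df$thread[same]))
}, integer(1))

# Hypothetical helper: for row i, count the other threads in the same topic
# where both people appear as recipients, ignoring rows with their own dyad.
count_present <- function(i) {
  others <- df[df$topic == df$topic[i] &
                 df$thread != df$thread[i] &
                 df$dyad != df$dyad[i], ]
  pair <- c(df$speaker_id[i], df$recipient_id[i])
  hits <- vapply(split(others$recipient_id, others$thread),
                 function(r) all(pair %in% r), logical(1))
  sum(hits)
}
df$threads_present <- vapply(seq_len(nrow(df)), count_present, integer(1))

print(df)
```

The row-wise loops are quadratic in the number of rows, so this form is meant for checking small samples rather than for large data.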