Spreading a variable across multiple columns in dplyr

Question

Suppose I have the following dataset:

df <- read.table(header=TRUE, text="
politics_collapse question_id mean_confidence mean_accuracy mean_importance
Democrat arms_manufacturing_company 24.00000 0.0000000 1.000000
Democrat black_panther 48.50000 0.0000000 1.500000
Democrat stranger_things_universe 55.50000 0.2500000 2.500000
Democrat the_office 37.66667 0.6666667 1.666667
Democrat tupac 80.33333 1.0000000 2.000000
Democrat uber_ceo 39.60000 0.8000000 2.600000
Republican arms_manufacturing_company 37.00000 1.0000000 1.000000
Republican black_panther 45.00000 1.0000000 2.000000
Republican stranger_things_universe 33.00000 1.0000000 3.000000")

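I am trying to spread the politics_collapse column across the mean_confidence, mean_accuracy, and mean_importance columns. The output would have the columns mean_confidence_democrat, mean_accuracy_democrat, and mean_importance_democrat, and the same three for Republican.

So something like this: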

df <- read.table(header=TRUE, text="
question_id mean_confidence_democrat mean_accuracy_democrat mean_importance_democrat mean_confidence_republican mean_accuracy_republican mean_importance_republican
arms_manufacturing_company 
black_panther 
stranger_things_universe 
the_office 
tupac 
uber_ceo 
arms_manufacturing_company 
black_panther 
stranger_things_universe")

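Obviously, every row would contain values.

I came across this discussion: https://community.rstudio.com/t/spread-with-multiple-value-columns/5378, which suggests using the brand-new "pivot functions", but I can't figure out how to get them to work. I also tried nesting the values, spreading them, and unnesting, but couldn't get that to work either.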

Tags: r, dplyr

Answer

This may be what you are looking for:

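library(tidyverse)

df %>%
  # stack the three measure columns into metric/score pairs
  gather("metric", "score", mean_confidence, mean_accuracy, mean_importance) %>%
  # append the party to each metric name, e.g. mean_accuracy_Democrat
  mutate(metric = paste0(metric, "_", politics_collapse)) %>%
  # drop the party column so each question_id collapses to a single row
  select(-politics_collapse) %>%
  # spread the combined names back out into one column each
  spread(metric, score)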

                 question_id mean_accuracy_Democrat mean_accuracy_Republican mean_confidence_Democrat mean_confidence_Republican mean_importance_Democrat
1 arms_manufacturing_company              0.0000000                        1                 24.00000                         37                 1.000000
2              black_panther              0.0000000                        1                 48.50000                         45                 1.500000
3   stranger_things_universe              0.2500000                        1                 55.50000                         33                 2.500000
4                 the_office              0.6666667                       NA                 37.66667                         NA                 1.666667
5                      tupac              1.0000000                       NA                 80.33333                         NA                 2.000000
6                   uber_ceo              0.8000000                       NA                 39.60000                         NA                 2.600000
  mean_importance_Republican
1                          1
2                          2
3                          3
4                         NA
5                         NA
6                         NA
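
The question mentions the newer pivot functions. Assuming tidyr 1.0.0 or later is installed, the same reshaping can probably be done in one step with pivot_wider(), which accepts several value columns at once. This is a minimal sketch, not a tested replacement for the answer above. The name `wide` and the tolower() step are my own additions, included only so that the column names match the lowercase ones asked for in the question.

```r
library(tidyr)

# Same data as in the question
df <- read.table(header = TRUE, text = "
politics_collapse question_id mean_confidence mean_accuracy mean_importance
Democrat arms_manufacturing_company 24.00000 0.0000000 1.000000
Democrat black_panther 48.50000 0.0000000 1.500000
Democrat stranger_things_universe 55.50000 0.2500000 2.500000
Democrat the_office 37.66667 0.6666667 1.666667
Democrat tupac 80.33333 1.0000000 2.000000
Democrat uber_ceo 39.60000 0.8000000 2.600000
Republican arms_manufacturing_company 37.00000 1.0000000 1.000000
Republican black_panther 45.00000 1.0000000 2.000000
Republican stranger_things_universe 33.00000 1.0000000 3.000000")

# Assumption: lowercase the party labels so the new columns are named
# mean_*_democrat / mean_*_republican, as requested in the question.
df$politics_collapse <- tolower(df$politics_collapse)

# One row per question_id; each value column is split by party and the
# new names are built as <value column>_<party>.
wide <- pivot_wider(
  df,
  names_from  = politics_collapse,
  values_from = c(mean_confidence, mean_accuracy, mean_importance)
)

wide
```

Questions that have no Republican row (the_office, tupac, uber_ceo) get NA in the Republican columns, just as in the gather/spread output above.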
