Subtracting values within groups in R

Question

I have a dataset with variables that give the voteshare of a party in a given year and district, and that indicate whether the party sent a candidate to parliament, like this:

year district party voteshare candidate
2000 A        P1    50%       1
2000 A        P2    30%       0
2000 A        P3    20%       0
2000 B        P1    43%       1
2000 B        P2    21%       0
2000 B        P3    34%       0
...

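Now I want to calculate each party's margin of victory or loss (i.e., how "close" the election was for that party). For a losing party, this is its voteshare minus the voteshare of the winning party (the party that sent a candidate to parliament). For the winning party, it is its voteshare minus the voteshare of the second most successful party. For example: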

year district party voteshare candidate margin
2000 A        P1    50%       1         +20%
2000 A        P2    30%       0         -20%
2000 A        P3    20%       0         -30%
2000 B        P1    43%       1         +9%
2000 B        P2    21%       0         -22%
2000 B        P3    34%       0         -9%
...

I can't figure out how to do this with dplyr...

Tags: r, dplyr

Solution


You can do:

library(dplyr)

df1 %>%
  #Turn voteshare to a number
  mutate(voteshare = readr::parse_number(voteshare)) %>%
  group_by(year, district) %>%
  #When candidate is sent to parliament
  mutate(margin = case_when(candidate == 1 ~ 
                            #Subtract with second highest voteshare
                            voteshare - sort(voteshare, decreasing = TRUE)[2],
                            #else subtract with voteshare of highest candidate
                            TRUE ~ voteshare - voteshare[candidate == 1]))

#   year district party voteshare candidate margin
#  <int> <chr>    <chr>     <dbl>     <int>  <dbl>
#1  2000 A        P1           50         1     20
#2  2000 A        P2           30         0    -20
#3  2000 A        P3           20         0    -30
#4  2000 B        P1           43         1      9
#5  2000 B        P2           21         0    -22
#6  2000 B        P3           34         0     -9
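
To see what the two branches of case_when compute inside a single group, the arithmetic for district B can be reproduced by hand in base R. This is only a sketch: the values are copied from the example data, and the names winner and runner_up are illustrative, not part of the answer above.

```r
# Vote shares and winner flag for district B in 2000 (copied from the example data)
voteshare <- c(P1 = 43, P2 = 21, P3 = 34)
candidate <- c(P1 = 1, P2 = 0, P3 = 0)

# Share of the party that sent a candidate to parliament
winner <- unname(voteshare[candidate == 1])

# Second-highest share in the district
runner_up <- unname(sort(voteshare, decreasing = TRUE)[2])

# Winner: own share minus runner-up; everyone else: own share minus winner
margin <- ifelse(candidate == 1, voteshare - runner_up, voteshare - winner)
print(margin)
# P1: 9, P2: -22, P3: -9
```

Note that this logic, like the dplyr version, assumes exactly one row per group has candidate == 1 and that this row also has the highest voteshare.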

Data

df1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L
), district = c("A", "A", "A", "B", "B", "B"), party = c("P1", 
"P2", "P3", "P1", "P2", "P3"), voteshare = c("50%", "30%", "20%", 
"43%", "21%", "34%"), candidate = c(1L, 0L, 0L, 1L, 0L, 0L)), 
class = "data.frame", row.names = c(NA, -6L))
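
If readr is not available, the percent strings in voteshare can probably be converted with base R instead of readr::parse_number. A minimal sketch, assuming every value has the form of digits followed by a single "%" as in the data above (the variable names here are illustrative):

```r
# Percent strings as they appear in the voteshare column of df1
shares_chr <- c("50%", "30%", "20%", "43%", "21%", "34%")

# Drop the literal "%" sign, then convert the remainder to numeric
shares_num <- as.numeric(sub("%", "", shares_chr, fixed = TRUE))
print(shares_num)
# 50 30 20 43 21 34
```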
