R: How do I do a conditional count in dplyr?

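Question

I have this data frame. I want to summarise the data so that one column shows the total number of launches and the next column shows the total number of failed launches.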

      state_name launch_year category
1  United States        1958  Success
2  United States        1958  Success
3  United States        1958  Success
4  United States        1958  Failure
5  United States        1958  Failure
6  United States        1958  Failure
7   Soviet Union        1957  Success
8   Soviet Union        1957  Success
9   Soviet Union        1958  Success
10  Soviet Union        1959  Success
11  Soviet Union        1959  Success
12  Soviet Union        1959  Success
13  Soviet Union        1958  Failure
14  Soviet Union        1958  Failure
15  Soviet Union        1958  Failure
16  Soviet Union        1958  Failure
17  Soviet Union        1959  Failure
18 United States        1959  Success
19 United States        1959  Failure
20 United States        1958  Success
21 United States        1959  Success
22 United States        1959  Failure
23 United States        1958  Success
24 United States        1958  Success
25 United States        1959  Success
26 United States        1959  Success
27 United States        1959  Success
28 United States        1959  Success
29 United States        1959  Success
30 United States        1959  Success
31 United States        1959  Success
32 United States        1958  Failure
33 United States        1958  Failure
34 United States        1959  Failure
35 United States        1959  Failure
36 United States        1959  Failure
37 United States        1958  Success
38 United States        1959  Success
39 United States        1959  Success
40 United States        1957  Failure
41 United States        1958  Failure
42 United States        1958  Failure
43 United States        1958  Failure
44 United States        1958  Failure
45 United States        1958  Failure
46 United States        1958  Failure
47 United States        1958  Failure
48 United States        1958  Failure
49 United States        1958  Failure
50 United States        1958  Failure
51 United States        1959  Failure
52 United States        1959  Failure

Each row represents one launch. The category is the outcome of the launch.

I would like to turn it into something like this.

     state_name launch_year launches failed_launches
1 United States        1957        1               1
2  Soviet Union        1957        2               0
3 United States        1958       22              15
4  Soviet Union        1958        5               4
5 United States        1959        4               3
6  Soviet Union        1959       18               1

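I tried filtering down to the failed launches and then adding a failed_launches column, but I don't know how to get the rest of the data back from there.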

launches %>% 
  filter(category == "Failure") %>%
  count(state_name, launch_year) %>%
  mutate(failed_launches = n)

Tags: r, dplyr

Answer


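You can do this in a single summarise() call, without filtering first. Comparing category to "Failure" gives a logical vector, and sum() counts its TRUE values: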

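library(dplyr)

launches %>%
  group_by(state_name, launch_year) %>%
  summarise(
    launches = n(),                                # all rows in the group
    failed_launches = sum(category == "Failure"),  # rows where the condition is TRUE
    .groups = "drop"                               # return an ungrouped result
  )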
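
To see why this works where the filter-first attempt does not, here is a minimal, self-contained sketch. The five-row data frame is made up for illustration and is not the data from the question; the names `result` and `failed_only` are likewise hypothetical. It assumes a reasonably recent dplyr (1.0.0 or later, for the `.groups` argument).

```r
library(dplyr)

# Toy data for illustration only (not the asker's data set).
launches <- data.frame(
  state_name  = c("United States", "United States", "United States",
                  "Soviet Union", "Soviet Union"),
  launch_year = c(1958, 1958, 1959, 1958, 1958),
  category    = c("Success", "Failure", "Failure", "Success", "Success")
)

# Conditional count: category == "Failure" is TRUE/FALSE per row,
# and sum() adds TRUE as 1 and FALSE as 0 within each group.
result <- launches %>%
  group_by(state_name, launch_year) %>%
  summarise(
    launches        = n(),
    failed_launches = sum(category == "Failure"),
    .groups = "drop"
  )

# Filter-first approach, for comparison: groups with no failures
# (here Soviet Union, 1958) are removed before counting, so they vanish.
failed_only <- launches %>%
  filter(category == "Failure") %>%
  count(state_name, launch_year, name = "failed_launches")

print(result)       # three groups, including one with failed_launches = 0
print(failed_only)  # only the two groups that had at least one failure
```

Because every row stays in the data until summarise() runs, the total and the conditional count come from the same groups, and a group with zero failures is reported as 0 rather than dropped.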
