Count of rows based on another column in R

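Question

I have a data frame with several thousand rows; a sample of its output is shown below. The data frame records orders on an e-commerce website and lists the products purchased under each order id.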

     | order_id| product_id|product_name                     |
     |--------:|----------:|:--------------------------------|
     |  1187899|        196|Soda                             |
     |  1187899|      25133|Organic String Cheese            |
     |  1187899|      38928|0% Greek Strained Yogurt         |
     |  1187899|      26405|XL Pick-A-Size Paper Towel Rolls |
     |  1187899|      39657|Milk Chocolate Almonds           |
     |  1187899|      10258|Pistachios                       |
     |  1187899|      13032|Cinnamon Toast Crunch            |
     |  1187899|      26088|Aged White Cheddar Popcorn       |
     |  1187899|      27845|Organic Whole Milk               |
     |  1187899|      49235|Organic Half & Half              |
     |  1187899|      46149|Zero Calorie Cola                |
     |  1492625|      22963|Organic Roasted Turkey Breast    |
     |  1492625|       7963|Gluten Free Whole Grain Bread    |
     |  1492625|      16589|Plantain Chips                   |
     |  1492625|      32792|Chipotle Beef & Pork Realstick   |

The code used to produce the data frame above is:

library(dplyr)
library(knitr)

temp <- orders %>%
  inner_join(opt, by = "order_id") %>%
  inner_join(products, by = "product_id") %>%
  select(order_id, product_id, product_name)
kable(head(temp, 15))

I want to find the most frequently ordered products. Essentially, my output should look like this:

     product_id | Order_Count
        196         10025
        7963        9025
        25133       8903

I am not sure how to do this. I tried the following:

      mutate(prods = count(product_id))

But it did not work, and I got this error message: Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "factor".

Any help would be greatly appreciated!

Tags: r, dataframe, dplyr

Answer

You can use table() to print a simple table (as Rui Barradas mentioned), or use dplyr::count() if you want a data frame with the counts.

library(tidyverse)

orders <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1187899", "196", "Soda",
  "1187899", "25133", "Organic String Cheese",
  "1187899", "38928", "0% Greek Strained Yogurt",
  "1187899", "26405", "XL Pick-A-Size Paper Towel Rolls",
  "1187899", "39657", "Milk Chocolate Almonds",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "26088", "Aged White Cheddar Popcorn",
  "1187899", "27845", "Organic Whole Milk",
  "1187899", "49235", "Organic Half & Half",
  "1187899", "46149", "Zero Calorie Cola",
  "1492625", "22963", "Organic Roasted Turkey Breast",
  "1492625", "7963", "Gluten Free Whole Grain Bread",
  "1492625", "16589", "Plantain Chips",
  "1492625", "32792", "Chipotle Beef & Pork Realstick"
)

A simple printed table with, for example, the count for each product_id:

table(orders$product_id)
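
table() returns its counts ordered by product id rather than by frequency. As a minimal base R sketch, wrapping it in sort() puts the most frequent ids first. The product_id vector here is a small illustrative stand-in for orders$product_id, not the real data.

```r
# Illustrative stand-in for orders$product_id (values copied from the sample data)
product_id <- c("196", "25133", "10258", "10258", "10258", "13032", "13032")

# Count each id, then order the counts from most to least frequent
counts <- sort(table(product_id), decreasing = TRUE)

# Show the three most frequent ids
head(counts, 3)
```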

However, if you want a data frame of the counts, for plotting or any other use, then:

orders %>%
  count(product_id, product_name)

> # A tibble: 15 x 3
>    product_id product_name                         n
>    <chr>      <chr>                            <int>
>  1 10258      Pistachios                           3
>  2 13032      Cinnamon Toast Crunch                2
>  3 16589      Plantain Chips                       1
>  4 196        Soda                                 1
>  5 22963      Organic Roasted Turkey Breast        1
>  6 25133      Organic String Cheese                1
>  7 26088      Aged White Cheddar Popcorn           1
>  8 26405      XL Pick-A-Size Paper Towel Rolls     1
>  9 27845      Organic Whole Milk                   1
> 10 32792      Chipotle Beef & Pork Realstick       1
> 11 38928      0% Greek Strained Yogurt             1
> 12 39657      Milk Chocolate Almonds               1
> 13 46149      Zero Calorie Cola                    1
> 14 49235      Organic Half & Half                  1
> 15 7963       Gluten Free Whole Grain Bread        1
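
The question asks for products ranked from most to least ordered, with a column called Order_Count. A minimal sketch of one way to get that shape, assuming a dplyr version in which count() accepts the sort and name arguments; orders_small is a hypothetical stand-in for the joined data frame:

```r
library(dplyr)

# Hypothetical stand-in for the joined data frame (values are illustrative)
orders_small <- data.frame(
  product_id = c("196", "10258", "10258", "10258", "13032", "13032"),
  stringsAsFactors = FALSE
)

# One row per product, sorted by descending count, with the count column renamed
top_products <- orders_small %>%
  count(product_id, sort = TRUE, name = "Order_Count")

head(top_products, 3)

# add_count() keeps every original row and appends the per-product count
with_counts <- orders_small %>%
  add_count(product_id, name = "Order_Count")
```

The attempt in the question, mutate(prods = count(product_id)), fails because count() expects a data frame as its first argument, whereas inside mutate() product_id is a plain column vector (a factor in that case). add_count() is the variant to reach for when the count should sit alongside the original rows.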
