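r - Count rows based on another column in R
Question
I have the following data frame with several thousand rows; its output is shown below. It records orders on an e-commerce site and lists the products purchased under each order id.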
| order_id| product_id|product_name |
|--------:|----------:|:--------------------------------|
| 1187899| 196|Soda |
| 1187899| 25133|Organic String Cheese |
| 1187899| 38928|0% Greek Strained Yogurt |
| 1187899| 26405|XL Pick-A-Size Paper Towel Rolls |
| 1187899| 39657|Milk Chocolate Almonds |
| 1187899| 10258|Pistachios |
| 1187899| 13032|Cinnamon Toast Crunch |
| 1187899| 26088|Aged White Cheddar Popcorn |
| 1187899| 27845|Organic Whole Milk |
| 1187899| 49235|Organic Half & Half |
| 1187899| 46149|Zero Calorie Cola |
| 1492625| 22963|Organic Roasted Turkey Breast |
| 1492625| 7963|Gluten Free Whole Grain Bread |
| 1492625| 16589|Plantain Chips |
| 1492625| 32792|Chipotle Beef & Pork Realstick |
The code used to produce the data frame above is:
temp <- orders %>%
  inner_join(opt, by = "order_id") %>%
  inner_join(products, by = "product_id") %>%
  select(order_id, product_id, product_name)
kable(head(temp, 15))
I want to count how often each product was ordered and find the most-ordered products. My output should look like this:
product_id | Order_Count
196 10025
7963 9025
25133 8903
I don't know how to do this. I tried the following:
mutate(prods = count(product_id))
But it did not work, and I got this error message: Error in mutate_impl(.data, dots) :
Evaluation error: no applicable method for 'groups' applied to an object of class "factor".
Any help would be greatly appreciated!
Answer
You can use table() to print a simple table (as Rui Barradas mentioned), or dplyr::count() if you want a data frame with the counts.
library(tidyverse)
orders <- tibble::tribble(
~order_id, ~product_id, ~product_name,
"1187899", "196", "Soda",
"1187899", "25133", "Organic String Cheese",
"1187899", "38928", "0% Greek Strained Yogurt",
"1187899", "26405", "XL Pick-A-Size Paper Towel Rolls",
"1187899", "39657", "Milk Chocolate Almonds",
"1187899", "10258", "Pistachios",
"1187899", "10258", "Pistachios",
"1187899", "10258", "Pistachios",
"1187899", "13032", "Cinnamon Toast Crunch",
"1187899", "13032", "Cinnamon Toast Crunch",
"1187899", "26088", "Aged White Cheddar Popcorn",
"1187899", "27845", "Organic Whole Milk",
"1187899", "49235", "Organic Half & Half",
"1187899", "46149", "Zero Calorie Cola",
"1492625", "22963", "Organic Roasted Turkey Breast",
"1492625", "7963", "Gluten Free Whole Grain Bread",
"1492625", "16589", "Plantain Chips",
"1492625", "32792", "Chipotle Beef & Pork Realstick"
)
A simple printed table with, for example, the count for each product_id:
table(orders$product_id)
However, if you want a data frame with the counts, for plotting or any other further use, then:
orders %>%
count(product_id, product_name)
> # A tibble: 15 x 3
> product_id product_name n
> <chr> <chr> <int>
> 1 10258 Pistachios 3
> 2 13032 Cinnamon Toast Crunch 2
> 3 16589 Plantain Chips 1
> 4 196 Soda 1
> 5 22963 Organic Roasted Turkey Breast 1
> 6 25133 Organic String Cheese 1
> 7 26088 Aged White Cheddar Popcorn 1
> 8 26405 XL Pick-A-Size Paper Towel Rolls 1
> 9 27845 Organic Whole Milk 1
> 10 32792 Chipotle Beef & Pork Realstick 1
> 11 38928 0% Greek Strained Yogurt 1
> 12 39657 Milk Chocolate Almonds 1
> 13 46149 Zero Calorie Cola 1
> 14 49235 Organic Half & Half 1
> 15 7963 Gluten Free Whole Grain Bread 1
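The question asks for the most-ordered products first, with a column named Order_Count. A minimal sketch of that, assuming a dplyr version whose count() accepts the `sort` and `name` arguments; the rows below are made-up toy data for illustration, not the real orders:

```r
library(dplyr)

# Toy rows for illustration only (hypothetical, not taken from the real data set)
orders <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1",       "196",       "Soda",
  "1",       "10258",     "Pistachios",
  "2",       "10258",     "Pistachios",
  "2",       "7963",      "Gluten Free Whole Grain Bread"
)

# count() takes the data frame as its first argument (supplied by the pipe),
# groups by product_id, and returns one row per product.
# sort = TRUE puts the largest counts first; name sets the count column's name.
top_products <- orders %>%
  count(product_id, name = "Order_Count", sort = TRUE)

print(top_products)
```

This also suggests why mutate(prods = count(product_id)) failed: count() expects a data frame as its first argument, not a single column, so calling it on product_id inside mutate() raises an error. If the counts are wanted as an extra column on the original rows rather than as a summary, dplyr::add_count(product_id) is one option.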