r - find n% of records in a variable in the data frame
问题描述
I have the data in a data frame, with the first column is date and the second column is individual weight. Here's a sample from the data:
df <- data.frame(
date = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-02",
"2019-01-02", "2019-01-02", "2019-01-02", "2019-01-02",
"2019-01-02", "2019-01-02", "2019-01-02"),
weight = c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
2892.32, 2892.32, 2892.32, 2892.32, 5287.78, 5287.78, 6674.03,
6674.03, 6674.03, 6674.03, 6674.03, 5535.11, 5535.11)
)
I would like to run simple summary statistic for each date first and then find number of records whose weight is in the given range, defining the category by the % of total range of weights. Finally store number of each record in a separate column
Lowest 10%
10-20%
20-40%
40-60%
60-80%
80-90%
90-100%
The logic = (MinWeight + (MaxWeight-MinWeight)*X%)
Here is my expected outcome ( I only show two columns for % range)
df %>%
group_by(date) %>%
summarise(mean(weight), min(weight), max(weight))
date `mean(weight)` `min(weight)` `max(weight)` `Lowest 10%` `10-20%`
2019-01-01 3726. 2134. 8897. num records. num records.
解决方案
Check this solution:
library(tidyverse)
library(wrapr)
df %>%
group_by(date) %>%
mutate(
rn = row_number(),
temp = weight - min(weight),
temp = (temp / max(temp)) * 100,
temp = cut(temp, seq(0, 100, 10), include.lowest = TRUE),
temp = str_remove(temp, '\\(|\\[') %>%
str_replace(',', '-') %>%
str_replace('\\]', '%'),
one = 1
) %>%
spread(temp, one, fill = 0) %.>%
left_join(
summarise(.,
`mean(weight)` = mean(weight),
`min(weight)` = min(weight),
`max(weight)` = max(weight)
),
summarise_at(., vars(matches('\\d+-\\d+.')), sum)
)
Output:
date `mean(weight)` `min(weight)` `max(weight)` `0-10%` `10-20%` `60-70%` `90-100%`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-01-01 3726. 2134. 8897. 5 3 0 2
2 2019-01-02 5791. 2892. 6674. 1 0 4 5
推荐阅读
- c - 编写一个程序,使用 for 循环遍历这个数组,显示存储在每个索引处的值
- javascript - 删除窗口调整大小时附加的脚本
- ruby - Ruby 退出后,STDIN.read_nonblock 会弄乱其他程序
- python - 如何填充 Pandas 系列中缺失的元素?
- ansible - 有没有办法在剧本中定义 ansible 库存路径?
- javascript - 选中多个复选框时,表单字段应填充公共字段
- c# - 加载新窗体时显示消息框
- javascript - 我的代码什么也没做。可能是什么原因?
- python-3.x - 如何在语句中添加反斜杠而不导致 EOF 错误?
- angular - Angular-Material-Datepicker:自定义选择器的价值问题