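r - Extracting and summarising data from an interactive histogram selection in R
Question
I want to create an interactive histogram in R using plotly (or another, better-suited package) with data similar to this sample set: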
test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))
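I would like to show two overlaid histograms of the weight distribution, one per sex, together with some summary statistics such as standard deviation, mean, and sample size, both per sex and overall. In addition, I would like to be able to make a selection, ideally with a range slider or a selection box, and have the summary statistics update to reflect that selection. Finally, I would like to add a variable to the original data set indicating whether each sample is part of the selection. Thanks for your help! Even a pointer to relevant online resources would help, as I have had a hard time finding material on similar problems.
Answer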
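@DataZhukov This is a revised answer based on your larger data sample. Based on your reply, I dropped the side-by-side layout (think age pyramid) and show how to use {plotly} for histograms.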
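Although {plotly} supports interactivity, it builds on the concept of a "static" HTML page. That means no "live" computation takes place on the client side, i.e. in the browser of the user viewing the page. For simple statistics/summaries, you can look at {crosstalk} and SummaryWidget to enable (some) "dynamic" updates (i.e. client-side calculations). For full-fledged dynamic select/filter/recalculate interactivity, {shiny} is the way to go. (But that is a different ball game.)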
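Whichever front end ends up driving the selection, the recomputation itself is small. The following is a minimal sketch in base R, assuming the selection arrives as a numeric weight range. The helpers `flag_selection()` and `selection_stats()`, the column name `selected`, the `demo` data, and the range 6 to 8 are illustrative assumptions, not part of {plotly}, {crosstalk}, or {shiny}.

```r
# Hypothetical helper: mark rows whose weight lies inside the selected range
flag_selection <- function(df, lower, upper) {
  df$selected <- df$weight >= lower & df$weight <= upper
  df
}

# Hypothetical helper: n, mean and sd of the selected rows, per sex and overall
selection_stats <- function(df) {
  sel <- df[df$selected, , drop = FALSE]
  groups <- c(split(sel$weight, sel$sex), list("m+f" = sel$weight))
  data.frame(
    sex = names(groups),
    SAMPLE = unname(vapply(groups, length, integer(1))),
    MEAN_WEIGHT = unname(vapply(groups, mean, numeric(1))),
    SD = unname(vapply(groups, sd, numeric(1))),
    row.names = NULL
  )
}

# Small made-up data set with the same columns as `test`
demo <- data.frame(
  sex = c("m", "m", "f", "f", "m", "f"),
  weight = c(5.5, 6.5, 7.0, 8.5, 7.5, 6.0)
)

demo <- flag_selection(demo, lower = 6, upper = 8) # example range
stats <- selection_stats(demo)
print(demo)
print(stats)
```

Applied to the question's data, `flag_selection(test, lower, upper)` would return `test` with the extra logical column, and `selection_stats()` would give the per-sex and overall figures for the flagged rows only.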
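{plotly} lets you place text annotations "freely" by specifying an add_text() layer. I built the annotation text from your data; you could also define it manually as vectors.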
If you use a data frame as the input data structure, note that {plotly} refers to its variables with the tilde notation (~).
test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))
library(dplyr)  # summarise(), group_by(), mutate(), bind_rows(), filter()
library(plotly) # plot_ly(), add_histogram(), add_text(), layout()
# calculate mean, sd, etc based on given data
# note you can also define this with simple vectors
total_stats <- test %>%
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)) %>%
  mutate(sex = "m+f")
group_stats <- test %>%
  group_by(sex) %>%
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight))
# one label per row: overall ("m+f") first, then one per sex
my_stats <- bind_rows(total_stats, group_stats) %>%
  mutate(LABEL = paste0(
    sex, " sample size: ", SAMPLE,
    " with mean ", round(MEAN_WEIGHT, 2),
    " and SD ", round(SD, 2)
  ))
# format your text, e.g. font face and size ---- format to your liking
tf <- list(
  family = "sans serif",
  size = 11
)
The {plotly} call below builds the side-by-side "pyramid" (rather than the overlay) and adds the text layer to it.
test %>%
  plot_ly() %>%
  # ------------ plot histogram ----------------------
  add_histogram(
    x = ~weight, color = ~sex,
    nbinsx = 20 # set the number of bins you want/need
  ) %>%
  # ------------ add annotation layer ---------------
  ## I provide x, y positions as vector, you could add and place
  ## each label as its own layer, i.e. add_text() call
  add_text(
    data = my_stats,
    x = c(5.2, 6, 6.3), y = c(6, 5, 4.5),
    text = ~LABEL,
    name = "", # left empty as we do not need to name the layer
    textfont = tf,
    textposition = "right",
    showlegend = FALSE
  ) %>%
  layout(yaxis = list(title = ""))
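This produces a side-by-side histogram with the three summary labels drawn on it (the image is not reproduced here).
Obviously, you are free to choose the x, y positions of the text annotations.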
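By default, the count bars are placed side by side. If you want to force an "overlay", you can draw two histograms and force the two trace layers to overlap. For the latter, you need to set the bar mode in the layout() layer. I also set the alpha transparency, since counts may overlap in your data sample. Text placement etc. follows the principles above.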
# split test data frame in a male and female df
males <- test %>% filter(sex == "m")
fems <- test %>% filter(sex == "f")
plot_ly(
  alpha = 0.5, # set alpha to ensure visibility on overlapping counts
  nbinsx = 20 # set number of bins
) %>%
  #------------ add a histogram layer per group -------------------
  add_histogram(data = males, x = ~weight, name = "male") %>%
  add_histogram(data = fems, x = ~weight, name = "female") %>%
  #------------ tweak layout --------------------------------------
  layout(
    barmode = "overlay" # to change side-by-side default to overlay
  )
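For the range-slider selection the question asks about, which this answer defers to {shiny}, a rough sketch could look like the following. It is one possible wiring under stated assumptions, not a finished app: the input id `rng`, the output ids `hist` and `stats`, the helper `stats_row()`, the `selected` column, and the stand-in `test` data (randomly drawn sexes under a fixed seed) are all made up for illustration.

```r
library(shiny)
library(plotly)

# Stand-in for the question's `test` data frame (assumption: same columns)
set.seed(1)
test <- data.frame(
  sex = sample(c("m", "f"), 80, replace = TRUE),
  weight = runif(80, 5, 9)
)

# Hypothetical helper: one row of summary statistics for a weight vector
stats_row <- function(w, label) {
  data.frame(sex = label, SAMPLE = length(w), MEAN_WEIGHT = mean(w), SD = sd(w))
}

ui <- fluidPage(
  sliderInput("rng", "Weight range",
    min = 5, max = 9, value = c(6, 8), step = 0.1
  ),
  plotlyOutput("hist"),
  tableOutput("stats")
)

server <- function(input, output, session) {
  # `test` plus a logical column marking membership in the current selection
  flagged <- reactive({
    df <- test
    df$selected <- df$weight >= input$rng[1] & df$weight <= input$rng[2]
    df
  })

  output$hist <- renderPlotly({
    plot_ly(test, alpha = 0.5) %>%
      add_histogram(x = ~weight, color = ~sex, nbinsx = 20) %>%
      layout(
        barmode = "overlay",
        # shaded band marking the selected weight range
        shapes = list(list(
          type = "rect", xref = "x", yref = "paper",
          x0 = input$rng[1], x1 = input$rng[2], y0 = 0, y1 = 1,
          fillcolor = "grey", opacity = 0.2, line = list(width = 0)
        ))
      )
  })

  # summary statistics recomputed on the server whenever the slider moves
  output$stats <- renderTable({
    sel <- flagged()[flagged()$selected, ]
    rbind(
      stats_row(sel$weight, "m+f"),
      stats_row(sel$weight[sel$sex == "m"], "m"),
      stats_row(sel$weight[sel$sex == "f"], "f")
    )
  })
}

app <- shinyApp(ui, server)
# start it from an interactive session with: runApp(app)
```

Here the slider, not the plot, drives the selection; the statistics table is recomputed on the server, and `flagged()` holds the data with the extra `selected` column, which could be shown or exported as needed.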