Extracting and summarizing data from an interactive histogram selection in R

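Problem description

I want to create an interactive histogram in R using plotly (or another, more suitable package) with data similar to this example set: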

test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))

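I would like to show two overlapping histograms of the weight distribution for each sex, together with some summary statistics such as standard deviation, mean, and sample size, both per sex and overall. In addition, I would like to be able to make a selection, preferably with a range slider or a selection box, and have those summary statistics update to reflect the selection. I would then like to add a variable to the original dataset that indicates whether each sample is part of the selection. Thanks for your help! Even a pointer to relevant online resources would help, as I am having a hard time finding resources that address similar problems.

Tags: r, plotly, histogram

Solution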


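@DataZhukov This is a revised answer based on your larger data sample. Following your reply, I removed the side-by-side (think age pyramid) layout and show how {plotly} can be used for histograms.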

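While {plotly} supports interactivity, it is based on the notion of "static" HTML web pages. This means that no "active" calculations are performed on the client, i.e. in the browser of the user viewing the page. For simple statistics/summaries, you can look at {crosstalk} with a SummaryWidget to enable (some) "dynamic" updates (i.e. client-side calculations). For full-fledged dynamic select/filter/recalculate interactivity, {shiny} is the way to go. (But that is another ballgame.)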
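To give a feel for the {shiny} route mentioned above, here is a minimal sketch, not a definitive implementation. It assumes a range slider drives the selection; the helpers `flag_selection()` and `summarise_selection()`, the input/output ids, and the stand-in data are all hypothetical names introduced for illustration. The helpers add a `selected` indicator column to the data and recompute the summary statistics for the selected rows.

```r
library(shiny)
library(plotly)
library(dplyr)

# Stand-in data with the same columns as the question's `test` (assumption)
set.seed(1)
test <- data.frame(sex = rep(c("m", "f"), times = 40),
                   weight = runif(80, 5, 9))

# Hypothetical helper: flag rows whose weight lies inside [lo, hi]
flag_selection <- function(df, lo, hi) {
  df %>% mutate(selected = weight >= lo & weight <= hi)
}

# Hypothetical helper: sample size, mean and SD per sex and overall,
# computed on the selected rows only
summarise_selection <- function(df) {
  sel <- df %>% filter(selected)
  bind_rows(
    sel %>% group_by(sex) %>%
      summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)),
    sel %>%
      summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)) %>%
      mutate(sex = "m+f")
  )
}

ui <- fluidPage(
  sliderInput("rng", "Weight range", min = 5, max = 9,
              value = c(5, 9), step = 0.1),
  plotlyOutput("hist"),
  tableOutput("stats")
)

server <- function(input, output, session) {
  # Re-evaluated whenever the slider moves; carries the indicator column
  flagged <- reactive(flag_selection(test, input$rng[1], input$rng[2]))

  output$hist <- renderPlotly({
    flagged() %>%
      filter(selected) %>%
      plot_ly(alpha = 0.5) %>%
      add_histogram(x = ~weight, color = ~sex) %>%
      layout(barmode = "overlay")
  })

  output$stats <- renderTable(summarise_selection(flagged()))
}

# shinyApp(ui, server)   # uncomment to launch the app
```

Keeping the flagging and summarising in plain functions means they can also be used outside the app, e.g. `flag_selection(test, 6, 7)` returns the original data with the extra `selected` column.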

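{plotly} lets you place text annotations "freely" by specifying an add_text() layer. I built the annotations from your data. You could also define them manually as vectors.

If you use a data frame as the input data structure, note that {plotly} refers to its variables with the tilde notation (~).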
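As a small illustration of the tilde notation, the sketch below builds the same histogram twice: once with a formula that is looked up in the data frame, and once with the vector passed directly. The data frame `df` and its values are made up for the example.

```r
library(plotly)

# Illustrative stand-in data (not the question's data)
df <- data.frame(weight = c(5.1, 6.3, 7.2, 8.4))

# ~weight is a formula: plotly evaluates it inside `df`
p_formula <- plot_ly(df, x = ~weight, type = "histogram")

# Without a data frame, pass the vector itself (no tilde)
p_vector <- plot_ly(x = df$weight, type = "histogram")
```

Writing `x = weight` without the tilde would make R look for an object called `weight` in the calling environment rather than in `df`.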

test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))


library(dplyr)    # summarise(), group_by(), bind_rows(), %>%
library(plotly)   # plot_ly(), add_histogram(), add_text()

# calculate mean, sd, etc. based on the given data (the `test` data frame above)
# note: you can also define this with simple vectors
total_stats <- test %>% 
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)) %>%
  mutate(sex = "m+f")

group_stats <- test %>% group_by(sex) %>%
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight))

# one row per group plus the total, each with a ready-made annotation label
my_stats <- bind_rows(total_stats, group_stats) %>%
  mutate(LABEL = paste0(sex, " sample size: ", SAMPLE
                        , " with mean ", round(MEAN_WEIGHT, 2)
                        , " and SD ", round(SD, 2)
                        )
         )

# format your text, e.g. font face and size ---- format to your liking
tf <- list(
  family = "sans serif",
  size = 11
)

The {plotly} call below draws the bars of the two groups side by side rather than overlapping, and adds a text layer on top.

test %>%
  plot_ly() %>%
  # ------------ plot histogram ----------------------
  add_histogram( x = ~weight, color = ~sex
                ,nbinsx = 20       # set the number of bins you want/need
                ) %>%
  # ------------ add annotation layer ---------------
  ## I provide x, y positions as vector, you could add and place
  ## each label as its own layer, i.e. add_text() call
  add_text(data = my_stats
           ,x = c(5.2, 6,6.3), y = c(6, 5, 4.5)
           ,text = ~LABEL
           ,name = ""      # left empty as we do not need to name the layer
           ,textfont = tf
           ,textposition = "right"
           , showlegend = FALSE
  ) %>%
  layout(yaxis = list(title =""))

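This yields:

[Figure: histogram plot]

Obviously, you are free to define the x, y positions of the text annotations.

By default, the count bars are placed side by side. If you want to force "overlay" behaviour, you can plot two histograms and force the two layers to overlay. For the latter, you need to set the barmode in the layout() layer. I also set the alpha transparency, since there may be overlapping counts in your data sample. Text placement etc. follows the principles above.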

# split test data frame in a male and female df
males <- test %>% filter(sex == "m")
fems  <- test %>% filter(sex == "f")

plot_ly(
     alpha = 0.5     # set alpha to ensure visibility on overlapping counts
   , nbinsx = 20     # set number of bins
  ) %>%
#------------ add a histogram layer per group -------------------
  add_histogram(data = males, x = ~weight, name = "male") %>%
  add_histogram(data = fems,  x = ~weight, name = "female") %>%
#------------ tweak layout --------------------------------------
  layout(
    barmode = "overlay"   # to change side-by-side default to overlay
  )

[Figure: forced overlay histogram]
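As a possible variant of the overlay above, the sketch below keeps a single data frame and lets `color = ~sex` split the data into one histogram trace per group, instead of filtering into two data frames first. The stand-in data is an assumption for illustration; with the question's `test` data frame the call would be the same.

```r
library(plotly)

# Stand-in data with the same columns as the question's `test` (assumption)
set.seed(1)
test <- data.frame(sex = rep(c("m", "f"), times = 40),
                   weight = runif(80, 5, 9))

p_overlay <- plot_ly(test, alpha = 0.5) %>%          # alpha keeps overlapping bars visible
  add_histogram(x = ~weight, color = ~sex,           # one trace per level of sex
                nbinsx = 20) %>%
  layout(barmode = "overlay")                        # overlay instead of side by side
```

This trades the explicit "male"/"female" trace names for whatever labels the `sex` column contains.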

