How to add custom hover text for each column of a heatmap in R?

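Problem description

I have a dataset that I am plotting as a heatmap to compare 7 groups. For each group I also have 2 columns of data describing that group. I am trying to create an interactive plot that shows, for each group, the information from that group's own columns.

Here is a sample of the data, where each of the 7 groups has 2 corresponding information columns: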

df <- structure(list(Group1 = c(9.420318259, 5.801092847, 4.890727291, 
4.589825753, 4.836092781), Group2 = c(14.57805564, 8.798453748, 
7.982599836, 7.951599435, 10.81418654), Group3 = c(14.49131554, 
7.975284646, 8.258878348, 7.922657108, 13.3205827), Group4 = c(11.44447147, 
6.208332721, 6.529806574, 4.882623805, 10.69676399), Group5 = c(22.86835197, 
10.94297858, 7.197041788, 9.237584441, 12.70083108), Group6 = c(10.62687539, 
6.458410247, 7.461916094, 6.308454021, 12.39464562), Group7 = c(11.09404106, 
6.420303272, 6.821000583, 5.0727153, 11.13903127), Group1_Genes = c(46L, 
17L, 23L, 16L, 27L), Group1_Score = c(0.719, 0.757, 0.71, 0.807, 
0.761), Group2_Genes = c(58L, 22L, 30L, 22L, 40L), Group2_Score = c(0.754, 
0.766, 0.741, 0.807, 0.774), Group3_Genes = c(37L, 14L, 14L, 
13L, 22L), Group3_Score = c(0.798, 0.788, 0.81, 0.879, 0.805), 
    Group4_Genes = c(55L, 20L, 29L, 21L, 42L), Group4_Score = c(0.774, 
    0.768, 0.741, 0.822, 0.781), Group5_Genes = c(71L, 24L, 37L, 
    23L, 53L), Group5_Score = c(0.766, 0.767, 0.765, 0.811, 0.771
    ), Group6_Genes = c(69L, 24L, 37L, 23L, 53L), Group6_Score = c(0.772, 
    0.767, 0.765, 0.811, 0.771), Group7_Genes = c(58L, 21L, 33L, 
    22L, 48L), Group7_Score = c(0.79, 0.78, 0.774, 0.817, 0.78
    )), row.names = c("Cardiac Hypertrophy", 
"Cellular Effects of Adrenaline", "Metastasis Signaling", 
"Hormone Signaling", "Estrogen Receptor Signaling"
), class = "data.frame")
#One row of this data looks like:
# Pathway  Group1  Group2  Group3  Group4  Group5  Group6  Group7  Group1_Score  Group1_Genes  Group2_Score  Group2_Genes ...
# Cardiac  0.7      0.8      0.5    0.7      0.3    0.6     0.6        0.6           34           0.4     65

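I am trying to plot Group1 to Group7 (columns 1:7) in a heatmap and use the remaining columns as hover text, by adapting the answer to another question (How to create an interactive heatmap plot with custom text in R?):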

groups <- as.matrix(df[,1:7]) 

labels1 <- 
  df  %>% 
  mutate(label1 = paste(
    "Gene Overlap:", Group1_Genes,
    "\nMean_GB_Score:", Group1_Score
  )) %>% 

  transmute(across(Group1, ~label1)) %>% 
  as.matrix()

labels2 <- 
  df  %>% 
  mutate(label2 = paste(
    "Gene Overlap:", Group2_Genes,
    "\nMean_GB_Score:", Group2_Score
  )) %>% 

  transmute(across(Group2, ~label2)) %>% 
  as.matrix()


#I repeat making 7 labels objects to then cbind:

labels = cbind(labels1, labels2, labels3, labels4, labels5, labels6, labels7)

heatmaply(groups, custom_hovertext = labels,
          file = "heatmaply_plot.html",
          scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
  low = "pink", 
  high = "red"))

But attempting this produces an error:

Error in custom_hovertext[rowInd, colInd, drop = FALSE] : 
  subscript out of bounds

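Is there a way to build custom hover text for heatmaply() that is specific to each column of the heatmap, rather than one global piece of information given to every heatmap cell?

Tags: r, ggplot2, plotly, heatmap, heatmaply

Solution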


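library(tidyverse)  # dplyr, tidyr, and tibble are all used below

labels_df <- 
  df %>% 
  # keep only the annotation columns
  select(ends_with("Score"), ends_with("Genes")) %>% 
  rownames_to_column() %>% 
  # one row per pathway and annotation column
  pivot_longer(-rowname) %>% 
  # "Group1_Genes" -> Group = "Group1", var = "Genes"
  separate(name, c("Group", "var")) %>% 
  pivot_wider(id_cols = c(rowname, Group), names_from = var, values_from = value) %>% 
  mutate(label = paste(
    "Gene Overlap:", Genes,
    "\nMean_GB_Score:", Score
  )) %>% 
  # back to wide: one label column per group
  pivot_wider(id_cols = rowname, names_from = Group, values_from = label)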

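You can check what happens at each step by breaking the chain at any point and running the code up to there. Essentially, we pivot the data into a tidy format that is more convenient for the computation, so that we do not have to type 7 nearly identical expressions to build the labels. Then we pivot back to the wide shape that heatmaply expects.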
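To make "breaking the chain" concrete, here is a minimal sketch that stops after each reshaping step. The data frame `toy` and its values are made up for illustration (two groups, two pathways); only the column naming scheme follows the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: two groups, two pathways
toy <- data.frame(
  Group1 = c(1.5, 2.5), Group2 = c(3.5, 4.5),
  Group1_Genes = c(10L, 20L), Group1_Score = c(0.1, 0.2),
  Group2_Genes = c(30L, 40L), Group2_Score = c(0.3, 0.4),
  row.names = c("PathwayA", "PathwayB")
)

# Step 1: long format, one row per (pathway, annotation column)
step1 <- toy %>%
  select(ends_with("Score"), ends_with("Genes")) %>%
  rownames_to_column() %>%
  pivot_longer(-rowname)

# Step 2: split "Group1_Genes" into Group = "Group1" and var = "Genes"
step2 <- step1 %>%
  separate(name, c("Group", "var"), sep = "_")

# Step 3: one row per (pathway, group), with Genes and Score side by side
step3 <- step2 %>%
  pivot_wider(id_cols = c(rowname, Group), names_from = var, values_from = value)

print(step3)
```

Printing `step1`, `step2`, and `step3` in turn shows how the annotation columns move from wide to long and back to one row per pathway and group.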

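One important point: after all this pivoting, the rows happen to end up in the same order they started in. That is convenient, but it is better to check than to rely on it.

The labels in matrix form: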
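One way to avoid relying on the row order is to realign the label table explicitly. The sketch below uses a hypothetical helper, `align_rows()`, which is not part of heatmaply or the tidyverse, and a deliberately shuffled toy table to show the effect.

```r
# Hypothetical helper: reorder a label table so its rows follow `ref_names`
align_rows <- function(labels_df, ref_names) {
  idx <- match(ref_names, labels_df$rowname)
  stopifnot(!anyNA(idx))  # every reference row must have a label row
  labels_df[idx, , drop = FALSE]
}

# Deliberately shuffled toy table (made-up labels)
shuffled <- data.frame(
  rowname = c("PathwayB", "PathwayA"),
  Group1 = c("label B1", "label A1"),
  Group2 = c("label B2", "label A2")
)

aligned <- align_rows(shuffled, c("PathwayA", "PathwayB"))
print(aligned$rowname)
```

With the objects from this answer, the equivalent call would be `labels_df <- align_rows(labels_df, rownames(df))`, assuming `labels_df` still has its `rowname` column.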

labels_mat <- 
  labels_df %>% 
  select(Group1:Group7) %>% 
  as.matrix()
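Since the hover text matrix is expected to have the same dimensions as the plotted matrix, a quick shape check before plotting can catch mismatches early. This is a minimal sketch: `check_hover()` is a hypothetical helper, and `values` and `hover` are toy stand-ins for `groups` and `labels_mat`.

```r
# Hypothetical helper: stop early if the hover text does not line up with the data
check_hover <- function(values, hover) {
  stopifnot(
    is.matrix(hover),
    identical(dim(values), dim(hover)),
    identical(colnames(values), colnames(hover))
  )
  invisible(TRUE)
}

# Toy stand-ins for `groups` and `labels_mat` (made-up values)
values <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("PathwayA", "PathwayB"), c("Group1", "Group2"))
)
hover <- matrix(
  c("a", "b", "c", "d"), nrow = 2,
  dimnames = list(NULL, c("Group1", "Group2"))
)

check_hover(values, hover)
```

Here the call would be `check_hover(groups, labels_mat)`. Row names are not compared, because a matrix built from a tibble does not carry them.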

Finally:

library(heatmaply)

# `groups` is the numeric matrix from the question: as.matrix(df[, 1:7])
heatmaply(
  groups,
  custom_hovertext = labels_mat,
  scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(low = "pink", high = "red")
)
