Vega-Lite: group by specific values of one column, based on another column

Question

I am trying to find the average Gestation/Incubation (days) for these values:

Accipitriformes, Anseriformes, Charadriiformes 

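These are all part of the class Aves. I do not want the averages for any other values in the Order column, only for the ones that belong to the Aves class. A sample of my dataset is shown below: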

Class       Order             Gestation/Incubation (days)
Amphibia    Anura             5
Amphibia    Anura             4
Amphibia    Anura             2
Amphibia    Caudata           4
Amphibia    Caudata           2
Mammalia    Artiodactyla      10
Mammalia    Artiodactyla      8
Mammalia    Rodentia          14
Mammalia    Rodentia          13
Aves        Accipitriformes   12
Aves        Accipitriformes   17
Aves        Accipitriformes   12
Aves        Anseriformes      9
Aves        Anseriformes      8
Aves        Anseriformes      9
Aves        Charadriiformes   10
Aves        Charadriiformes   12
Aves        Charadriiformes   14

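I was able to find the averages for the different values in the Class column, for example:

Amphibia, Mammalia, Aves

But I cannot find the averages for the values in the Order column where Class = Aves. This is my current spec: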

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
  },
  "title": {
    "text": " Average Gestation/Incubation days for Orders with Aves",
    "anchor": "middle"
  },
  "width": 600,
  "height": 600,
  "transform": [
    {
      "aggregate": [
        {"op": "average", "field": "Gestation/Incubation (days)", "as": "avg_incub"},
        {"op": "count", "field": "Class", "as": "make_cnt"}
      ],
      "groupby": ["Class"]
    },
    {"filter": "datum.make_cnt > 50"}
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "y": {
      "field": "avg_incub",
      "type": "quantitative",
      "axis": {"title": "Average Incubation"}
    },
    "x": {
      "field": "Class",
      "type": "nominal",
      "sort": {"encoding": "x", "order": "descending"},
      "axis": {"title": "Orders"}
    }
  }
}


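I tried using the filter transform so that only the rows with Class == Aves remain, but that did not solve my problem. I must not be using it correctly, unless there is another way to achieve what I want.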

{"filter": "datum.Class == 'Aves'"}

Tags: vega-lite

Answer


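You can do this with two filter steps, one filtering by Class and one filtering by Order. Once the data is filtered, using an aggregate in the encoding is the most straightforward way to compute the grouped means.

Here is an example, which can be pasted into the Vega Editor: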

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
  },
  "title": {
    "text": " Average Gestation/Incubation days for Orders with Aves",
    "anchor": "middle"
  },
  "transform": [
    {"filter": "datum.Class == 'Aves'"},
    {"filter": {"field": "Order", "oneOf": ["Accipitriformes", "Anseriformes", "Charadriiformes"]}}
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "x": {"field": "Order", "type": "nominal"},
    "y": {"field": "Gestation/Incubation (days)", "type": "quantitative", "aggregate": "mean"}
  }
}
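
For comparison with the spec in the question, the same chart can probably also be produced with an explicit aggregate transform, provided the filters run before the aggregation and the grouping is by Order rather than Class. In the question's spec the aggregate grouped by Class comes first, so the aggregated rows no longer carry an Order column to filter or plot. The following is a minimal sketch under those assumptions; the output field name avg_incub is borrowed from the question, and the axis title is only illustrative.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
  },
  "transform": [
    {"filter": "datum.Class == 'Aves'"},
    {"filter": {"field": "Order", "oneOf": ["Accipitriformes", "Anseriformes", "Charadriiformes"]}},
    {
      "aggregate": [
        {"op": "mean", "field": "Gestation/Incubation (days)", "as": "avg_incub"}
      ],
      "groupby": ["Order"]
    }
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "x": {"field": "Order", "type": "nominal"},
    "y": {"field": "avg_incub", "type": "quantitative", "axis": {"title": "Average Incubation"}}
  }
}
```

The encoding-level aggregate in the answer above is shorter; the transform form is mainly useful when the computed mean is needed by a later transform, such as a filter on avg_incub.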
