Does `transform_lookup` save space?

Question

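I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data, the merged data frame is much larger than the two separate data frames that the two charts would otherwise need. This is because the columns unique to each chart have many repeated rows for each entry in the shared column.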

Would using `transform_lookup` save space compared with just using the merged data frame, or does `transform_lookup` end up doing the whole merge internally?

Tags: python, altair

Answer


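No, the entire dataset is still included in the chart spec when you use `transform_lookup`. You can see this by printing the JSON spec of the charts you create. Using the example from the docs: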

import altair as alt
import pandas as pd
from vega_datasets import data

people = data.lookup_people().head(3)
people
    name    age height
0   Alan    25  180
1   George  32  174
2   Fred    39  182
groups = data.lookup_groups().head(3)
groups
    group   person
0   1   Alan
1   1   George
2   1   Fred

Merging with pandas:

merged = pd.merge(groups, people, how='left',
                  left_on='person', right_on='name')

print(alt.Chart(merged).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-b41b97ffc89b39c92e168871d447e720"
  },
  "datasets": {
    "data-b41b97ffc89b39c92e168871d447e720": [
      {
        "age": 25,
        "group": 1,
        "height": 180,
        "name": "Alan",
        "person": "Alan"
      },
      {
        "age": 32,
        "group": 1,
        "height": 174,
        "name": "George",
        "person": "George"
      },
      {
        "age": 39,
        "group": 1,
        "height": 182,
        "name": "Fred",
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar"
}

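With `transform_lookup`, all the data is still there, but as two separate datasets (so technically it takes slightly more space, because of the extra braces and the transform):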

print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-5fe242a79352d1fe243b588af570c9c6"
  },
  "datasets": {
    "data-2b374d1509415e1d327c3a7521f8117c": [
      {
        "age": 25,
        "height": 180,
        "name": "Alan"
      },
      {
        "age": 32,
        "height": 174,
        "name": "George"
      },
      {
        "age": 39,
        "height": 182,
        "name": "Fred"
      }
    ],
    "data-5fe242a79352d1fe243b588af570c9c6": [
      {
        "group": 1,
        "person": "Alan"
      },
      {
        "group": 1,
        "person": "George"
      },
      {
        "group": 1,
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "name": "data-2b374d1509415e1d327c3a7521f8117c"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
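
To put a number on "slightly more space", the two layouts can be compared by serialized length. The following is only a minimal sketch using the standard library: the dicts are simplified stand-ins for the specs above (the dataset names `merged`, `people`, and `groups` are made up for illustration, and the encoding and config sections, which are identical in both specs, are left out).

```python
import json

# The same three rows as in the example above, written as plain records.
people = [
    {"age": 25, "height": 180, "name": "Alan"},
    {"age": 32, "height": 174, "name": "George"},
    {"age": 39, "height": 182, "name": "Fred"},
]
groups = [
    {"group": 1, "person": "Alan"},
    {"group": 1, "person": "George"},
    {"group": 1, "person": "Fred"},
]

# Left join of groups onto people, mirroring pd.merge(..., how='left').
people_by_name = {row["name"]: row for row in people}
merged = [{**g, **people_by_name.get(g["person"], {})} for g in groups]

# Simplified stand-in for the spec built from the merged data frame.
merged_spec = {
    "data": {"name": "merged"},
    "datasets": {"merged": merged},
}

# Simplified stand-in for the spec built with transform_lookup:
# both datasets are embedded, plus the transform that joins them.
lookup_spec = {
    "data": {"name": "groups"},
    "datasets": {"people": people, "groups": groups},
    "transform": [
        {
            "lookup": "person",
            "from": {
                "data": {"name": "people"},
                "key": "name",
                "fields": ["age", "height"],
            },
        }
    ],
}

merged_size = len(json.dumps(merged_spec))
lookup_size = len(json.dumps(lookup_spec))
print("merged spec characters:", merged_size)
print("lookup spec characters:", lookup_size)
```

For real charts, the same measurement can be taken with `len(chart.to_json())` on each chart object, which is a quick way to check the sizes for your own data.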

Where `transform_lookup` can save space is when you use it with URLs for both datasets:

people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
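
The reason the URL form saves space is that the spec only carries the two addresses, so its size does not depend on how many rows the files hold. Here is a rough standard-library sketch of that idea. The file names `people.json` and `groups.json`, the helper `url_spec`, and the generated rows are all hypothetical, and the spec dict is a simplified stand-in for what Altair emits.

```python
import json
import tempfile
from pathlib import Path


def url_spec(groups_url, people_url):
    """Build a simplified Vega-Lite-style spec that references data by URL."""
    return {
        "data": {"url": groups_url},
        "mark": "bar",
        "encoding": {
            "x": {"aggregate": "mean", "field": "age", "type": "quantitative"},
            "y": {"field": "group", "type": "ordinal"},
        },
        "transform": [
            {
                "lookup": "person",
                "from": {
                    "data": {"url": people_url},
                    "key": "name",
                    "fields": ["age", "height"],
                },
            }
        ],
    }


# Maps row count -> (spec size in characters, data size on disk in bytes).
sizes = {}

with tempfile.TemporaryDirectory() as tmp:
    for n_rows in (3, 3000):
        # Made-up rows, only used to vary the size of the data files.
        people = [
            {"name": f"person{i}", "age": 20 + i % 50, "height": 160 + i % 40}
            for i in range(n_rows)
        ]
        groups = [{"group": i % 5, "person": f"person{i}"} for i in range(n_rows)]

        people_path = Path(tmp) / "people.json"
        groups_path = Path(tmp) / "groups.json"
        people_path.write_text(json.dumps(people))
        groups_path.write_text(json.dumps(groups))

        # The spec only mentions the file names, never the rows themselves.
        spec = url_spec(groups_path.name, people_path.name)
        data_bytes = people_path.stat().st_size + groups_path.stat().st_size
        sizes[n_rows] = (len(json.dumps(spec)), data_bytes)

for n_rows, (spec_chars, data_bytes) in sizes.items():
    print(n_rows, "rows:", spec_chars, "spec characters,", data_bytes, "data bytes")
```

The data still has to be downloaded by whatever renders the chart, so this reduces the size of the spec itself, not the total amount of data transferred.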
