Converting a DataArray to a DataFrame

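Question

Is there a simple way to convert an xarray DataArray to a pandas DataFrame where I can specify which dimensions become the index and which become the columns? For example, suppose I have a DataArray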

import xarray as xr
weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={
        'date': ['Thursday', 'Friday'],
        'time': ['Morning', 'Afternoon'],
    }
)

This gives:

<xarray.DataArray 'weather' (date: 2, time: 2)>
array([['Sunny', 'Windy'],
       ['Rainy', 'Foggy']], dtype='<U5')
Coordinates:
  * date     (date) <U8 'Thursday' 'Friday'
  * time     (time) <U9 'Morning' 'Afternoon'

Suppose I now want to move it into a pandas DataFrame indexed by date, with time as the columns. I can do this by calling .to_dataframe() and then .unstack() on the resulting DataFrame:

>>> weather.to_dataframe().unstack()
           weather        
time     Afternoon Morning
date                      
Friday       Foggy   Rainy
Thursday     Windy   Sunny

However, pandas sorts the labels, so instead of Morning followed by Afternoon, I get Afternoon followed by Morning. I would rather have an API like

weather.to_dataframe(index_dims=[...], column_dims=[...])

which would do this reshaping for me, without my having to reorder the index and columns afterwards.
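For reference, the after-the-fact reordering mentioned above might look like the following sketch. It assumes the original coordinate order of the DataArray is the desired one, and it selects the 'weather' column before unstacking so the result has plain (non-hierarchical) columns. The variable name df is only illustrative.

```python
import xarray as xr

weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={
        'date': ['Thursday', 'Friday'],
        'time': ['Morning', 'Afternoon'],
    },
)

# Take the 'weather' column as a Series and move 'time' into the columns.
# unstack() may sort the row and column labels alphabetically.
df = weather.to_dataframe()['weather'].unstack('time')

# Restore the order stored in the DataArray's own coordinates.
df = df.reindex(
    index=weather['date'].values,
    columns=weather['time'].values,
)
print(df)
```

This works, but it is exactly the extra step the question is trying to avoid.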

Tags: pandas, python-xarray

Answer


In xarray 0.16.1, dim_order was added to .to_dataframe. Does this meet your needs?

xr.DataArray.to_dataframe(
    self,
    name: Hashable = None,
    dim_order: List[Hashable] = None,
) -> pandas.core.frame.DataFrame
Docstring:
Convert this array and its coordinates into a tidy pandas.DataFrame.

The DataFrame is indexed by the Cartesian product of index coordinates
(in the form of a :py:class:`pandas.MultiIndex`).

Other coordinates are included as columns in the DataFrame.

Parameters
----------
name
    Name to give to this array (required if unnamed).
dim_order
    Hierarchical dimension order for the resulting dataframe.
    Array content is transposed to this order and then written out as flat
    vectors in contiguous order, so the last dimension in this list
    will be contiguous in the resulting DataFrame. This has a major
    influence on which operations are efficient on the resulting
    dataframe.

    If provided, must include all dimensions of this DataArray. By default,
    dimensions are sorted according to the DataArray dimensions order.
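As a minimal sketch of how this might be used with the weather array from the question: going by the docstring, dim_order appears to control only the order of the MultiIndex levels (and the row layout) of the long-format result, not which dimension ends up as columns. The last part of the sketch is an assumption about what the asker wants rather than part of this answer: for a 2-D array, DataArray.to_pandas() returns a DataFrame with the first dimension as the index and the second as the columns, keeping the coordinate order.

```python
import xarray as xr

weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={
        'date': ['Thursday', 'Friday'],
        'time': ['Morning', 'Afternoon'],
    },
)

# Default: the index levels follow the DataArray's dims, ('date', 'time').
default_df = weather.to_dataframe()

# With dim_order (xarray >= 0.16.1) the levels become ('time', 'date'),
# and 'date' is the contiguous (fastest-varying) level.
swapped_df = weather.to_dataframe(dim_order=['time', 'date'])
print(swapped_df)

# Alternative for 2-D data: put the dims in (index, columns) order and
# convert directly; no unstack() is involved, so nothing gets re-sorted.
wide = weather.transpose('date', 'time').to_pandas()
print(wide)
```

If the wide layout is the goal, the to_pandas() route avoids the unstack() step entirely; dim_order is mainly useful when the long, MultiIndexed form is what is needed.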
