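python - How can I reassign multiple MultiIndex columns at once in pandas?
Problem description
I have two versions of the same dataset, one stacked and the other unstacked.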
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 47.230000 130.539993 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 46.919998 129.960007 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 46.849998 130.119995 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 46.360001 129.360001 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 46.139999 128.649994 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-22 183.509995 318.890015 183.509995 318.890015 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 181.570007 316.730011 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 181.809998 318.109985 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39517100.0 28236300.0
2020-05-28 181.399994 318.250000 181.399994 318.250000 184.149994 323.440002 180.380005 315.630005 180.740005 316.769989 33810200.0 33390200.0
2020-05-29 183.250000 317.940002 183.250000 317.940002 184.270004 321.149994 180.410004 316.470001 182.729996 319.250000 42130400.0 38383100.0
>>> b = a.stack()
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 47.230000 47.770000 46.619999 47.060001 28837300.0
AAPL 120.306801 130.539993 131.389999 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 46.919998 47.349998 46.619999 46.930000 21498300.0
AAPL 119.772255 129.960007 130.660004 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 46.849998 47.740002 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 316.730011 324.239990 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 181.809998 181.990005 176.600006 180.199997 39492600.0
AAPL 318.109985 318.109985 318.709991 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 181.580002 182.470001 180.389999 180.740005 9760951.0
AAPL 319.850006 319.850006 321.070007 315.630005 316.769989 10119124.0
I am trying to take a few columns from a, transform them, and assign them back to the dataset. This works with b.
>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 0.000000 0.000000 46.619999 47.060001 28837300.0
AAPL 120.306801 1.763921 1.750471 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 -0.640570 -0.639623 46.619999 46.930000 21498300.0
AAPL 119.772255 1.769821 1.759451 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 -0.639504 -0.634624 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 0.744396 0.738552 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 -0.425978 -0.438718 176.600006 180.199997 39492600.0
AAPL 318.109985 0.749684 0.751250 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 -0.429191 -0.427473 180.389999 180.740005 9760951.0
AAPL 319.850006 0.761483 0.759577 315.630005 316.769989 10119124.0
[2516 rows x 6 columns]
But the same approach does not work with a.
>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
self._setitem_array(key, value)
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
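One plausible reading of this error, sketched on a small hypothetical frame rather than the Yahoo data: selecting two top-level names from MultiIndex columns returns one column per (attribute, symbol) pair, so the right-hand side is wider than the two-item key. The name demo and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `a`: two attributes x two symbols.
columns = pd.MultiIndex.from_product(
    [["Close", "High"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
demo = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]],
    columns=columns,
)

key = ["Close", "High"]
selected = demo[key]

# The key names 2 top-level labels, but the selection spans 4 columns.
print(len(key), selected.shape[1])
```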
If I do it one column at a time, it works fine. I am using a for loop as a temporary workaround, but it seems inefficient and unclean to me.
>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 0.000000 0.000000 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 -0.006564 -0.004443 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 -0.001492 0.001231 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 -0.010459 -0.005841 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 -0.004745 -0.005489 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-21 183.429993 316.850006 -0.012011 -0.007455 186.669998 320.890015 183.289993 315.869995 185.399994 318.660004 29119500.0 25672200.0
2020-05-22 183.509995 318.890015 0.000436 0.006438 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 -0.010572 -0.006774 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 0.001322 0.004357 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39492600.0 28211100.0
2020-05-28 183.561005 322.510010 0.009631 0.013832 183.820007 323.000000 180.389999 315.630005 180.740005 316.769989 15009134.0 16107365.0
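For reference, the loop workaround mentioned above might look roughly like the following. This is a sketch on a small hypothetical frame (demo and its prices are invented), not the exact loop from the program in question.

```python
import pandas as pd

# Hypothetical stand-in for `a` with MultiIndex columns.
columns = pd.MultiIndex.from_product(
    [["Close", "High"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
demo = pd.DataFrame(
    [[100.0, 200.0, 110.0, 220.0], [110.0, 210.0, 121.0, 231.0]],
    columns=columns,
)

# Transform and reassign one top-level column at a time.
for name in ["Close", "High"]:
    demo[name] = demo[name].pct_change().fillna(0)
```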
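I am writing this as part of a program that should not need to know whether the columns are a MultiIndex. Is there a cleaner or faster way to do this without looping over the columns?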
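Solution
With a MultiIndex, it is much safer to use the loc indexer for this kind of assignment.
I also recommend reading the pandas documentation on MultiIndexes; I believe it will help you when working with them.
In the code below, loc is applied along the columns (axis=1; axis=0 would select on rows) and selects "Close" and "High". You can then put the replacement values on the right-hand side of the assignment without getting an error: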
a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
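To try this without downloading data, here is a minimal sketch on a hypothetical frame. The helper pct_change_columns, the frames wide and flat, and all prices are assumptions made for illustration. The same helper is applied to MultiIndex columns and to flat columns, which is the situation described in the question.

```python
import pandas as pd

# Hypothetical prices standing in for the DataReader output.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
wide = pd.DataFrame(
    [
        [100.0, 200.0, 110.0, 220.0, 90.0, 180.0],
        [110.0, 210.0, 121.0, 231.0, 99.0, 189.0],
        [99.0, 231.0, 133.1, 254.1, 95.0, 200.0],
    ],
    index=pd.date_range("2020-01-01", periods=3, name="Date"),
    columns=columns,
)

# A flat-column version of the same data (MSFT only).
flat = wide.xs("MSFT", axis=1, level="Symbols").copy()


def pct_change_columns(df, names):
    """Overwrite the named (top-level) columns with their percentage change."""
    df.loc(axis=1)[names] = df[names].pct_change().fillna(0)
    return df


pct_change_columns(wide, ["Close", "High"])
pct_change_columns(flat, ["Close", "High"])
print(wide)
print(flat)
```

The "Low" columns are left untouched in both frames, and no loop over columns is needed.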