Plotting in a loop hangs matplotlib: strange memory leak?

Question

I have a bunch of timestamped values stored in a dictionary as pandas.DataFrame objects, like this:

dfS[k1] = df1
dfS[k2] = df2
...

Plotting them onto the same axes like this:

dfS[k1].plot(ax=ax1)
dfS[k2].plot(ax=ax1)
...

works, but doing the same thing in a loop:

for k in dfS.keys():
    dfS[k].plot(ax=ax1)

makes matplotlib crash after about 20 seconds with the following message:

Traceback (most recent call last):
  File "testDataDisplay.py", line 66, in <module>
    dfS[k].plot(ax=ax)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_core.py", line 847, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/__init__.py", line 61, in plot
    plot_obj.generate()
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py", line 269, in generate
    self._post_plot_logic_common(ax, self.data)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py", line 437, in _post_plot_logic_common
    self._apply_axis_properties(ax.xaxis, rot=self.rot, fontsize=self.fontsize)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py", line 520, in _apply_axis_properties
    labels = axis.get_majorticklabels() + axis.get_minorticklabels()
  File "/usr/lib/python3/dist-packages/matplotlib/axis.py", line 1207, in get_majorticklabels
    ticks = self.get_major_ticks()
  File "/usr/lib/python3/dist-packages/matplotlib/axis.py", line 1378, in get_major_ticks
    numticks = len(self.get_majorticklocs())
  File "/usr/lib/python3/dist-packages/matplotlib/axis.py", line 1283, in get_majorticklocs
    return self.major.locator()
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/converter.py", line 988, in __call__
    locs = self._get_default_locs(vmin, vmax)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/converter.py", line 968, in _get_default_locs
    self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
  File "/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/converter.py", line 588, in _daily_finder
    info = np.zeros(
MemoryError: Unable to allocate 45.2 GiB for an array with shape (1617786887,) and data type [('val', '<i8'), ('maj', '?'), ('min', '?'), ('fmt', 'S20')]

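It looks as if matplotlib is interpreting a timestamp as the array shape, because the total number of data points is only 608. Here are some of them for reference: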

dfS['pRp:1:avg5min'].head(4)
                 pRp:1:avg5min
2021-04-07 14:14:30        64.6226
2021-04-07 14:14:35        64.1258
2021-04-07 14:14:40        64.5340
2021-04-07 14:14:45        66.2782

for key in dfS.keys():
    print(key, end=' ')
    print(dfS[key].shape)

pRp:0:avg5min (5, 1)
pRp:0:raw (299, 1)
pRp:1:avg5min (5, 1)
pRp:1:raw (299, 1)
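
As a rough sanity check of the figures above, the size in the MemoryError can be reproduced by hand. This is only a sketch: it assumes the record array is packed (8 + 1 + 1 + 20 bytes per element, taken from the dtype in the message), and the variable names are made up for illustration. It also reads the array length as seconds since the Unix epoch, which is a guess about where such a large number could come from, not something the traceback states.

```python
from datetime import datetime, timezone

# Array length reported in the MemoryError.
n_elements = 1617786887

# Assumed packed layout of the dtype in the message:
# 'val' <i8 (8 bytes) + 'maj' ? (1) + 'min' ? (1) + 'fmt' S20 (20).
item_size = 8 + 1 + 1 + 20

# Requested allocation in GiB.
total_gib = n_elements * item_size / 2**30
print(f"{total_gib:.1f} GiB")

# Actual number of data points, summed from the printed shapes.
total_points = 5 + 299 + 5 + 299
print(total_points)

# Hypothesis: the length is really a count of seconds since 1970-01-01.
as_time = datetime.fromtimestamp(n_elements, tz=timezone.utc)
print(as_time.isoformat())
```

The computed size matches the 45.2 GiB in the error, the point count is 608, and the length read as epoch seconds falls on 2021-04-07, the same day as the data. That is consistent with the tick locator spanning a range that starts near the epoch instead of covering only the plotted points, though it does not prove it.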

matplotlib.__version__
'3.3.0'

python3 --version
Python 3.8.6

pd.__version__
'1.0.5'

Any suggestions?

Tags: python, pandas, matplotlib, crash, time-series

Answer


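This should be a comment rather than an answer, but since Stack Overflow has decided that I need 50 reputation points to comment, I am posting it as an answer:

It seems that you are loading multiple DataFrames, each with its own time series and several data columns, and are therefore hitting the memory limit.

When you run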

for k in dfS.keys():
    dfS[k].plot(ax=ax1)

you plot the entire DataFrame.

Maybe something like this will help:

for k in dfS.keys():
    dfS[k].plot(x="columnName", ax=ax1)

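where columnName is the name of a specific column of the DataFrame.

It is hard to guess the exact problem here because we do not know what the input data looks like; maybe you could post a minimal version of the data, or its head.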
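
For what it is worth, here is a minimal, self-contained sketch of what such a reproducible example could look like, together with one thing that might be worth trying. The data is synthetic (made-up values, with index spacings of 5 seconds and 5 minutes chosen only for illustration), and the workaround rests on an assumption: that the failure originates in pandas' own time-series tick locator, which is where the traceback ends (pandas/plotting/_matplotlib/converter.py). Passing x_compat=True to DataFrame.plot is a documented pandas option that suppresses its tick resolution adjustment; whether it resolves the original case is untested.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data; all values are invented.
rng = np.random.default_rng(0)
raw_index = pd.date_range("2021-04-07 14:14:30", periods=299,
                          freq=pd.Timedelta(seconds=5))
avg_index = pd.date_range("2021-04-07 14:15:00", periods=5,
                          freq=pd.Timedelta(minutes=5))

dfS = {
    "pRp:1:raw": pd.DataFrame(
        {"pRp:1:raw": 65 + rng.standard_normal(299)}, index=raw_index),
    "pRp:1:avg5min": pd.DataFrame(
        {"pRp:1:avg5min": 65 + rng.standard_normal(5)}, index=avg_index),
}

fig, ax1 = plt.subplots()
for key, df in dfS.items():
    # x_compat=True: skip pandas' tick resolution adjustment for time series
    # and let matplotlib's regular date handling place the ticks.
    df.plot(ax=ax1, x_compat=True)

# Force a render so that tick locations are actually computed.
fig.canvas.draw()
```

If this runs cleanly while the same loop without x_compat=True does not, that would point at the interaction between frames of different time resolution on one axes rather than at the amount of data.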

