python - Can I find out the allocation request that caused my Python MemoryError?
Problem description
Context
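My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library allocates a large array "under the hood" to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately raises a MemoryError.
Preferably without digging into the depths of the library, I want to figure out whether it is worth trying this algorithm on another machine with more memory, or trying to reduce my input size, or whether it is a lost cause for the data sizes I am trying to use.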
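Question
When my Python code throws a MemoryError, is there a "top-down" way for me to investigate the size of the allocation my code attempted that caused the error, for example by inspecting the error object?
Solution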
You can't tell from the MemoryError exception itself. The exception is raised in any situation where a memory allocation fails, including allocations in Python internals that are not directly connected to code creating new Python data structures. Some modules create locks or other support objects, and those operations can also fail once memory has run out.
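As a small illustration of the point above, the sketch below catches a MemoryError and looks at what the exception object carries. The helper name describe_memory_error is hypothetical, and the oversized list request is only an assumed convenient way to provoke the error; on CPython the args tuple is typically empty, so there is no requested size to read back.

```python
import sys


def describe_memory_error(operation):
    """Run operation() and report what a resulting MemoryError carries."""
    try:
        operation()
    except MemoryError as exc:
        # MemoryError has no attribute holding the requested size;
        # the generic args tuple is all there is to inspect.
        return {"type": type(exc).__name__, "args": exc.args}
    return None


# A list with sys.maxsize slots is larger than any machine can provide,
# so this request is expected to fail with MemoryError.
info = describe_memory_error(lambda: [None] * sys.maxsize)
print(info)
```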
You also can't necessarily know how much memory the whole operation would need in order to succeed. If the library creates several data structures over the course of its work, the allocation that fails could be something as small as a string used as a dictionary key (the last straw), or as large as a copy of a whole existing data structure made for mutation, or anything in between. Either way, the failed request says nothing about how much additional memory the remainder of the process would need.
That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.
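A minimal sketch of measuring one run with tracemalloc might look like this. The helper measure_peak and the list-building workload are hypothetical stand-ins for the real library call. Note that tracemalloc only sees allocations made through Python's memory allocators, so memory a C extension obtains by other means may not show up.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Call func and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical workload standing in for the library call.
_, peak = measure_peak(lambda: [0] * 100_000)
print(f"peak traced memory: {peak / 1024:.0f} KiB")
```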
The trick is to find data sets for which the process can be completed. You'd want data sets of several different sizes, so that you can measure how much memory each one requires. Take snapshots before and after each run with tracemalloc.take_snapshot(), then compare the differences and statistics between the snapshots for those data sets. From that information you may be able to extrapolate how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the data sets, but if there is any kind of pattern, tracemalloc is your best shot at discovering it.
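The experiment described above could be sketched roughly as follows. Here build_table is a hypothetical stand-in for the library (an n by n DP table), retained_bytes is an illustrative helper, and the power-law fit at the end is just one assumed way to extrapolate; the real growth pattern depends on the library and the data.

```python
import math
import tracemalloc


def build_table(n):
    # Hypothetical stand-in for the library: an n x n DP table.
    return [[0] * n for _ in range(n)]


def retained_bytes(func, n):
    """Bytes still allocated after func(n), measured between two snapshots."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(n)  # keep a reference so the memory stays allocated
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Sum the per-line size differences between the two snapshots.
    return sum(stat.size_diff for stat in after.compare_to(before, "lineno"))


sizes = [100, 200, 400]
measurements = {n: retained_bytes(build_table, n) for n in sizes}
for n, used in measurements.items():
    print(f"n={n}: {used / 1024:.0f} KiB")

# Assuming memory grows like n**k, estimate k from the smallest and
# largest runs; k near 2 would suggest quadratic growth.
growth = math.log(measurements[400] / measurements[100]) / math.log(400 / 100)
print(f"estimated growth exponent: {growth:.2f}")
```

Snapshots taken before and after a run capture what is still allocated at the end, not the peak during the run, so if the library frees large temporary structures along the way, tracemalloc.get_traced_memory() is worth checking as well.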