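postgresql - Postgres (PG 12) goes into recovery mode after a DELETE query on a partitioned table fails
Problem description
I have code that used to work on a plain table and stopped working once the same table was partitioned into many sub-partitions.
In a distributed application (Spark), our code runs batch DELETE queries in parallel from different machines at the same time (each deleting different records).
Most of the queries succeed, but one of them fails with what looks like a socket connection timeout: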
java.sql.BatchUpdateException: Batch entry 0 DELETE FROM my_table WHERE vessel_id='xxxxxx' AND day='2020-09-15 00:00:00+00'::timestamp was aborted: An I/O error occurred while sending to the backend. Call getNextException to see other errors in the batch.
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
When the code retries the task, the connection fails with:
FATAL: the database system is in recovery mode
In the database log I see:
2020-09-21 16:44:27 UTC::@:[26848]:DETAIL: Failed process was running: DELETE FROM my_table WHERE vessel_id=$1 AND day=$2
2020-09-21 16:44:27 UTC::@:[26848]:LOG: terminating any other active server processes
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres@postgres:[27705]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres@postgres:[27705]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres@postgres:[27705]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin@[unknown]:[26740]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin@[unknown]:[26740]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin@[unknown]:[26740]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC::@:[22480]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC::@:[22480]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC::@:[22480]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC:127.0.0.1(31826):rdsadmin@rdsadmin:[27967]:FATAL: the database system is in recovery mode
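Any idea why the database fails when the table is partitioned? And why are all the other connections from other machines closed and the database put into recovery mode?
Solution
After going through the logs, I found that the problem was insufficient memory. This database instance is the primary: it handles writes, replication, and deletes, and it did not have enough memory to handle all of these tasks at once.
The fix was simply to add more memory. Nothing fancy.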
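
Adding memory addresses the root cause. Separately, the server log's HINT ("In a moment you should be able to reconnect to the database and repeat your command") suggests that clients can ride out a crash recovery by retrying. The following is a minimal sketch of that idea, not the original poster's code. The class and method names (`RecoveryRetry`, `isTransient`, `withRetry`) are hypothetical, and it assumes the JDBC driver reports the PostgreSQL SQLSTATE on the exception: `57P03` (`cannot_connect_now`, which is what "the database system is in recovery mode" carries) or a class `08` connection exception.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries a JDBC task while the server is restarting.
final class RecoveryRetry {

    // Returns true if any exception in the chain looks like a transient
    // connection problem: SQLSTATE 57P03 (cannot_connect_now) or class 08
    // (connection exception). Assumes the driver populates getSQLState().
    static boolean isTransient(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            String state = cur.getSQLState();
            if (state != null && (state.equals("57P03") || state.startsWith("08"))) {
                return true;
            }
        }
        return false;
    }

    // Runs the task, retrying up to maxAttempts times with exponential backoff
    // when a transient SQLException is thrown. Other failures are rethrown.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (SQLException e) {
                if (attempt >= maxAttempts || !isTransient(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // back off so a recovering server is not hammered
            }
        }
    }
}
```

The task passed to `withRetry` should open a fresh connection on each attempt, since the old one is gone once the backend has been terminated. Retrying only hides the symptom: if the instance keeps running out of memory, the crash will recur.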