MongoDB server does not start after a crash

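Problem description

I have a single-server MongoDB 4.0.5 setup running on an Amazon AMI. Under some heavy read and write load it crashed (I believe because of an out-of-memory error). Since the crash, every attempt to bring the mongo server back up has failed.

In the mongod.log file, the following output is written over and over (because the server keeps trying to start itself in a loop):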

...
2020-06-24T16:38:23.636+0000 I STORAGE  [initandlisten] Detected data files in /media/mongovol/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-06-24T16:38:23.637+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7277M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2020-06-24T16:38:24.285+0000 I STORAGE  [initandlisten] WiredTiger message [1593016704:285413][4793:0x7f6559820b40], txn-recover: Main recovery loop: starting at 9198/127616 to 9199/256
2020-06-24T16:38:24.285+0000 I STORAGE  [initandlisten] WiredTiger message [1593016704:285682][4793:0x7f6559820b40], txn-recover: Recovering log 9198 through 9199
2020-06-24T16:38:24.341+0000 I STORAGE  [initandlisten] WiredTiger message [1593016704:341106][4793:0x7f6559820b40], txn-recover: Recovering log 9199 through 9199
2020-06-24T16:38:24.387+0000 I STORAGE  [initandlisten] WiredTiger message [1593016704:387027][4793:0x7f6559820b40], txn-recover: Set global recovery timestamp: 5ef2be9d0000001e
2020-06-24T16:38:24.397+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1592966813, 30)
2020-06-24T16:38:24.397+0000 I STORAGE  [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1592966813, 30) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1592966813, 30)
2020-06-24T16:38:24.408+0000 I STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2020-06-24T16:38:24.408+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 10031 records totaling to 6874042080 bytes
2020-06-24T16:38:24.408+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2020-06-24T16:38:31.032+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/media/mongovol/data/diagnostic.data'
2020-06-24T16:38:31.033+0000 I REPL     [initandlisten] Rollback ID is 1
2020-06-24T16:38:31.033+0000 I REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1592966813, 30) (top of oplog: { ts: Timestamp(1592967376, 1), t: 8 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2020-06-24T16:38:31.033+0000 I REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1592966813, 30)
2020-06-24T16:38:31.033+0000 I REPL     [initandlisten] Replaying stored operations from { : Timestamp(1592966813, 30) } (exclusive) to { : Timestamp(1592967376, 1) } (inclusive).
2020-06-24T16:38:32.070+0000 I FTDC     [ftdc] Unclean full-time diagnostic data capture shutdown detected, found interim file, some metrics may have been lost. OK
2020-06-24T16:39:53.692+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2020-06-24T16:39:53.692+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2020-06-24T16:39:53.692+0000 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2020-06-24T16:39:53.692+0000 I REPL     [signalProcessingThread] shutting down replication subsystems
2020-06-24T16:41:24.479+0000 I CONTROL  [main] ***** SERVER RESTARTED *****

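Running mongod --repair on the db folder finished successfully, but the server still fails to start, with the same output in the log.

How can I recover this server?

Tags: mongodb

Solution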


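Following the comments, I tried starting the server manually by running the mongod process directly instead of starting it with systemctl. After a while the database recovered, and I could stop it and start it again with systemctl. It seems that when the server is started as a service after a crash, it may not be given enough time to recover properly; once it has recovered, it can be started as usual.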
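
The timestamps in the log excerpt from the question look consistent with this explanation. The Python sketch below measures how long one startup attempt ran before mongod received signal 15. The two sample lines are copied from that excerpt (the first message is shortened here), and the helper names `parse_ts` and `seconds_until_sigterm` are made up for this illustration. The result is about 90 seconds, which matches the default systemd start timeout. That only explains the kill if the mongod unit does not override `TimeoutStartSec`, which is an assumption because the unit file is not shown.

```python
from datetime import datetime

# Two lines taken from the mongod.log excerpt in the question.
# The first message is shortened here; the timestamps are unchanged.
LOG_LINES = [
    "2020-06-24T16:38:23.636+0000 I STORAGE  [initandlisten] Detected data files in /media/mongovol/data",
    "2020-06-24T16:39:53.692+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends",
]


def parse_ts(line):
    """Parse the leading ISO-8601 timestamp of a MongoDB 4.0-style log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%f%z")


def seconds_until_sigterm(lines):
    """Return seconds from the first timestamped line to the first
    'got signal 15' line, or None if no such line is found."""
    start = None
    for line in lines:
        if not line[:1].isdigit():
            continue  # skip blank lines and elisions such as "..."
        ts = parse_ts(line)
        if start is None:
            start = ts
        if "got signal 15" in line:
            return (ts - start).total_seconds()
    return None


elapsed = seconds_until_sigterm(LOG_LINES)
print(f"{elapsed:.3f} s from first startup line to SIGTERM")  # prints 90.056 s ...
```

If the timeout is indeed the cause, raising `TimeoutStartSec` for the service in a systemd drop-in would be an alternative to the one-off manual start, though that was not tried here.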

