MongoDB repair fails due to Invariant failure rs.get() src/mongo/db/catalog/database.cpp

Problem description

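MongoDB version: 3.4.24
A MongoDB instance hosted on a Linux server shut down suddenly because of excessive memory usage.
I started a MongoDB repair with: sudo mongod -f /etc/mongodrepair.conf --repair
The whole database is 2.5 TB. While repairing/reindexing, close to 2.4 TB was repaired successfully, but the repair of the last database (972 MB) fails with an Invariant failure error.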

Repair log

2020-07-04T17:17:07.441+0000 I INDEX    [initandlisten]          building index using bulk method; build may temporarily use up to 50 megabytes of RA$  
2020-07-04T17:17:07.448+0000 I INDEX    [initandlisten] build index on: test.summary properties: { v: 1, key: { totalVolume: -1 }, name: "totalV$
lume_-1", ns: "test.summary", background: true }
2020-07-04T17:17:07.448+0000 I INDEX    [initandlisten]          building index using bulk method; build may temporarily use up to 50 megabytes of RA$  
2020-07-04T17:17:07.456+0000 I INDEX    [initandlisten] build index on: test.summary properties: { v: 1, key: { ts: -1 }, name: "ts_-1", ns: "test.summary", background: true }  
2020-07-04T17:17:07.456+0000 I INDEX    [initandlisten]          building index using bulk method; build may temporarily use up to 50 megabytes of RA$  
2020-07-04T17:17:08.673+0000 I -        [initandlisten]  Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195    
2020-07-04T17:17:08.673+0000 I -        [initandlisten]   
  
***aborting after invariant() failure  
  
  
2020-07-04T17:17:08.717+0000 F -        [initandlisten] Got signal: 6 (Aborted).  

Restart log

2020-07-04T17:39:14.476+0000 I CONTROL  [main] ***** SERVER RESTARTED *****  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] MongoDB starting : pid=20485 port=27017 dbpath=/home/db324 64-bit host=ip-*-*-*-*  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] db version v3.4.24  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] allocator: tcmalloc  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] modules: none  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] build environment:  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten]     distarch: x86_64  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten]     target_arch: x86_64  
2020-07-04T17:39:14.480+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "-.-.-.-", port: 27017 }, replication: $  
 oplogSizeMB: 10240, replSetName: "rs1" }, storage: { dbPath: "/home/db324", directoryPerDB: true, engine: "wiredTiger", journal: { enabled$
 true }, wiredTiger: { engineConfig: { cacheSizeGB: 108.0 } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.lo$
" } }  
2020-07-04T17:39:14.480+0000 W -        [initandlisten] Detected unclean shutdown - /home/db324/mongod.lock is not empty.  
2020-07-04T17:39:14.499+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.  
2020-07-04T17:39:14.499+0000 I STORAGE  [initandlisten]   
2020-07-04T17:39:14.499+0000 I STORAGE  [initandlisten] ** WARNING: The configured WiredTiger cache size is more than 80% of available RAM.  
2020-07-04T17:39:14.499+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=110592M,session_max=20000,eviction=(threads_min=4,t$
reads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000)$
checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),  
2020-07-04T17:39:14.667+0000 I STORAGE  [initandlisten] WiredTiger message [1593884354:667272][20485:0x7fdbcb287580], txn-recover: Main recovery loop$
 starting at 73368/128  
2020-07-04T17:39:14.667+0000 I STORAGE  [initandlisten] WiredTiger message [1593884354:667951][20485:0x7fdbcb287580], txn-recover: Recovering log 733$
8 through 73369  
2020-07-04T17:39:14.733+0000 I STORAGE  [initandlisten] WiredTiger message [1593884354:733044][20485:0x7fdbcb287580], txn-recover: Recovering log 733$
9 through 73369  
2020-07-04T17:39:15.164+0000 E STORAGE  [initandlisten] WiredTiger error (-31802) [1593884355:164908][20485:0x7fdbcb287580], test/collectio$
-56-3854974571131417844.wt, WT_SESSION.open_cursor: /home/db324/test/collection-56-3854974571131417844.wt: handle-read: pread: failed $
to read 4096 bytes at offset 28672: WT_ERROR: non-specific WiredTiger error  
2020-07-04T17:39:15.164+0000 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredT$
ger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 113  
2020-07-04T17:39:15.165+0000 I -        [initandlisten]   
  
***aborting after invariant() failure  

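Is there a way to repair/recover the last part of the database? Or
is there a way to ignore the corrupted database? Or
is it possible to extract the whole 2.4 TB of data, leaving out the last faulty database, and create a new 2.4 TB MongoDB instance from it?

I would really appreciate your help.
Thanks in advance.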

Tags: mongodb

Solution


The repair log shows that it failed while building an index on ns: "test.summary".

The other log gives you the file name and the offset of the error:

/home/db324/test/collection-56-3854974571131417844.wt: handle-read: pread: failed to read 4096 bytes at offset 28672
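
The failed read covers bytes 28672 through 32767 of that file. One plausible cause, and it is only an assumption here, is that the file was cut short by the unclean shutdown. A minimal shell sketch to check whether the file is even long enough to contain that range (the function name `read_range_ok` is hypothetical):

```shell
# read_range_ok FILE OFFSET LENGTH
# Succeeds only if FILE exists and is long enough to hold LENGTH bytes
# starting at OFFSET. It does not detect disk-level I/O errors.
read_range_ok() {
    [ -f "$1" ] || return 1
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -ge $(( $2 + $3 )) ]
}
```

For the file in the log this would be called as `read_range_ok /home/db324/test/collection-56-3854974571131417844.wt 28672 4096`. A failure suggests the file ends before byte 32768; a success means the file is long enough and the cause lies elsewhere.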

Data after that point in the file is probably not salvageable. You could try:

  1. Back up the existing files
  2. Delete the file /home/db324/test/collection-56-3854974571131417844.wt
  3. Re-run mongod --repair on this dbpath

If all goes well, it will create a new, empty file for that collection.
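
The backup and delete steps can be sketched as a small shell function. This is a sketch under assumptions, not a tested procedure: it copies only the damaged file (a full copy of the dbpath is safer if space allows), the backup directory is a hypothetical location of your choosing, and mongod should be stopped while files in the dbpath are touched. The function name `set_aside_file` is made up for illustration.

```shell
# set_aside_file DBPATH REL_FILE BACKUP_DIR
# Copies DBPATH/REL_FILE into BACKUP_DIR (keeping the relative path),
# checks that the copy is byte-for-byte identical, and only then
# removes the original from the dbpath.
set_aside_file() {
    src="$1/$2"
    dest="$3/$2"
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")" || return 1
    cp -p "$src" "$dest" || return 1
    cmp -s "$src" "$dest" || { echo "backup differs from source" >&2; return 1; }
    rm -- "$src"
}
```

With the paths from the logs this would be `set_aside_file /home/db324 test/collection-56-3854974571131417844.wt /some/backup/dir`, followed by the same repair command as before: `sudo mongod -f /etc/mongodrepair.conf --repair`. If the copy fails, the function stops before deleting anything.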

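If you need to try to salvage that data, then once the above has succeeded you know the remaining data files are consistent; copy that file back from the backup and try the repair again.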
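
A minimal sketch of that second attempt, assuming a copy of the damaged file was kept under a hypothetical backup directory with the same relative path, and that mongod is stopped. As a precaution that is not part of the original advice, the empty file created by the successful repair is moved next to the backup rather than overwritten. The function name `restore_file` is made up for illustration.

```shell
# restore_file DBPATH REL_FILE BACKUP_DIR
# Puts BACKUP_DIR/REL_FILE back into the dbpath. Any file that repair
# created in its place is first moved out to BACKUP_DIR/REL_FILE.fresh.
restore_file() {
    src="$3/$2"
    dest="$1/$2"
    [ -f "$src" ] || { echo "no backup at $src" >&2; return 1; }
    if [ -e "$dest" ]; then
        mv -- "$dest" "$src.fresh" || return 1
    fi
    cp -p "$src" "$dest"
}
```

After `restore_file /home/db324 test/collection-56-3854974571131417844.wt /some/backup/dir`, the repair command can be run again. If it fails the same way, the `.fresh` file can be moved back to return to the known-good state.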

