首页 > 解决方案 > Azure 服务总线高级版 - 异地恢复 - 丢弃的消息

问题描述

我在一个区域有一个高级服务总线命名空间,并在另一个区域创建了另一个,作为辅助。我已在主服务器上启用地理恢复,并已配置与辅助服务器的配对。我运行了一个测试以连续向主题发送消息,并且我有一个订阅应用程序的接收应用程序。发送方将发送“发送消息:消息{序列号}”,接收方将显示“接收消息:序列号:{SB 分配的序列号}正文:消息{序列号}”。但是,当我尝试通过门户启动到辅助节点的故障转移时,我注意到虽然发送方继续发送消息,但接收方在完成故障转移时丢弃了一些消息。请看下面:

发件人的日志:

Sending message: Message 244
Sending message: Message 245
Sending message: Message 246
Sending message: Message 247
Sending message: Message 248
Sending message: Message 249
Sending message: Message 250
Sending message: Message 251
Sending message: Message 252
Sending message: Message 253
Sending message: Message 254
Sending message: Message 255
Sending message: Message 256
Sending message: Message 257
Sending message: Message 258
Sending message: Message 259
Sending message: Message 260

来自接收方的日志:

Received message: SequenceNumber:255 Body:Message 244
Received message: SequenceNumber:256 Body:Message 245
Received message: SequenceNumber:257 Body:Message 246
Received message: SequenceNumber:258 Body:Message 247
Message handler encountered an exception Microsoft.Azure.ServiceBus.UnauthorizedException: Connection rejected after GeoDRFailOver. TrackingId:7bb0b78d-2bf5-4807-8bcb-c831b00c6692, SystemTracker:AmqpGatewayProvider, Timestamp:2019-08-12T17:42:38
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<OnReceiveAsync>d__86.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<<ReceiveAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.RetryPolicy.<RunOperation>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.ServiceBus.RetryPolicy.<RunOperation>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<ReceiveAsync>d__64.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<ReceiveAsync>d__62.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>d__11.MoveNext().
Exception context for troubleshooting:
- Endpoint: rjpremium.servicebus.windows.net
- Entity Path: topic1/Subscriptions/sub1
- Executing Action: Receive
Received message: SequenceNumber:1 Body:Message 254
Received message: SequenceNumber:2 Body:Message 255
Received message: SequenceNumber:3 Body:Message 256
Received message: SequenceNumber:4 Body:Message 257
Received message: SequenceNumber:5 Body:Message 258
Received message: SequenceNumber:6 Body:Message 259
Received message: SequenceNumber:7 Body:Message 260

丢弃 247 和 254 之间的消息。尽管发送者发送了所有这些消息,但接收者从未收到这些消息。如果我启用 Geo-Recovery,接收方也应该收到这些消息吗?

标签: azureazureservicebusazure-servicebus-queues

解决方案


首先,引用文档(正如肖恩·费尔德曼(Sean Feldman)同时指出的那样)

“地理灾难恢复当前仅确保元数据(队列、主题、订阅、过滤器)在配对时从主命名空间复制到辅助命名空间。”

这意味着消息(尚未)被复制。

其次,GeoDR 适用于您必须搬出该地区的罕见情况,因为那里的某些东西以某种方式完全损坏了。这意味着您上面的测试场景不太可能反映任何现实。您将面临中断的危机情况,然后非常慎重地放弃该区域并进行故障转移,不仅对服务总线而且对您拥有的所有其他东西进行故障转移。


推荐阅读