c# - Some Ping tasks never complete
Problem description
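I'm trying to ping all link-local IP addresses ("169.254.xxx.yyy"; there are about 65,000 of them) on a specific interface. I expect only one or two of them to succeed. I want to test live IP addresses as early as possible (i.e. without waiting for all the other pings to time out); if a live address turns out to be the device I want, I can cancel all the other pings.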
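My application is C#, WinForms, async/await. I create a `List<Task<T>>`, each task using a separate `Ping` object to probe one specific address. I then use `Task.WhenAny` to retrieve the results and progressively remove the corresponding tasks from the list. (I also tried other, more efficient ways of processing the results in order of completion, with similar results.)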
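I find that, of the roughly 65,000 tasks, most complete and deliver a proper result. However, several thousand of them (the exact number varies from run to run) stay in the `WaitingForActivation` state and never run. Only when I reduce the number of tasks to below about 1,300 do all of them complete correctly; otherwise a small fraction (about 5-10%) remains in `WaitingForActivation`. The tasks that don't run seem to be randomly distributed throughout the list.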
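I tried moving the code into a console application, with the same result. If, in each task, I replace the call to `SendPingAsync` with a call to `Task.Delay()` using the same timeout, all tasks complete as expected.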
My console test application:
```csharp
using System;
using System.Collections.Generic;
using System.Net.NetworkInformation;
using System.Net.WebSockets;
using System.Threading.Tasks;

namespace AsyncPing
{
    class Program
    {
        static ClientWebSocket _commandWebSocket;

        static async Task Main(string[] args)
        {
            var topLevelTasks = new List<Task<ClientWebSocket>>();
            // Scanning task
            topLevelTasks.Add(Task.Run(async () =>
                await TryToConnectLinkLocal()));
            // Monitoring task (just for debugging)
            topLevelTasks.Add(Task.Run(async () => await MonitorLLTasks()));
            await Task.WhenAll(topLevelTasks);
        }

        // Monitoring task; periodically reports on the state of the tasks collection
        private static async Task<ClientWebSocket> MonitorLLTasks()
        {
            for (int i = 0; i < 1000; ++i)
            {
                await Task.Delay(1000);
                int waitingForActivation = 0, waitingToRun = 0, running = 0, completed = 0;
                int index = 0;
                while (index < tasks.Count)
                {
                    try
                    {
                        switch (tasks[index].Status)
                        {
                            case TaskStatus.WaitingForActivation:
                                waitingForActivation++;
                                break;
                            case TaskStatus.WaitingToRun:
                                waitingToRun++;
                                break;
                            case TaskStatus.Running:
                                running++;
                                break;
                            case TaskStatus.RanToCompletion:
                                completed++;
                                break;
                        }
                        ++index;
                    }
                    catch
                    {
                        // Very occasionally, tasks[index] has been removed by the time we access it
                    }
                }
                Console.WriteLine($"There are {index} tasks: {waitingForActivation} waitingForActivation; {waitingToRun} waitingToRun; {running} running; {completed} completed. {handled} results have been handled.");
                if (tasks.Count == 0)
                    break;
            }
            return null;
        }

        const string LinkLocalIPPrefix = "169.254.";
        static List<Task<String>> tasks = new List<Task<String>>();
        static int handled = 0;

        private static async Task<ClientWebSocket> TryToConnectLinkLocal()
        {
            // Link-local addresses all start with this prefix
            string baseIP = LinkLocalIPPrefix;
            tasks.Clear();
            handled = 0;
            Console.WriteLine("Scanning Link-local addresses...");
            // Scan all Link-local addresses
            // We build a task for each ip address.
            // Note that there are nearly 65536 addresses to ping and the tasks start running
            // as soon as we start creating them.
            for (int i = 1; i < 255; i++)
            {
                string ip_i = baseIP + i.ToString() + ".";
                for (int j = 1; j < 255; j++)
                {
                    string ip = ip_i + j.ToString();
                    var task = Task.Run(() => TryToConnectLinkLocal(ip));
                    tasks.Add(task);
                }
            }
            while (tasks.Count > 0)
            {
                var t = await Task.WhenAny(tasks);
                tasks.Remove(t);
                String result = await t;
                handled++;
            }
            return null;
        }

        private const int _pingTimeout = 10; // 10ms ought to be enough!

        // Ping the specified address
        static async Task<String> TryToConnectLinkLocal(string ip)
        {
            using (Ping ping = new Ping())
            {
                // This dummy code uses a fixed IP address to avoid possibility of a successful ping
                var reply = await ping.SendPingAsync("169.254.123.234", _pingTimeout);
                if (reply.Status == IPStatus.Success)
                {
                    Console.WriteLine("Response at LL address " + ip);
                    return ip;
                }
            }
            // Alternative: just wait for the duration of the timeout
            //await Task.Delay(_pingTimeout);
            return null;
        }
    }
}
```
Typical output looks like this (similar lines edited out for brevity):
```text
Scanning Link-local addresses...
There are 14802 tasks: 12942 waitingForActivation; 0 waitingToRun; 0 running; 1860 completed. 0 results have been handled.
There are 24623 tasks: 20005 waitingForActivation; 0 waitingToRun; 0 running; 4618 completed. 0 results have been handled.
There are 27287 tasks: 21170 waitingForActivation; 0 waitingToRun; 0 running; 6117 completed. 0 results have been handled.
There are 41714 tasks: 32471 waitingForActivation; 0 waitingToRun; 0 running; 9243 completed. 0 results have been handled.
There are 51263 tasks: 38816 waitingForActivation; 0 waitingToRun; 0 running; 12447 completed. 0 results have been handled.
There are 63891 tasks: 48403 waitingForActivation; 0 waitingToRun; 0 running; 15488 completed. 0 results have been handled.
There are 64498 tasks: 46496 waitingForActivation; 0 waitingToRun; 0 running; 18002 completed. 18 results have been handled.
<All tasks have been created. Many have been run. More and more results are handled and the corresponding tasks removed>
There are 6626 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 1084 completed. 57890 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
There are 5542 tasks: 5542 waitingForActivation; 0 waitingToRun; 0 running; 0 completed. 58974 results have been handled.
<5542 results remain in the list because they are stuck WaitingForActivation. Only 58974 results (of 64516) have been handled. This state continues indefinitely>
```
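I'd be glad to receive an explanation of this behavior, suggestions on how to fix it, and/or suggestions for a more efficient way to probe the network.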
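I renamed this question because I learned from Stephen Cleary's blog that these tasks are probably Promise Tasks, which start in the `WaitingForActivation` state. What really matters here, however, is that they never complete.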
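To illustrate the point about promise tasks: a task produced by `TaskCompletionSource`, or returned by an `async` method that is suspended at an `await`, is never queued to a thread, so it reports `WaitingForActivation` (never `WaitingToRun` or `Running`) until it completes. A minimal sketch; the method name `WaitThenReturnAsync` is invented for the demo:

```csharp
using System;
using System.Threading.Tasks;

// A promise task: completed by code, not executed on a thread.
var tcs = new TaskCompletionSource<string>();
Task<string> promise = tcs.Task;

// Hypothetical async method; its task is also a promise task while suspended.
async Task<string> WaitThenReturnAsync()
{
    await Task.Delay(500);
    return "done";
}
Task<string> pending = WaitThenReturnAsync();

// Capture the statuses while both tasks are still incomplete.
TaskStatus promiseStatusBefore = promise.Status;
TaskStatus pendingStatusBefore = pending.Status;

tcs.SetResult("set");
string pendingResult = await pending;

Console.WriteLine($"{promiseStatusBefore} -> {promise.Status}");
Console.WriteLine($"{pendingStatusBefore} -> {pending.Status}");
```

So a count of `WaitingForActivation` tasks mostly says "not finished yet"; it does not by itself mean the work was never started.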
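Since the original post, I have tried the following:
- processing the results in order of completion using continuation tasks, following Stephen Toub's article;
- using `ConcurrentExclusiveSchedulerPair` to try to throttle the execution of the tasks;
- using `ConfigureAwait(false)` to try to change the `SynchronizationContext` that is used;
- checking, as far as I can tell, that no exceptions are thrown.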
None of these seems to have any significant effect.
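For reference, the completion-order technique from the first bullet can be sketched roughly as follows, modeled on the `Interleaved` helper from Stephen Toub's article. This is a reconstruction, not the asker's code; the delays in the demo stand in for ping results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Returns one "bucket" task per input. Bucket k completes with the k-th input
// task to finish, so awaiting the buckets in array order yields the inputs in
// completion order, without calling Task.WhenAny repeatedly on a shrinking list.
Task<Task<T>>[] Interleaved<T>(IEnumerable<Task<T>> tasks)
{
    var inputTasks = tasks.ToList();
    var buckets = new TaskCompletionSource<Task<T>>[inputTasks.Count];
    var results = new Task<Task<T>>[buckets.Length];
    for (int i = 0; i < buckets.Length; i++)
    {
        buckets[i] = new TaskCompletionSource<Task<T>>();
        results[i] = buckets[i].Task;
    }

    int nextTaskIndex = -1;
    foreach (var inputTask in inputTasks)
    {
        // Whichever input finishes next claims the next free bucket.
        inputTask.ContinueWith(completed =>
        {
            var slot = buckets[Interlocked.Increment(ref nextTaskIndex)];
            slot.TrySetResult(completed);
        }, CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously,
           TaskScheduler.Default);
    }
    return results;
}

// Demo: three simulated probes that finish after different delays (ms).
var inputs = new[] { 300, 100, 200 }
    .Select(async ms => { await Task.Delay(ms); return ms; })
    .ToList();

var completionOrder = new List<int>();
foreach (var bucket in Interleaved(inputs))
{
    Task<int> finished = await bucket;   // the next input to complete
    completionOrder.Add(await finished); // its result (or its exception)
}
Console.WriteLine(string.Join(", ", completionOrder));
```

The helper only changes how results are consumed; it does not limit how many pings are in flight at once.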
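I also tried splitting the scan into multiple sub-scans. Each sub-scan spawns a number of Ping tasks that execute asynchronously (as in the code above), and the sub-scans themselves run sequentially, one after the other. On my machine, as long as I keep the number of Ping tasks per sub-scan below 1,100, they all execute correctly. Beyond that, a small fraction of them never completes. This approach does not seem to be much slower (presumably because the network interface is saturated beyond a certain number of simultaneous pings), so it gives me a practical workaround. However, the question remains why some tasks fail to complete once there are more than about 1,100 of them. And again: if I replace the Ping with a call to `await Task.Delay(...)`, all tasks complete.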
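The sub-scan workaround described above can be sketched as follows, assuming .NET 6+ for `Enumerable.Chunk`. `PingOnceAsync` and `ScanInBatchesAsync` are hypothetical names, and the probe is passed in as a delegate so that the batching logic can be exercised without real network traffic. Unlike the asker's version, this sketch stops after the first batch that contains a live address, in line with the goal stated at the top of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

// Probes one address with its own Ping object; returns the address on success, else null.
async Task<string> PingOnceAsync(string ip, int timeoutMs)
{
    using var ping = new Ping();
    try
    {
        var reply = await ping.SendPingAsync(ip, timeoutMs);
        return reply.Status == IPStatus.Success ? ip : null;
    }
    catch (PingException)
    {
        return null; // treat a failed send as "no response"
    }
}

// Runs the probes in sequential batches: at most batchSize are in flight at once,
// and the scan stops after the first batch that contains a live address.
async Task<string> ScanInBatchesAsync(
    IEnumerable<string> ips, int batchSize, Func<string, Task<string>> probe)
{
    foreach (string[] batch in ips.Chunk(batchSize)) // Enumerable.Chunk needs .NET 6+
    {
        string[] results = await Task.WhenAll(batch.Select(probe));
        string hit = results.FirstOrDefault(r => r != null);
        if (hit != null) return hit;
    }
    return null;
}

// Usage (not run here; 1000 is simply a batch size below the threshold reported above):
// var ips = from i in Enumerable.Range(1, 254) from j in Enumerable.Range(1, 254)
//           select $"169.254.{i}.{j}";
// string found = await ScanInBatchesAsync(ips, 1000, ip => PingOnceAsync(ip, 10));
```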
Solution
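My suggestion is to use an `ActionBlock<T>` from the TPL Dataflow library. This component will take care of cancelling the process promptly when the first successful `IPAddress` is found, while also enforcing a maximum degree of parallelism.

You first instantiate an `ActionBlock<IPAddress>`, supplying the action that will run for each `IPAddress`, along with the execution options. Then you feed the block with addresses using the `Post` method. Then you signal that no more addresses will be posted by calling the `Complete` method. Finally you `await` the `Completion` property of the block. Example: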
```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

const int pingTimeout = 10;
using var cts = new CancellationTokenSource();
IPAddress result = null;
var block = new ActionBlock<IPAddress>(async address =>
{
    try
    {
        using var ping = new Ping(); // Ping is IDisposable
        var reply = await ping.SendPingAsync(address, pingTimeout);
        if (reply.Status == IPStatus.Success)
        {
            // Keep only the first successful address, then stop the scan
            Interlocked.CompareExchange(ref result, address, null);
            cts.Cancel();
        }
    }
    catch (PingException) { } // Ignore
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 100, // Select a reasonable value
    CancellationToken = cts.Token
});

byte b1 = 169, b2 = 254;
var addresses = Enumerable.Range(0, 255)
    .SelectMany(_ => Enumerable.Range(0, 255),
        (b3, b4) => new IPAddress(
            new byte[] { b1, b2, (byte)b3, (byte)b4 }));
foreach (var address in addresses) block.Post(address);
block.Complete();

try { await block.Completion; }
catch (OperationCanceledException) { } // Ignore

Console.WriteLine($"Result: {result?.ToString() ?? "(not found)"}");
```
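On .NET 6 or later, the same throttle-and-cancel idea can also be sketched without the Dataflow package, using `Parallel.ForEachAsync`. This is a different mechanism from the `ActionBlock` above, not part of the original answer; `FindFirstAsync` and `PingProbeAsync` are hypothetical helper names, and the probe is a delegate so the logic can be checked without sending real pings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Threading;
using System.Threading.Tasks;

// Runs `probe` over `items` with at most `maxParallel` probes in flight and
// returns the first item whose probe succeeds (null if none does).
// Requires .NET 6+ for Parallel.ForEachAsync.
async Task<T> FindFirstAsync<T>(IEnumerable<T> items, int maxParallel,
    Func<T, Task<bool>> probe) where T : class
{
    using var cts = new CancellationTokenSource();
    T found = null;
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = maxParallel,
        CancellationToken = cts.Token
    };
    try
    {
        await Parallel.ForEachAsync(items, options, async (item, ct) =>
        {
            if (await probe(item))
            {
                Interlocked.CompareExchange(ref found, item, null);
                cts.Cancel(); // stop handing out further items
            }
        });
    }
    catch (OperationCanceledException) { } // expected once a match is found
    return found;
}

// Ping-based probe for the link-local scan; a PingException counts as "no response".
async Task<bool> PingProbeAsync(IPAddress address)
{
    using var ping = new Ping();
    try
    {
        var reply = await ping.SendPingAsync(address, 10);
        return reply.Status == IPStatus.Success;
    }
    catch (PingException) { return false; }
}

// Usage (not run here):
// var addresses = from b3 in Enumerable.Range(0, 255) from b4 in Enumerable.Range(0, 255)
//                 select new IPAddress(new byte[] { 169, 254, (byte)b3, (byte)b4 });
// IPAddress hit = await FindFirstAsync(addresses, 100, PingProbeAsync);
```

As with `ActionBlock`, the degree of parallelism caps how many `Ping` objects exist at once, instead of creating tens of thousands of tasks up front.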