System.Net.Http.HttpRequestException when downloading multiple files from Azure Datalake V2

Problem description

I am downloading a large number of files (>1000) from Azure Datalake V2, and I keep getting this exception:

The SSL connection could not be established, see inner exception. 
<--- Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. 
<--- An existing connection was forcibly closed by the remote host.

Stack trace:

System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.FixedSizeReader.ReadPacketAsync(Stream transport, AsyncProtocolRequest request)
   at System.Net.Security.SslStream.EndProcessAuthentication(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)

Code:

var downloadTasks = job.Files.AsParallel().Select(x => Download(x));
await Task.WhenAll(downloadTasks);

private async Task Download(DownloadableFile file)
{
    try
    {
        var options = new BlobRequestOptions
        {
            ParallelOperationThreadCount = 8,
            DisableContentMD5Validation = true,
            StoreBlobContentMD5 = false
        };
        var xzBlob = await _cloudBlobFileService.GetBlockBlobReference(file.FilePath);
        await xzBlob.DownloadToFileAsync(file.LocalFilePath, FileMode.Create, null, options, null);
    }
    catch (Exception e)
    {
         _log.LogCritical(e, "Error downloading " + file.FilePath);
    }
}

I also added this:

ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
ServicePointManager.Expect100Continue = false;

These two lines are in the Main method of Program.cs in the WebJob.

I am using .NET Core 3.1 and WindowsAzure.Storage 9.3.3.

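We used to have a Blob Storage setup without Data Lake, and this started happening after we switched to Data Lake. It does not affect the application much, because skipped downloads are retried later. Still, it would be nice to know what causes it.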

Tags: azure, .net-core, azure-data-lake

Solution


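You could first try the new Storage SDK that became generally available in November, although I cannot guarantee it will fix the problem. It is a complete rewrite.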
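
For reference, a single download with the rewritten SDK might look roughly like the sketch below. It assumes the Azure.Storage.Blobs package (v12); the class name, the method name, and the parameters connectionString, containerName, blobName and localPath are placeholders, and the retry values are arbitrary starting points rather than recommendations.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Storage.Blobs;

public static class NewSdkDownloadSketch
{
    // Hypothetical helper: all four parameters are placeholders for your own values.
    public static async Task DownloadAsync(
        string connectionString, string containerName, string blobName, string localPath)
    {
        // The v12 client has built-in retries; these values are illustrative only.
        var options = new BlobClientOptions();
        options.Retry.Mode = RetryMode.Exponential;
        options.Retry.MaxRetries = 5;
        options.Retry.NetworkTimeout = TimeSpan.FromMinutes(2);

        var container = new BlobContainerClient(connectionString, containerName, options);
        BlobClient blob = container.GetBlobClient(blobName);

        // Writes the blob content to a local file at localPath.
        await blob.DownloadToAsync(localPath);
    }
}
```

Creating the container client once and reusing it for all files lets every download share the same HTTP pipeline.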

The error message alone is not enough to pinpoint the cause, but there are a few things to look at:

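  1. Network errors. This is by far the most likely cause, although it is interesting that it happens so consistently compared with your old Blob Storage account. Increasing the timeouts may reduce how often network errors occur, and retry logic will help you recover from them.
  2. Unlimited parallelism is not recommended. ParallelOperationThreadCount applies to uploads, not downloads, so it does not throttle the requests in this case. The default connection limit in .NET in a server environment is 10, and increasing it is recommended when using .NET Core, so that is something to consider. If you hit the same blob or partition too many times concurrently, you may also start running into concurrent connection limits in Storage.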
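
The two points above (retry on failure, bounded parallelism) can be combined in a small helper. This is a minimal sketch, not code from either SDK: ThrottledRunner, RunAsync and all of their parameters are hypothetical names, and it retries on any exception, where real code would probably filter for transient ones such as HttpRequestException or IOException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs `action` once per item with at most `maxConcurrency` items in flight.
    // A failing item is retried up to `maxAttempts` times with exponential backoff;
    // after the last attempt the exception propagates to the caller.
    public static async Task RunAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> action,
        int maxConcurrency,
        int maxAttempts = 3,
        int baseDelayMs = 500)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task RunOneAsync(T item)
        {
            await gate.WaitAsync();
            try
            {
                for (var attempt = 1; ; attempt++)
                {
                    try
                    {
                        await action(item);
                        return;
                    }
                    catch (Exception) when (attempt < maxAttempts)
                    {
                        // Assumption: every exception is treated as transient here.
                        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
                        await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
                    }
                }
            }
            finally
            {
                // The slot is held during backoff, so retries do not raise concurrency.
                gate.Release();
            }
        }

        await Task.WhenAll(items.Select(item => RunOneAsync(item)).ToList());
    }
}
```

With the code from the question this could be called as `await ThrottledRunner.RunAsync(job.Files, Download, maxConcurrency: 16);` in place of the AsParallel().Select(...) line, where 16 is an arbitrary value to tune. Note that Download as posted catches and logs every exception, so the retry would only kick in if that catch block rethrows or is removed.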
