Can't trust Content-Length when reading a response from an SslStream?

Problem description

I'm using TcpClient and NetworkStream on .NET Core 2.2,
trying to fetch the content of https://www.google.com/.

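Before going further, I want to make clear that I do not want to use the WebClient, HttpWebRequest or HttpClient classes. There are plenty of questions where someone ran into trouble with TcpClient and the answerers or commenters suggested using something else for the task, so please don't.

Assume we have an SslStream instance obtained from the TcpClient's NetworkStream and properly authenticated.

Assume also that there is a StreamWriter for writing HTTP messages to this stream and a StreamReader for reading the HTTP message headers from the response: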

var tcpClient = new TcpClient("google.com", 443);
var stream = tcpClient.GetStream();
var sslStream = new SslStream(stream, false);
sslStream.AuthenticateAsClient("google.com");
var streamWriter = new StreamWriter(sslStream);
var streamReader = new StreamReader(sslStream);

Assume we send the request the same way the Firefox browser does:

GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0

This causes the following response to be sent:

HTTP/1.1 200 OK
Date: Sun, 28 Apr 2019 17:28:27 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=31536000
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Content-Encoding: br
Server: gws
Content-Length: 55786
... etc

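Now, after reading all the response headers with streamReader.ReadLine() and parsing the content length found among them, let's read the response content into a buffer: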

var totalBytesRead = 0;
int bytesRead;
var buffer = new byte[contentLength];
do
{
    bytesRead = sslStream.Read(buffer,
        totalBytesRead,
        contentLength - totalBytesRead);
    totalBytesRead += bytesRead;
} while (totalBytesRead < contentLength && bytesRead > 0);

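However, this do..while loop exits only after the remote server closes the connection, which means the last call to Read hangs. It looks as if we have already read the entire response content and the server is already waiting for another HTTP message on this stream. Is contentLength incorrect? Does streamReader read too much when ReadLine is called, and so mess up the SslStream position, which causes invalid data to be read?

What gives? Has anyone had experience with this?

P.S. Here is the code of a sample console application, with all safety checks omitted, that demonstrates this: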

private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamReader = new StreamReader(sslStream))
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();

                var lines = new List<string>();
                var line = streamReader.ReadLine();
                var contentLength = 0;
                while (!string.IsNullOrWhiteSpace(line))
                {
                    var split = line.Split(": ");
                    if (split.First() == "Content-Length")
                    {
                        contentLength = int.Parse(split[1]);
                    }

                    lines.Add(line);
                    line = streamReader.ReadLine();
                }

                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);

                Console.WriteLine(
                    "--------------------");
            }
        }
    }

    Console.ReadLine();
}

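Edit

This always happens right after I submit a question. I had been scratching my head for days without finding the cause, but as soon as I posted it I realized that it has to do with StreamReader messing things up when it tries to read a line.

So, if I stop using StreamReader and replace the ReadLine calls with something that reads byte by byte, everything seems to work fine. The replacement code can be written as follows: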

private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];

    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }

        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char) buffer[0]);

        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char) buffer[0]);

        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);

    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}

...which makes our sample console application look like this:

private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();

                var lines = ReadHeader(sslStream);
                var contentLengthLine = lines.First(x => x.StartsWith("Content-Length"));
                var split = contentLengthLine.Split(": ");
                var contentLength = int.Parse(split[1]);

                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);

                Console.WriteLine(
                    "--------------------");
            }
        }
    }

    Console.ReadLine();
}

private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];

    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }

        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char)buffer[0]);

        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);

        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);

    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}

Tags: c#, https, .net-core, tcpclient, sslstream

Solution


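To answer the question in the title: YES, Content-Length can be trusted,
as long as you read the message header correctly, i.e. without StreamReader.ReadLine. StreamReader reads ahead into its own internal buffer, so by the time the headers have been parsed it has already taken part of the body out of the SslStream, and a later direct Read on the SslStream waits for bytes that never arrive.

Here is a utility method that does the job: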
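The read-ahead behaviour is easy to observe without a network connection. The following is a minimal sketch, not part of the original code: it assumes a MemoryStream as a stand-in for the SslStream, and the class and method names (ReadAheadDemo, BytesConsumedByReadLine) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Reads a single line through a StreamReader and reports how far the
    // underlying stream has advanced as a result.
    // The MemoryStream stands in for the SslStream (an assumption of this sketch).
    public static long BytesConsumedByReadLine(byte[] message)
    {
        using (var stream = new MemoryStream(message))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            // Only one line is returned to the caller...
            reader.ReadLine();

            // ...but the reader filled its internal buffer from the stream,
            // so the position reflects everything it pulled in, not one line.
            return stream.Position;
        }
    }
}
```

For a short message such as a status line, one header, a blank line and a five-byte body, the reported position is the end of the whole message rather than the end of the first line. Any body bytes that ended up in the reader's buffer are no longer available to a direct Read on the underlying stream.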

private static string ReadStreamUntil(Stream stream, string boundary)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];

    // Initialize a string builder with placeholder chars of the same length as the boundary
    var boundaryPlaceholder = string.Join(string.Empty, boundary.Select(x => "."));
    var check = new StringBuilder(boundaryPlaceholder);
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        var byteCount = stream.Read(buffer, 0, 1);
        if (byteCount == 0)
        {
            // If nothing was read, break the loop
            break;
        }

        // Add the received byte to the response builder.
        responseBuilder.Append((char)buffer[0]);

        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);

        // boundary marks the end of the message, so break here
    } while (check.ToString() != boundary);

    return responseBuilder.ToString();
}

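Then, to read the header, we can call ReadStreamUntil(sslStream, "\r\n\r\n").

The key here is to read the stream byte by byte until a known byte sequence is encountered (\r\n\r\n in this case).

After reading with this method, the stream is at the right position for the response content to be read correctly.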
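To show the whole sequence in one place (header first, then exactly Content-Length bytes of body), here is a rough sketch. It is self-contained, so it uses its own compact header reader rather than the method above. The names HeaderThenBodyDemo, ReadHeaderBlock, ParseContentLength and ReadBody are invented for this example, and the header name is compared case-insensitively because HTTP header field names are case-insensitive.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class HeaderThenBodyDemo
{
    // Reads one byte at a time until "\r\n\r\n" has been seen,
    // so no body bytes are taken from the stream.
    public static string ReadHeaderBlock(Stream stream)
    {
        var header = new StringBuilder();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            header.Append((char)b); // headers are assumed to be ASCII
            var n = header.Length;
            if (n >= 4 && header[n - 4] == '\r' && header[n - 3] == '\n'
                       && header[n - 2] == '\r' && header[n - 1] == '\n')
            {
                break;
            }
        }
        return header.ToString();
    }

    // Returns the Content-Length value, or -1 if the header is absent.
    public static int ParseContentLength(string headerBlock)
    {
        var lines = headerBlock.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var colon = line.IndexOf(':');
            if (colon <= 0) continue; // status line or malformed line
            var name = line.Substring(0, colon).Trim();
            if (name.Equals("Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                return int.Parse(line.Substring(colon + 1).Trim(), CultureInfo.InvariantCulture);
            }
        }
        return -1;
    }

    // Reads exactly contentLength bytes; Read may return fewer bytes per call.
    public static byte[] ReadBody(Stream stream, int contentLength)
    {
        var body = new byte[contentLength];
        var total = 0;
        while (total < contentLength)
        {
            var read = stream.Read(body, total, contentLength - total);
            if (read == 0) throw new EndOfStreamException("Stream ended before the full body was read.");
            total += read;
        }
        return body;
    }
}
```

With an authenticated SslStream the calls would be ReadHeaderBlock(sslStream), then ParseContentLength on the result, then ReadBody(sslStream, contentLength). Since the response in the question carries Content-Encoding: br, the bytes returned by ReadBody would still be compressed.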

If there is any benefit to it, this method can easily be converted into an async variant by awaiting ReadAsync instead of calling Read.
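An async variant could look roughly like the sketch below. It is an illustration only: the name ReadStreamUntilAsync, the optional CancellationToken parameter and the EndsWith helper are additions of this example, and the ASCII assumption still applies.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncHeaderReader
{
    // Reads the stream one byte at a time until the boundary has been seen
    // or the stream ends, and returns everything read (boundary included).
    public static async Task<string> ReadStreamUntilAsync(
        Stream stream,
        string boundary,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        if (string.IsNullOrEmpty(boundary))
        {
            throw new ArgumentException("Boundary must not be empty.", nameof(boundary));
        }

        var buffer = new byte[1];
        var result = new StringBuilder();
        while (true)
        {
            var read = await stream.ReadAsync(buffer, 0, 1, cancellationToken).ConfigureAwait(false);
            if (read == 0)
            {
                break; // end of stream before the boundary was found
            }

            result.Append((char)buffer[0]); // assumes ASCII text
            if (EndsWith(result, boundary))
            {
                break;
            }
        }

        return result.ToString();
    }

    // True when the last boundary.Length chars of the builder equal the boundary.
    private static bool EndsWith(StringBuilder text, string boundary)
    {
        if (text.Length < boundary.Length) return false;
        var offset = text.Length - boundary.Length;
        for (var i = 0; i < boundary.Length; i++)
        {
            if (text[offset + i] != boundary[i]) return false;
        }
        return true;
    }
}
```

It would be called as await AsyncHeaderReader.ReadStreamUntilAsync(sslStream, "\r\n\r\n"). Awaiting a one-byte read per iteration adds overhead, so whether this is worth it depends on the application.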

Note that the method above works correctly only if the text is ASCII encoded.
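The ASCII restriction comes from casting each byte to a char, which would mangle multi-byte sequences such as UTF-8. One possible way around it, sketched here under the assumption that the caller knows the encoding, is to match the boundary at byte level and decode afterwards. ByteBoundaryReader and ReadBytesUntil are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ByteBoundaryReader
{
    // Collects raw bytes until the boundary byte sequence has been seen
    // or the stream ends. The result includes the boundary itself.
    public static byte[] ReadBytesUntil(Stream stream, byte[] boundary)
    {
        if (boundary == null || boundary.Length == 0)
        {
            throw new ArgumentException("Boundary must not be empty.", nameof(boundary));
        }

        var collected = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            collected.Add((byte)b);
            if (EndsWith(collected, boundary))
            {
                break;
            }
        }

        return collected.ToArray();
    }

    // True when the last suffix.Length bytes of data equal suffix.
    private static bool EndsWith(List<byte> data, byte[] suffix)
    {
        if (data.Count < suffix.Length) return false;
        var offset = data.Count - suffix.Length;
        for (var i = 0; i < suffix.Length; i++)
        {
            if (data[offset + i] != suffix[i]) return false;
        }
        return true;
    }
}
```

The caller then decodes the returned bytes with whatever encoding applies, for example Encoding.UTF8.GetString(bytes), and the stream is still positioned immediately after the boundary.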

