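c# - Getting HTML from a URL - is StreamReader using a different character encoding?
Problem description
I want to get the HTML from this URL: https://store.steampowered.com/app/513710/SCUM/
This should be easy, but I could not do it because of an SSL/TLS error.
So I used the code from this question: Requesting html over https with c# Webclient
I can finally fill my StreamReader, but when I call ReadToEnd() to get a string, I get a corrupted string like this: "�"
This must be a character encoding issue, but if you open https://store.steampowered.com/app/513710/SCUM/
and then open your browser console, you can see this at the top: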
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
And the code I was given contains:
webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
utf-8 is listed there, so I don't know why I am running into this problem. I tried replacing:
StreamReader(webClient.OpenRead(steamURL));
with:
StreamReader(webClient.OpenRead(steamURL), Encoding.UTF8, true);
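But it still does not return the correct text. I have tried to include all the information I can; if you need anything else, I will edit the question.
Thank you for your time, and have a nice day.
Regards,
David
PS: This is my current code: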
private StreamReader getStreamReader(string steamURL, WebClient webClient)
{
    return new StreamReader(webClient.OpenRead(steamURL), Encoding.UTF8, true);
}

private void getSteamCosts()
{
    // When I try to access a Steam HTML page, an SSL error appears
    // We need a specific security protocol
    // I check all of them, just in case
    ServicePointManager.ServerCertificateValidationCallback =
        new RemoteCertificateValidationCallback(
            delegate
            {
                return true;
            });
    using (WebClient webClient = new WebClient())
    {
        webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows;"
            + " U; Windows NT 6.0; en-US; rv:1.9.2.6) Gecko/20100625"
            + " Firefox/3.6.6 (.NET CLR 3.5.30729)";
        webClient.Headers["Accept"] = "text/html,application/xhtml+"
            + "xml,application/xml;q=0.9,*/*;q=0.8";
        webClient.Headers["Accept-Language"] = "en-us,en;q=0.5";
        webClient.Headers["Accept-Encoding"] = "gzip,deflate";
        webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        StreamReader sr = null;
        string steamURL = "https://store.steampowered.com/app/513710/SCUM/";
        try
        {
            // This one should work
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
            sr = getStreamReader(steamURL, webClient);
            lbFinalSteam.Text = "TLS12Final";
        }
        catch (Exception) // Bad coding practice, just wanted it to work
        {
            // If that's not the case, I try the rest
            try
            {
                ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls;
                sr = getStreamReader(steamURL, webClient);
                lbFinalSteam.Text = "TLSFinal";
            }
            catch (Exception)
            {
                try
                {
                    ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
                    sr = getStreamReader(steamURL, webClient);
                    lbFinalSteam.Text = "SSL3Final";
                }
                catch (Exception)
                {
                    try
                    {
                        ServicePointManager.SecurityProtocol =
                            SecurityProtocolType.Tls11;
                        sr = getStreamReader(steamURL, webClient);
                        lbFinalSteam.Text = "TLS11Final";
                    }
                    catch (Exception)
                    {
                        lbFinalSteam.Text = "NoFinal";
                    }
                }
            }
        }
        if (sr != null)
        {
            string allLines = sr.ReadToEnd();
        }
    }
}
Edit: Maybe the problem is how I convert the StreamReader to a string? I mean this line:
string allLines = sr.ReadToEnd();
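Should I use something else?
Solution
As Alex K (https://stackoverflow.com/users/246342/alex-k) already wrote, the problem was not the encoding: the response I received was gzip-compressed, and I was reading the compressed bytes as text. I simply removed this line: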
webClient.Headers["Accept-Encoding"] = "gzip,deflate";
It works! Thanks, Alex K! :D
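For anyone who would rather keep the compressed transfer, here is a minimal sketch of the alternative: let the request decompress the body instead of dropping the header. The class names DecompressingWebClient and GzipDemo are hypothetical and not part of the original code. GzipDemo runs without any network access and only illustrates why a gzip body read directly as UTF-8 produces "�", and how wrapping the stream in a GZipStream recovers the text.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical helper (name is illustrative): a WebClient whose HTTP requests
// transparently decompress gzip/deflate response bodies.
public class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}

// Offline illustration of the symptom and the fix.
public static class GzipDemo
{
    // Produces the kind of body a server sends with "Content-Encoding: gzip".
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray() is still valid after the MemoryStream has been closed.
            return buffer.ToArray();
        }
    }

    // What the question's code effectively did: decode compressed bytes as UTF-8.
    public static string ReadWithoutDecompressing(byte[] body)
    {
        using (var reader = new StreamReader(new MemoryStream(body), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Decompress first, then decode as UTF-8.
    public static string ReadWithDecompressing(byte[] body)
    {
        using (var gzip = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the code from the question, this would mean creating a DecompressingWebClient in place of the plain WebClient. I would still leave out the manual Accept-Encoding header, since as far as I know the request adds a matching one itself once AutomaticDecompression is set. I have not tested this against the Steam store page, so treat it as a starting point.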