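c# - Why does HttpWebRequest return significantly different HTML source compared to Chrome > View Page Source?
Question
I need to fetch HTML source that is as close as possible to what Chrome or another browser shows under View Page Source. However, the following code returns different markup for the same URL.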
String url = @"https://m.facebook.com";
try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
            readStream = new StreamReader(receiveStream);
        else
            readStream = new StreamReader(receiveStream,
                Encoding.GetEncoding(response.CharacterSet.Replace("\"", string.Empty)));
        //readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        string data = readStream.ReadToEnd();
        response.Close();
        readStream.Close();
        //string[] sps = data.Split(new string[] { @"videoId"":""" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
catch (Exception ex)
{ }
It returns the following:
<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
<head><meta charset="utf-8" /><meta name="referrer" content="default" id="meta_referrer" /><script nonce="EW2LyNr7">window._cstart=+new Date();</script><script nonce="EW2LyNr7">function envFlush(a){function b(b){for(var c in a)b[c]=a[c]}window.requireLazy?window.requireLazy(["Env"],b):(window.Env=window.Env||{},b(window.Env))}envFlush...
But the source shown in the browser is:
<!DOCTYPE html><html><head><script id="u_0_2" nonce="db1veTby">"use strict";window.MPageLoadClientMetrics=function(){var a=+new Date(),b={prelude_onload:["jewels_visible","first_paint","visibility_change","tti"],nav_started:["first_paint","visibility_change","prelude_onload"],first_paint:["jewels_visible","visibility_change","prelude_onload"],jewels_visible:["tti","visibility_change","navigation","prelude_onload"],tti:["e2e","visibility_change","navigation"]},c=3,d=3,e="nav_started",f=!0,g="",h="",i=1,j="",k="",l="",m=function(){},n=!0,o=!1,p=!1,q=[],r=window.performance||window.msPerformance||window.webkitPerformance||{},s=(window.requestAnimationFrame||window.webkitRequestAnimationFrame||window.mozRequestAnimationFrame||window.oRequestAnimationFrame||window.msRequestAnimationFrame||window.setTimeout).bind(window),t=window.location.origin||window.location.protocol+"//"+window.location.hostname+(window.location.port&&":"+window.location.port);function u(b,c,d,e,f,i){r.timing&&r.timing.navigationStart&&(a=r.timing.navigationStart),j=b,k=c,l=d,g=e,h=f,n=i,x()}function v(a){var c=b[e];return c&&c.indexOf(a)!==-1}function w(a){return!b[a]}function x(){var a,b;do
How can I get source similar to what Chrome's View Page Source shows?
Answer
Facebook needs a User-Agent header to serve the page properly; without one, it redirects to the /unsupportedbrowser page.
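For the HttpWebRequest code in the question, the same idea can be sketched as below. This is a minimal sketch, not a definitive fix: it assumes that setting the UserAgent property is enough, and the Chrome 84 string is only a sample value that any current desktop-browser string could replace. Printing ResponseUri is an added check (not part of the original question) to see whether the request ended up on /unsupportedbrowser.

```csharp
using System;
using System.IO;
using System.Net;

class UserAgentSketch
{
    static void Main()
    {
        // Assumption: a sample desktop Chrome User-Agent string; substitute a current one as needed.
        const string userAgent =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36";

        var request = (HttpWebRequest)WebRequest.Create("https://m.facebook.com");
        request.UserAgent = userAgent; // sent as the User-Agent request header

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // ResponseUri is the final URI after any redirects were followed.
            Console.WriteLine("Final URL: " + response.ResponseUri);

            string html = reader.ReadToEnd();
            Console.WriteLine(html);
        }
    }
}
```

If the final URL still contains /unsupportedbrowser, the server did not accept the User-Agent value and a different string may be needed.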
Here is an example using HttpClient:
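using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    // A single shared HttpClient instance is reused for all requests.
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        // Identify as a desktop Chrome browser so the regular page is served.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36");
        string result = await client.GetStringAsync("https://m.facebook.com");
        Console.WriteLine(result);
        Console.ReadKey();
    }
}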
The output is the same as what Google Chrome shows.