Local scraper and cloud scraper return different results

Question

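I'm trying to scrape Instagram and I get different results locally and in the cloud. The function is deployed with the command shown below. When I try a very similar function from this video: https://www.youtube.com/watch?v=i8THvr03FaY&list=WL&index=54&t=0s everything works fine. IMO the problem seems to be specific to Instagram, but maybe someone knows how to solve it.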

My scraping snippet:

const puppeteer = require('puppeteer');

let browserPromise = puppeteer.launch({
  args: ['--no-sandbox'],
});

exports.igprof = async (req, res) => {
  const username = req.query.username || 'instagram';
  let url = 'https://www.instagram.com/' + username;

  const browser = await browserPromise;
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();

  await page.goto(url, { waitUntil: 'networkidle2' });
  const html = await page.evaluate(() => document.body.innerText);
  //   const json = JSON.parse(html);

  //   if (json.graphql) {
  //     console.log(json.graphql.user.profile_pic_url_hd);
  //     await page.goto(json.graphql.user.profile_pic_url_hd);
  //   }

  const image = await page.screenshot({
    clip: {
      x: 240,
      y: 140,
      width: 320,
      height: 320,
    },
  });

  //   res.setHeader('Content-Type', 'image/png');
  res.setHeader('Content-Type', 'text/html');
  //   res.send(image);
  res.send(html);

  // Wait for the incognito context to close so its pages are released.
  await context.close();
};

package.json

{
  "name": "ig-prof",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "functions-framework --target=igprof"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@google-cloud/functions-framework": "^1.2.1",
    "puppeteer": "^1.19.1"
  }
}

Deploy command:

gcloud functions deploy igprof --trigger-http --runtime=nodejs10 --memory=1024mb --region=europe-west1

Result on localhost:

instagram Verified Follow 6,462 posts 362m followers 59 following Instagram #ShareBlackStories about.instagram.com/blog/announcements/introducing-instagram-reels-announcement Made Us  Guides  Reels ⭐️ #TryThisAtHome SBS  Pride 2020  Juneteenth Self-Care Tips 2020 Vision POSTS GUIDES IGTV TAGGED Show More Posts from instagram Related Accounts See All virat.kohli Verified Virat Kohli Follow billieeilish Verified BILLIE EILISH Follow championsleague Verified UEFA Champions League Follow marshmellomusic Verified marshmello Follow carryminati Verified   Follow sachintendulkar Verified Sachin Tendulkar Follow tiktok Verified TikTok Follow nickiminaj Verified Barbie Follow leonardodicaprio Verified Leonardo DiCaprio Follow samsungwithgalaxy Verified Samsung #withGalaxy Follow sony Verified Sony Follow To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Instagram through cookies. Learn more, including about available controls: Cookies Policy. Close Search Log In to Instagram Log in to see photos and videos from friends and discover other accounts you'll love. Log In Sign Up Log In Sign Up ABOUTHELPPRESSAPIJOBSPRIVACYTERMSLOCATIONSTOP ACCOUNTSSUGGESTED ACCOUNTSHASHTAGSLANGUAGE Afrikaans Čeština Dansk Deutsch Ελληνικά English Español (España) Español Suomi Français Bahasa Indonesia Italiano 日本語 한국어 Bahasa Melayu Norsk Nederlands Polski Português (Brasil) Português (Portugal) Русский Svenska ภาษาไทย Filipino Türkçe 中文(简体) 中文(台灣) বাংলা ગુજરાતી हिन्दी Hrvatski Magyar ಕನ್ನಡ മലയാളം मराठी नेपाली ਪੰਜਾਬੀ සිංහල Slovenčina தமிழ் తెలుగు Tiếng Việt 中文(香港) Български Français (Canada) Română Српски Українська © 2020 INSTAGRAM FROM FACEBOOK

Result on Google Cloud:

Instagram Phone number, username, or email Password Log In OR Log in with Facebook Forgot password? Don't have an account? Sign up Get the app. ABOUTHELPPRESSAPIJOBSPRIVACYTERMSLOCATIONSTOP ACCOUNTSHASHTAGSLANGUAGE Afrikaans Čeština Dansk Deutsch Ελληνικά English Español (España) Español Suomi Français Bahasa Indonesia Italiano 日本語 한국어 Bahasa Melayu Norsk Nederlands Polski Português (Brasil) Português (Portugal) Русский Svenska ภาษาไทย Filipino Türkçe 中文(简体) 中文(台灣) বাংলা ગુજરાતી हिन्दी Hrvatski Magyar ಕನ್ನಡ മലയാളം मराठी नेपाली ਪੰਜਾਬੀ සිංහල Slovenčina தமிழ் తెలుగు Tiếng Việt 中文(香港) Български Français (Canada) Română Српски Українська © 2020 INSTAGRAM FROM FACEBOOK

Why does this return different results, and how can I fix it?

Tags: javascript, node.js, google-cloud-functions, instagram, puppeteer

Answer


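Instagram is redirecting you to the login page instead of showing the public profile, most likely because it has detected the scraper. Instagram deliberately blocks attempts to scrape its data.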

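It works locally, but the environment the scraper runs in changes once you deploy it to the cloud. IP addresses that belong to cloud services are commonly flagged, which means Instagram can tell that you are running puppeteer from a backend deployed on Google Cloud and can decide to block those requests.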
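To confirm that the deployed function really is being sent to the login wall, you could check the URL the page ends up on after navigation. The following is only a minimal sketch: `isLoginWall` is a hypothetical helper, and the `/accounts/login` path is an assumption about where Instagram currently redirects, so adjust it if the redirect target differs.

```javascript
// Hypothetical helper: returns true when the final URL looks like
// Instagram's login page rather than a public profile.
// ASSUMPTION: the login wall lives under /accounts/login.
function isLoginWall(finalUrl) {
  const { hostname, pathname } = new URL(finalUrl);
  return (
    hostname.endsWith('instagram.com') &&
    pathname.startsWith('/accounts/login')
  );
}

// Possible use inside the handler from the question, after page.goto():
//
//   if (isLoginWall(page.url())) {
//     res.status(502).send('Blocked: redirected to the Instagram login page');
//     return;
//   }
```

Logging this from both environments should show whether the cloud run is redirected while the local run is not.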

Options you can try:

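  1. Use the puppeteer stealth plugin. It is a potential solution for other websites, but it does not work for scraping Instagram.
  2. Test different cloud providers (Google Cloud, AWS, Heroku), preferably a smaller, lesser-known service.
  3. Use a proxy service such as brightdata.com. As far as I know, they support Instagram scraping.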
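For the proxy option, one way to route puppeteer through a proxy is Chromium's `--proxy-server` flag, combined with `page.authenticate()` when the proxy needs credentials. This is a rough sketch, not any provider's official setup: `buildLaunchOptions` and `openPageViaProxy` are hypothetical helpers, and the proxy address and credentials are placeholders you would replace with the values from your proxy service.

```javascript
// Hypothetical helper: builds puppeteer launch options that send all
// browser traffic through the given proxy (for example 'http://host:port').
function buildLaunchOptions(proxyServer) {
  const args = ['--no-sandbox'];
  if (proxyServer) {
    // --proxy-server is a Chromium command-line switch.
    args.push(`--proxy-server=${proxyServer}`);
  }
  return { args };
}

// Hypothetical helper: launches a browser behind the proxy and opens a page.
// `puppeteer` is passed in so the sketch does not depend on how it is loaded.
// ASSUMPTION: `proxy` has the shape { server, username, password }.
async function openPageViaProxy(puppeteer, proxy) {
  const browser = await puppeteer.launch(buildLaunchOptions(proxy.server));
  const page = await browser.newPage();
  if (proxy.username) {
    // Supplies credentials when the proxy asks for HTTP authentication.
    await page.authenticate({
      username: proxy.username,
      password: proxy.password,
    });
  }
  return { browser, page };
}
```

In the function from the question, `puppeteer.launch({ args: ['--no-sandbox'] })` could be swapped for `puppeteer.launch(buildLaunchOptions(process.env.PROXY_SERVER))`, where `PROXY_SERVER` is an assumed environment variable name.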
