How can I make puppeteer load a website faster?

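Question

I am using puppeteer to automate things and it works fine, but when I load a website it takes longer than it does in my normal browser. I tried to use caching with this: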

const puppeteer = require('puppeteer');

// Start time, used to measure how long launch + navigation takes
let time = new Date();

async function test() {
    const browser = await puppeteer.launch({
        headless: true,
        executablePath: "D:\\Desktop\\node_modules\\puppeteer\\.local-chromium\\win64-848005\\chrome-win\\chrome.exe",
        args: ['--no-sandbox'],
    });
    const page = await browser.newPage();
    const response = await page.goto('https://example.com/');
    console.log(`${new Date() - time}`); // elapsed milliseconds
    console.log(response);
    await browser.close();
}

test();

It works for example.com: the cache is stored and the page loads faster. But my target website does not seem to allow the cache to be stored.

Is there any other way to speed up the process?

Tags: javascript, node.js, caching, automation, puppeteer

Answer

If you just want the site to load faster while scraping, and you do not depend on certain images or JavaScript, you can block those resources.

Block by resource type

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (req) => {
    if (req.resourceType() === 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto('https://bbc.com');
  await page.screenshot({path: 'no-images.png', fullPage: true});
  await browser.close();
})();
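
The example above blocks only images. If the scrape does not need media, fonts, or stylesheets either, the same handler can check against a set of types. The following is a minimal sketch, not part of the original answer: the names `BLOCKED_TYPES`, `shouldBlock`, and `blockHeavyResources` are hypothetical, and the choice of types is an assumption you should adjust to your target site (blocking stylesheets, for instance, changes how the page renders).

```javascript
// Resource types to skip. This list is an assumption; trim it to what
// your scrape can actually live without.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Pure decision function, kept separate so it can be checked without a browser.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical helper: attaches the filter to a Puppeteer page object.
// Call it before page.goto() so the first navigation is filtered too.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (shouldBlock(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

Usage would be `await blockHeavyResources(page);` right after `browser.newPage()` in the script above.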

Block by domain

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  const options = {
    waitUntil: 'networkidle2',
    timeout: 30000,
  };

  // Before: Normal navigation
  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'before.png', fullPage: true});
  const metrics = await page.metrics();
  console.info(metrics);

  // After: Navigation with some domains blocked

  // Array of third-party domains to block
  const blockedDomains = [
    'https://pagead2.googlesyndication.com',
    'https://creativecdn.com',
    'https://www.googletagmanager.com',
    'https://cdn.krxd.net',
    'https://adservice.google.com',
    'https://cdn.concert.io',
    'https://z.moatads.com',
    'https://cdn.permutive.com'];
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const url = request.url();
    if (blockedDomains.some((d) => url.startsWith(d))) {
      request.abort();
    } else {
      request.continue();
    }
  });

  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'after.png', fullPage: true});

  const metricsAfter = await page.metrics();
  console.info(metricsAfter);

  await browser.close();
})();

Source: https://github.com/addyosmani/puppeteer-webperf
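
One thing to watch with the domain example: `url.startsWith(d)` only matches the exact scheme and host listed, so a request to another subdomain or over `http://` slips through. Below is a rough sketch of matching on the parsed hostname instead. It is not from the linked repository; `BLOCKED_HOSTS`, `isBlockedUrl`, and `attachDomainBlocker` are hypothetical names, and the shortened host list is only illustrative.

```javascript
// Base hostnames to block, without scheme. Subdomains are matched as well.
// Illustrative subset of the list above, reduced to registrable domains.
const BLOCKED_HOSTS = ['googlesyndication.com', 'googletagmanager.com', 'moatads.com'];

// Returns true when the URL's hostname equals a blocked host
// or is a subdomain of one.
function isBlockedUrl(rawUrl, blockedHosts = BLOCKED_HOSTS) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    return false; // not a parseable absolute URL; let the request through
  }
  return blockedHosts.some((h) => hostname === h || hostname.endsWith('.' + h));
}

// Hypothetical helper: wires the check into a Puppeteer page.
// Assumes page.setRequestInterception(true) has already been awaited.
function attachDomainBlocker(page) {
  page.on('request', (request) => {
    if (isBlockedUrl(request.url())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Matching with a leading dot (`'.' + h`) keeps an unrelated host such as `notmoatads.com` from being blocked by accident.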

