How do I get puppeteer to work in Google Cloud Run/Cloud Build?

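Problem description

I have a docker image containing a puppeteer web scraper. It works perfectly on my local machine when I build and run it. It also builds fine in Cloud Build, deploys to Cloud Run and starts the http server. However, when I run one of the cron jobs that uses a puppeteer instance, it times out with the following error message: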

(node:13) UnhandledPromiseRejectionWarning: TimeoutError: Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r706915

Full log:

A 2019-12-03T15:12:27.748625Z (node:13) UnhandledPromiseRejectionWarning: TimeoutError: Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r706915 
A 2019-12-03T15:12:27.748692Z     at Timeout.onTimeout (/node_modules/puppeteer/lib/Launcher.js:359:14) 
A 2019-12-03T15:12:27.748705Z     at ontimeout (timers.js:436:11) 
A 2019-12-03T15:12:27.748716Z     at tryOnTimeout (timers.js:300:5) 
A 2019-12-03T15:12:27.748726Z     at listOnTimeout (timers.js:263:5) 
A 2019-12-03T15:12:27.748734Z     at Timer.processTimers (timers.js:223:10) 

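This error occurs directly at the puppeteer.launch() call.

I have tried increasing the instance's memory, different Dockerfile setups (all found via Google searches), different puppeteer launch arguments, and a try/catch in prod.

I used this as the base docker image (https://github.com/buildkite/docker-puppeteer), but it did not work, so I decided to modify it to my own liking. This is what I have so far: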

Dockerfile

FROM node:10.15

RUN apt-get update && apt-get install -y wget --no-install-recommends \
  && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
  --no-install-recommends \
  && rm -rf /var/lib/apt/lists/* \
  && apt-get purge --auto-remove -y curl \
  && rm -rf /src/*.deb

# RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
# RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

# Copy package.json to docker image
COPY package.json ./

RUN npm install

# Copy source code of dir to image
COPY . .


ARG DOCKER_ENV
ENV NODE_ENV=${DOCKER_ENV}


EXPOSE 8080

CMD [ "npm", "run", "prod" ]

openBrowserInstance.js

const randomUserAgent = require(__dirname + '/randomUserAgent');
const randomProxy = require(__dirname + '/../multiple/randomProxy');
const puppeteer = require('puppeteer');

let defaultOptions = {
    blockStyleAssets: true,
    viewport: {
        width: 1920,
        height: 1080
    },
    urls: [''],
    screenshotPath: null,
    callback: null,
    randomUserAgent: true,
    randomProxy: true
};

module.exports = ( options, callback ) => {
    return new Promise( async( resolve ) => {

        options = Object.assign({}, defaultOptions, options);

        // Required options
        if ( options.urls.length < 1 || typeof callback === 'undefined' ) {
            console.log('Missing one or more required options for "openBrowserInstance.js".');
            resolve();
            return;
        }

        let browserOptions = {
            args: [`--proxy-server=http://${randomProxy()}`,'--lang=en-GB',
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage'],
            headless: true
        };

        const browser = await puppeteer.launch( browserOptions );
        const page = await browser.newPage();
        await page.authenticate({username:'abrCKs', password:'ge2kCw'});

        // page.viewport() only reads the current viewport; setViewport() applies one.
        await page.setViewport( options.viewport );


        if ( options.blockStyleAssets ) {

            await page.setRequestInterception(true);

            page.on('request', (req) => {

                let resourceType = req.resourceType();

                if (resourceType === 'image' || resourceType === 'stylesheet') {
                    req.abort();
                } else {
                    req.continue();
                }

            });

        }

        for (const [index, url] of options.urls.entries()) {

            let userAgent = null;

            if ( options.randomUserAgent ) {

                userAgent = randomUserAgent();

                await page.setUserAgent( userAgent );
            }

            await page.goto( url, { waitUntil: 'networkidle0' } );

            let pageContent = await page.content();

            await callback(pageContent, url, index);

        }

        if ( options.screenshotPath !== null ) {
            await page.screenshot({path: options.screenshotPath, fullPage: true});
        }

        // Close the page only after all URLs (and the optional screenshot) are done.
        await page.close();
        await browser.close();

        resolve();
    })
};


cloudbuild.yaml

steps:
- name: 'gcr.io/cloud-builders/git'
  args: ['clone', 'GIT-REPO-PLACEHOLDER']

- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '--build-arg', 'DOCKER_ENV=dev', '-t', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER', '.']
  dir: 'PROJECT-NAME-PLACEHOLDER/'

- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER']

- name: 'gcr.io/cloud-builders/gcloud'
  args: ['beta', 'run', 'deploy', 'PROJECT-NAME-PLACEHOLDER', '--image', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER', '--region', 'europe-west1','--platform', 'managed', '--quiet', '--memory', '2G']

images:
- eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER

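If you have any suggestions, please let me know. I have also looked into Google Cloud Functions for this, but I am not sure whether that would work. If I cannot find a solution, I will be forced to run it on a VM instance, which is quite a roundabout way to go...

Thanks for your time.

Tags: node.js, docker, google-cloud-platform, puppeteer, google-cloud-run

Solution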


Here is a fully working example of running Puppeteer on Cloud Run:

screenshot.js

const puppeteer = require('puppeteer');

exports.screenshot = async (req, res) => {
    const url = req.query.url;

    if (!url) {
      return res.send('Please provide URL as GET parameter, for example: <a href="?url=https://example.com">?url=https://example.com</a>');
    }

    const browser = await puppeteer.launch({
      args: ['--no-sandbox']
    });
    const page = await browser.newPage();
    await page.goto(url);
    const imageBuffer = await page.screenshot();
    // Wait for Chromium to shut down before sending the response.
    await browser.close();

    res.set('Content-Type', 'image/png');
    res.send(imageBuffer);
  };
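
The handler above hands the raw url query parameter straight to page.goto. A minimal sketch of validating it first, assuming that only absolute http(s) URLs should be accepted. parseTargetUrl is a hypothetical helper name, not part of the original answer or of Puppeteer:

```javascript
// Hypothetical helper (not part of the original answer): accept only
// absolute http/https URLs before passing them to page.goto().
function parseTargetUrl(raw) {
  // req.query values can be missing, or arrays/objects; accept strings only.
  if (typeof raw !== 'string' || raw.trim() === '') {
    return null;
  }
  let parsed;
  try {
    // The WHATWG URL constructor throws a TypeError on invalid input.
    parsed = new URL(raw.trim());
  } catch (err) {
    return null;
  }
  // Reject schemes such as file: or chrome: that a scraper should not open.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null;
  }
  return parsed.href;
}
```

In screenshot.js this could stand in for the plain `if (!url)` check: call parseTargetUrl(req.query.url) and answer with a 400 status when it returns null.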

server.js

'use strict';

const {screenshot} = require('./screenshot.js')

const express = require('express');
const puppeteer = require('puppeteer');
const app = express();

app.use(screenshot);

const server = app.listen(process.env.PORT || 8080, err => {
    if (err) return console.error(err);
    const port = server.address().port;
    console.info(`App listening on port ${port}`);
  });

module.exports = app;
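
Because screenshot is an async function, a rejected promise inside it (for example a launch timeout like the one in the question) is not picked up by Express 4 on its own and ends up as an unhandled rejection. A small sketch of one common workaround, assuming Express 4; asyncHandler is a hypothetical name, not something Express provides:

```javascript
// Hypothetical wrapper (not part of the original answer): run an async
// handler and forward any thrown error or rejected promise to next(),
// so that Express's error handling takes over.
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}
```

With this in place, server.js could register the handler as `app.use(asyncHandler(screenshot));`, and a failed launch would produce an error response instead of a request that hangs until it times out.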

package.json

{
  "name": "screenshot",
  "version": "1.0.0",
  "description": "Takes screenshot of the given URL.",
  "author": "Steren",
  "scripts": {
    "start": "node server.js"
  },
  "license": "Apache-2.0",
  "dependencies": {
    "express": "^4.16.4",
    "puppeteer": "^1.10.0"
  }
}

Dockerfile

FROM node:10

# Adds required libs
RUN apt-get update && \
    apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
    libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
    libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
    libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget

# Start the app
WORKDIR /usr/src/app
COPY package*.json ./
ENV NODE_ENV=production
RUN npm install --production
COPY . .
CMD [ "npm", "start" ]
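
To carry this over to the scraper from the question, the launch step can be isolated so that a failure is logged together with Chrome's own output. A rough sketch under stated assumptions: buildLaunchOptions and launchBrowser are hypothetical names, the flags are the ones already used in the question, dumpio and timeout are regular puppeteer.launch options, and the launcher is passed in as a parameter rather than required inside the helper. It is a diagnostic aid, not a confirmed fix for the timeout.

```javascript
// Hypothetical helpers, not part of the original answer.

// Build launch options for a container where Chrome cannot use its sandbox.
function buildLaunchOptions({ proxy = null, timeoutMs = 30000 } = {}) {
  const args = [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
  ];
  if (proxy) {
    args.push(`--proxy-server=http://${proxy}`);
  }
  return {
    args,
    headless: true,
    dumpio: true,       // pipe the browser's stdout/stderr into the Node process
    timeout: timeoutMs, // milliseconds to wait for the browser to start
  };
}

// Launch through the given launcher (for example the object returned by
// require('puppeteer')) and log a failure before rethrowing it.
async function launchBrowser(launcher, options) {
  try {
    return await launcher.launch(buildLaunchOptions(options));
  } catch (err) {
    console.error('Browser failed to launch:', err.message);
    throw err;
  }
}
```

In openBrowserInstance.js the call would then read `const browser = await launchBrowser(puppeteer, { proxy: randomProxy() });`, with the browser's startup messages showing up in the Cloud Run logs.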
