Puppeteer: saving a PDF file in headless mode

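Problem description

What I'm trying to achieve with headless Chrome and puppeteer:

  1. Log in to a website
  2. Navigate to a PDF file
  3. Download it to the server

According to this bug, headless Chrome cannot navigate to a PDF file: https://bugs.chromium.org/p/chromium/issues/detail?id=761295

So I tried to take the cookies from the current puppeteer session and pass them along with an https.get request, but unfortunately without success.

My code: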

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://login-page', { waitUntil: 'networkidle0' });
await page.type('#email', 'email');
await page.type('#password', 'password');
await page.click('input[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle0' });

// following line throws an error with headless mode
// await page.goto('https://url-with-pdf-accessible-only-after-login');

// I'm trying to convert cookie object to cookie string to pass it with headers
const cookies = await page.cookies();
let cookieString = '';
for (const index in cookies) {
  const cookie = cookies[index];
  for (const key in cookie) {
    cookieString += key + '=' + cookie[key] + '; ';
  }
}

// the following code saves an empty file (0 bytes)
const file = fs.createWriteStream('file.pdf');
https.get({
  hostname: 'host-with-pdf-file',
  path: '/path-to-pdf-accessible-only-after-login',
  headers: {
    'Cookie': cookieString,
  }
}, res => {
  res.pipe(file);
});

Am I doing something wrong?

Is there another way to save a PDF file from a URL (which requires authentication) to the server?
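
For reference, a `Cookie` request header carries only `name=value` pairs, whereas the nested loop in the code above appends every property of each cookie object (domain, path, expiry and so on). This may or may not be the only reason for the empty file. The following is a minimal sketch of the cookie-based approach; `buildCookieHeader` and `downloadWithCookies` are hypothetical helper names, and the host and path are the placeholders from the question.

```javascript
const fs = require('fs');
const https = require('https');

// Hypothetical helper: turn the array returned by page.cookies()
// into a Cookie header value made of name=value pairs only.
function buildCookieHeader(cookies) {
  return cookies.map(cookie => `${cookie.name}=${cookie.value}`).join('; ');
}

// Hypothetical helper: download a file with the given Cookie header and
// resolve only once the file has been fully written to disk.
function downloadWithCookies(hostname, path, cookieHeader, destination) {
  return new Promise((resolve, reject) => {
    const request = https.get(
      { hostname, path, headers: { Cookie: cookieHeader } },
      res => {
        // A redirect to the login page or an error page is not the PDF.
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Unexpected status code: ${res.statusCode}`));
          return;
        }
        const file = fs.createWriteStream(destination);
        res.pipe(file);
        file.on('finish', () => resolve(destination));
        file.on('error', reject);
      }
    );
    request.on('error', reject);
  });
}

// Usage inside the async function, after the login has completed:
//   const cookieHeader = buildCookieHeader(await page.cookies());
//   await downloadWithCookies('host-with-pdf-file',
//     '/path-to-pdf-accessible-only-after-login', cookieHeader, 'file.pdf');
```

Rejecting on any status other than 200 makes a failed authentication visible instead of silently producing an empty or wrong file.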

Tags: node.js, puppeteer, google-chrome-headless

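Solution

I ran into almost the same problem.

Info: I ran this on Windows 10 64-bit, Node v8.9.4, puppeteer 1.12.2.

More important info: it does not work with the bundled "local Chromium" (73.0.3679.0 (64-bit), installed by puppeteer), but it does work with the installed Chrome (72.0.3626.119). So I set a custom "executablePath" property for the launch method, and it works!

I searched for hours, so I hope this solution is useful...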

const puppeteer = require('puppeteer');

(async () => {
  // Use the installed Chrome instead of the bundled Chromium.
  // "headless" is not set, which is equivalent to true.
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe'
  });
  const page = await browser.newPage();

  // Open the login page
  await page.goto('https://www.theUrl', { waitUntil: 'networkidle2' });

  // Fill in the login and password fields
  await page.waitFor('input[name=NameOfTheLoginHtmlField]');
  await page.$eval('input[name=NameOfTheLoginHtmlField]', el => el.value = 'InputValueOfTheLoginHtmlField');
  await page.waitFor('input[name=NameOfThePasswordHtmlField]');
  await page.$eval('input[name=NameOfThePasswordHtmlField]', el => el.value = 'InputValueOfThePasswordHtmlField');

  // The submit button has been replaced by an "a" element with a JS function behind it, so click that
  await page.click('#login-submit > a');

  // Set the download path ('' = current directory: C:\Program Files (x86)\Google\Chrome\Application\72.0.3626.119)
  function setDownloadBehavior(downloadPath = '') {
    // page._client is a private puppeteer API (used here with puppeteer 1.12.2)
    return page._client.send('Page.setDownloadBehavior', {
      behavior: 'allow',
      downloadPath
    });
  }
  await setDownloadBehavior();

  // Wait 5 seconds before closing the browser
  await page.waitFor(5000);
  await browser.close();
})();
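
The fixed 5-second wait above can be too short for a large file or needlessly long for a small one. As a possible alternative, the sketch below polls the download directory until a new, completed file appears. It assumes that Chrome writes in-progress downloads with a `.crdownload` suffix and renames them once finished; `waitForDownload` is a hypothetical helper, not part of puppeteer.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve with the path of the first new file in
// `directory` that is not an in-progress ".crdownload" file.
// Call it BEFORE triggering the download, because it snapshots the
// directory contents at call time to tell new files from old ones.
function waitForDownload(directory, timeoutMs = 30000, intervalMs = 250) {
  const before = new Set(fs.readdirSync(directory));
  const deadline = Date.now() + timeoutMs;

  return new Promise((resolve, reject) => {
    const timer = setInterval(() => {
      const finished = fs.readdirSync(directory).find(
        name => !before.has(name) && !name.endsWith('.crdownload')
      );
      if (finished) {
        clearInterval(timer);
        resolve(path.join(directory, finished));
      } else if (Date.now() > deadline) {
        clearInterval(timer);
        reject(new Error(`No download finished within ${timeoutMs} ms`));
      }
    }, intervalMs);
  });
}

// Usage inside the async function (downloadDir is an assumed variable that
// holds the same directory passed to Page.setDownloadBehavior):
//   const pending = waitForDownload(downloadDir);
//   await page.click('#login-submit > a');
//   const savedFile = await pending;
```

Taking the snapshot before the click avoids mistaking a file that was already in the directory for the new download.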
