Download file from GCF function

Question

I am running a Node.js script that uses Puppeteer on my local machine to download some assets from the Internet. I want to run that script as a Google Cloud Function.

Is there any local storage associated with GCF where these files can be saved and accessed later, or can we specify a Cloud Storage bucket URL where the download should be saved?

#!/usr/bin/env node

const { program } = require('commander');
const puppeteer = require('puppeteer');

program
    .option('-e, --email <email>', 'Login Email Address', process.env.LOOKER_EMAIL || '')
    .option('-p, --password <password>', 'Login Password', process.env.LOOKER_PASSWORD || '')
    .option('-d, --dashboard <id>', 'Dashboard To Download');
program.parse(process.argv);

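const fs = require('fs');
// The trailing semicolon is required here: without it, the parenthesised async
// function below is parsed as a call on this string and throws a TypeError.
const basePath = 'C:\\card\\';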

(async () => {

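    // Read parsed options via opts(); newer commander releases no longer
    // expose them as properties on the program object.
    const options = program.opts();
    const loginEmail = options.email;
    const loginPassword = options.password;
    const dashboardId = options.dashboard;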

    // used puppeteer to download some files

    const browser = await puppeteer.launch({
        headless: true
    })

    // loginUrl and the *Selector constants used below are defined elsewhere
    // (they were omitted from the question).
    const page = await browser.newPage();

    await page.setViewport({ width: 1920, height: 1080 });
    await page.goto(loginUrl);
    await page.waitForSelector(loginEmailSelector);
    await page.type(loginEmailSelector, loginEmail);
    await page.type(loginPasswordSelector, loginPassword);
    await Promise.all([
        page.waitForNavigation(),
        page.click(loginButtonSelector)
    ]);

    await page.goto(`https://somewebsite/${dashboardId}`);
    await page.waitForSelector(menuSelector, {
        visible: true
    });
    await page.click(menuSelector);
    await page.waitForSelector(downloadSelector, {
        visible: true
    });


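    const ts = Date.now();
    const downloadLoc = basePath + ts + '\\';
    console.log('downloadLoc ', downloadLoc);
    // Use a CDP session from the public API rather than the private
    // page._client property, which has changed between Puppeteer releases.
    const client = await page.target().createCDPSession();
    await client.send('Page.setDownloadBehavior', {
        behavior: 'allow',
        downloadPath: downloadLoc
    });
    console.log(`your file's on the way!`);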


})();

The script currently saves the file to the C: drive. I would like it to be stored in Cloud Storage instead, if possible. Please let me know if you have any suggestions.

Tags: node.js, google-cloud-platform, puppeteer

Answer


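Cloud Functions assume that your code is stateless, which means any data should be stored outside the function. It is possible to use the /tmp directory, but only for temporary files: it is an in-memory filesystem, so anything written there counts against the function's memory and is not guaranteed to persist between invocations. The recommended solution is Cloud Storage.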
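As a rough sketch of that flow, assuming the `@google-cloud/storage` client library: point Puppeteer's download path at a directory under `os.tmpdir()` (which resolves to /tmp on Cloud Functions), then copy whatever landed there into a bucket. The names `makeDownloadDir` and `uploadDirectory` and the `dashboards` prefix are made up for this example. The helper takes the bucket as a parameter, so it only relies on the `upload(localPath, { destination })` method that a `Bucket` object provides.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a fresh scratch directory under the OS temp dir (/tmp on Cloud Functions).
function makeDownloadDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'download-'));
}

// Upload every regular file in localDir to the bucket under the given prefix.
// `bucket` is expected to expose upload(localPath, { destination }), as a
// Bucket object from @google-cloud/storage does.
async function uploadDirectory(bucket, localDir, prefix) {
  const uploaded = [];
  for (const entry of fs.readdirSync(localDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip sub-directories
    const destination = `${prefix}/${entry.name}`;
    await bucket.upload(path.join(localDir, entry.name), { destination });
    uploaded.push(destination);
  }
  // /tmp is memory-backed, so delete the local copies once they are stored.
  fs.rmSync(localDir, { recursive: true, force: true });
  return uploaded;
}
```

In the function, the directory returned by `makeDownloadDir()` would be passed as `downloadPath`, and once the download has finished you would call something like `uploadDirectory(new Storage().bucket('my-bucket'), dir, 'dashboards')`, where `Storage` comes from `require('@google-cloud/storage')` and `my-bucket` is a placeholder. Detecting when the browser has actually finished writing the file is not covered here.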

However, Cloud Storage is not the only way to keep state. It is the best fit for binary objects, i.e. files.

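On the other hand, if the files contain data, you could choose one of Google's NoSQL databases, such as Firestore, Datastore (which is actually Firestore in Datastore mode), or Firebase Realtime Database. All of them have good APIs for many languages, including Node.js. If you plan to build a larger solution, you can also use Bigtable for very large datasets and BigQuery if you need analytics. It all depends on what you need.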
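If the downloaded export happens to be tabular, its rows could be written to Firestore instead of keeping the file. The sketch below assumes a simple comma-separated export with a header row and no quoted fields, which may not match what the dashboard actually produces. `parseSimpleCsv`, `saveRows`, and the collection name are hypothetical. The `db` argument is expected to behave like a `Firestore` instance from `@google-cloud/firestore`, of which only `collection().doc().set()` is used.

```javascript
// Parse a very simple CSV (header row, no quoted fields) into plain objects.
function parseSimpleCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const headers = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(
      headers.map((h, i) => [h, (cells[i] || '').trim()])
    );
  });
}

// Write each row as one document. The document ID combines the dashboard ID
// and the row index, so re-running the export overwrites the same documents.
async function saveRows(db, collectionName, dashboardId, rows) {
  const collection = db.collection(collectionName);
  await Promise.all(
    rows.map((row, i) => collection.doc(`${dashboardId}-${i}`).set(row))
  );
  return rows.length;
}
```

A call might look like `saveRows(new Firestore(), 'dashboard_rows', dashboardId, parseSimpleCsv(csvText))`, with `dashboard_rows` being a placeholder collection name.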

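A convenient advantage of the Google APIs mentioned above is that, inside Cloud Functions, you do not need to write authentication code for each product: the client libraries use the function's service account automatically, provided that account has the required IAM permissions. This saves a lot of code and resources. All of these solutions are serverless, so as your solution grows you do not have to manage the underlying servers or scaling. Because everything runs inside GCP, you also get very fast networking between resources.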

