  • Problem description

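    I want to use puppeteer to get the inner text of the items in an HTML ul tag. This is the approach I use to build an array of the inner text, but it produces an error.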

      const li =  document.querySelector('#year-list-container > div > div.js-profile-timeline-year-list.color-bg-primary.js-sticky > ul').getElementsByTagName('li')
      array = []
      for (let i = 0; i <= li.length - 1; i++) {
        array.push(li[i]);
      }
    

    The error is this:

    (Use `node --trace-warnings ...` to show where the warning was created)
    (node:15860) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
    (node:15860) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
    

    Here is the full code. I have already imported puppeteer, so that is not the error:

      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      giturl  = ('https://github.com/siddhart1o1');
        await page.goto(giturl , {waitUntil: 'networkidle2'})
    
        let data =  await page.evaluate(()=>{
            let stars  = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(3) > span').innerText
            let followers = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(1) > span').innerText
            let following = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(2) > span').innerText
            let repos = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-9.mb-4.mb-md-0 > div.UnderlineNav.user-profile-nav.d-block.d-md-none.position-sticky.top-0.pl-3.ml-n3.mr-n3.pr-3.color-bg-primary > nav > a:nth-child(2) > span').innerText
            //this code is giving error
            let li = document.querySelector('div.js-profile-timeline-year-list.color-bg-primary.js-sticky > ul').getElementsByTagName('li')
            array = []
            for (let i = 0; i <= li.length - 1; i++) {
              array.push(li[i]);
            }
    
            return{
                stars,
                followers,
                followinf,
                repos,
                array
            }
    
    
        })
    
        console.log(data)
      await browser.close();
    })();
    
    

    Tags: javascript, html, css, puppeteer

    Solution


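    1. You have a typo here. Referencing the undeclared name followinf throws a ReferenceError inside page.evaluate(), which rejects the returned promise: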
            return{
                stars,
                followers,
                followinf, // Should be following 
                repos,
                array
            }
    
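    2. Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values JSON can handle). Because getElementsByTagName() returns a collection of DOM elements, which are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object. You need to return serializable values (for example, an array of texts or href attributes), or use something like page.$$(selector) and the ElementHandle API. So try this: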
      for (let i = 0; i <= li.length - 1; i++) {
        array.push(li[i].innerText);
      }
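
    As a rough sketch of the two alternatives mentioned in point 2, the helpers below return plain strings instead of DOM elements. The function names (listItemTexts, listItemTextsViaHandles, toText) are hypothetical and not part of the original code; page is assumed to be a Puppeteer Page object, and the selector is passed in by the caller.

```javascript
// Sketch only: listItemTexts, listItemTextsViaHandles, and toText are
// hypothetical helper names. `page` is assumed to be a Puppeteer Page.

// Runs inside the browser and returns a plain string, which is serializable.
const toText = (el) => el.innerText.trim();

// Approach 1: page.$$eval() runs the callback in the page against every
// element matching the selector; only the returned strings reach Node.js.
async function listItemTexts(page, selector) {
  return page.$$eval(selector, (items) => items.map((li) => li.innerText.trim()));
}

// Approach 2: page.$$() returns ElementHandle objects; each handle can
// evaluate a function against the element it points to.
async function listItemTextsViaHandles(page, selector) {
  const handles = await page.$$(selector);
  return Promise.all(handles.map((handle) => handle.evaluate(toText)));
}
```

    Either helper could be called after page.goto(), for example as await listItemTexts(page, 'div.js-profile-timeline-year-list.color-bg-primary.js-sticky > ul li'), assuming that selector still matches the page. The first approach needs a single round trip to the browser, so it is usually the simpler choice when only the text is needed.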
    
