Cannot get the values of a table inside a frame with Puppeteer

Problem description

This is the error I get when I try to apply the 'table[0].$$eval' method (see the code snippets below):

Failed to execute 'querySelectorAll' on 'Element': '# 297d0e3 > table > tbody > tr:nth-child(1)' is not a valid selector

const puppeteer = require('puppeteer')


const scrape = async () => {
    const browser = await puppeteer.launch({headless:false,defaultViewport: null,args: [
        '--disable-web-security',
        '--disable-features=IsolateOrigins,site-per-process'
      ]});

    const page = await browser.newPage();

    await page.goto('https://dealers.carwow.co.uk/dealers/sign_in')

    await page.type('#dealer_email', 'email')
    await page.type('#dealer_password', 'password')
    await page.click('#new_dealer > p > input')
    
    await new Promise(resolve => setTimeout(resolve, 5000));
    let xpathArray = await page.$x('//*[@id="dealer-dashboard"]/div[3]/div/div/a')
    await xpathArray[0].click() 
    
    await new Promise(resolve => setTimeout(resolve, 5000));
    const frameHandle = await page.$x('//*[@id="klipfolio-iframe"]');

    await new Promise(resolve => setTimeout(resolve, 5000));


    const frame = await frameHandle[0].contentFrame();

    await frame.waitForXPath('//*[@id="0297d0e3"]/table');
    
    const table = await frame.$x('//*[@id="0297d0e3"]/table');

    console.log(table)

    await browser.close()
};

The function above logs an array containing an ElementHandle (shown below) rather than the element itself.

[
  ElementHandle {
    _disposed: false,
    _context: ExecutionContext {
      _client: [CDPSession],
      _world: [DOMWorld],
      _contextId: 17,
      _contextName: ''
    },
    _client: CDPSession {
      eventsMap: [Map],
      emitter: [Object],
      _callbacks: Map(0) {},
      _connection: [Connection],
      _targetType: 'page',
      _sessionId: '326BCCF50B6BBE8CA175CB21AB46C382'
    },
    _remoteObject: {
      type: 'object',
      subtype: 'node',
      className: 'HTMLTableElement',
      description: 'table.layout-grid',
      objectId: '3652992625290954585.17.4'
    },
    _page: Page {
      eventsMap: Map(0) {},
      emitter: [Object],
      _closed: false,
      _timeoutSettings: [TimeoutSettings],
      _pageBindings: Map(0) {},
      _javascriptEnabled: true,
      _workers: Map(0) {},
      _fileChooserInterceptors: Set(0) {},
      _userDragInterceptionEnabled: false,
      _client: [CDPSession],
      _target: [Target],
      _keyboard: [Keyboard],
      _mouse: [Mouse],
      _touchscreen: [Touchscreen],
      _accessibility: [Accessibility],
      _frameManager: [FrameManager],
      _emulationManager: [EmulationManager],
      _tracing: [Tracing],
      _coverage: [Coverage],
      _screenshotTaskQueue: [ScreenshotTaskQueue],
      _viewport: null
    },
    _frameManager: FrameManager {
      eventsMap: [Map],
      emitter: [Object],
      _frames: [Map],
      _contextIdToContext: [Map],
      _isolatedWorlds: [Set],
      _client: [CDPSession],
      _page: [Page],
      _networkManager: [NetworkManager],
      _timeoutSettings: [TimeoutSettings],
      _mainFrame: [Frame]
    }
  }
]

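I tried iterating over the array and then applying the method below to extract the data from the table (see image).

What exactly is an ElementHandle, and how do I solve this?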

table[0].$$eval('#\30 297d0e3 > table > tbody > tr:nth-child(1)', rows => {
        return Array.from(rows, row => {
         const columns = row.querySelectorAll('td');
         return Array.from(columns, column => column.innerText);
        });
      });

[Image: element tree]

Tags: node.js, web-scraping, puppeteer

Solution


If you need a selector for an id that starts with a digit, try this workaround, which uses an attribute selector instead of '#id':

table[0].$$eval('[id="the_number"] > table > tbody > tr:nth-child(1)', rows => {
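
A fuller sketch of this workaround might look like the following. It assumes `frame` is the Frame returned by contentFrame() in the question and that the container id is "0297d0e3"; the helper names idSelector and readTableRows are made up for illustration. A likely cause of the original error is also worth noting: inside an ordinary JavaScript string literal, '\30' is read as a legacy octal escape (the control character U+0018), so the browser never receives the CSS escape. Doubling the backslash, or using the attribute form, avoids that.

```javascript
// Hypothetical helper: build an attribute selector for an element id.
// Unlike '#id', the [id="..."] form needs no escaping for a leading digit.
function idSelector(id) {
  // Escape only the characters that are special inside a CSS double-quoted string.
  const escaped = String(id).replace(/["\\]/g, '\\$&');
  return `[id="${escaped}"]`;
}

// Hypothetical helper: read every row of the table inside the given container.
// `frame` is assumed to be the Puppeteer Frame returned by contentFrame().
async function readTableRows(frame, containerId) {
  const selector = `${idSelector(containerId)} > table > tbody > tr`;
  // The callback runs in the browser; only its plain-array result comes back to Node.
  return frame.$$eval(selector, rows =>
    Array.from(rows, row =>
      Array.from(row.querySelectorAll('td'), cell => cell.innerText.trim())
    )
  );
}

// If you prefer the '#id' form, double the backslash so the CSS escape
// survives the JavaScript string literal.
const escapedIdSelector = '#\\30 297d0e3 > table > tbody > tr';
```

Usage would be along the lines of `const rows = await readTableRows(frame, '0297d0e3')`, which gives one array of cell texts per row and makes the separate frame.$x lookup unnecessary.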

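As for what an ElementHandle is: it is a reference, held in Node, to a DOM element that lives in the browser page. That is why console.log prints Puppeteer internals rather than the table. To get data out, run a function in the page against the handle and return plain serializable values. A minimal sketch, assuming `tableHandle` is `table[0]` from the question (the helper name tableToArray is hypothetical):

```javascript
// Hypothetical helper: convert a table ElementHandle into an array of rows,
// where each row is an array of trimmed cell texts.
async function tableToArray(tableHandle) {
  // evaluate() runs the callback in the browser with the real <table> element;
  // rows and cells are standard HTMLTableElement / HTMLTableRowElement properties.
  return tableHandle.evaluate(table =>
    Array.from(table.rows, row =>
      Array.from(row.cells, cell => cell.innerText.trim())
    )
  );
}
```

With this, `console.log(await tableToArray(table[0]))` would print the cell texts instead of the handle object.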