javascript - Retrieve data from a ul list in Puppeteer
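Problem description
I want to use Puppeteer to get the inner text from an HTML ul tag. This is the approach I use to build an array of the inner text, but it produces an error.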
const li = document.querySelector('#year-list-container > div > div.js-profile-timeline-year-list.color-bg-primary.js-sticky > ul').getElementsByTagName('li')
array = []
for (let i = 0; i <= li.length - 1; i++) {
array.push(li[i]);
}
This is the error:
(Use `node --trace-warnings ...` to show where the warning was created)
(node:15860) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:15860) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Here is the full code. I have already imported puppeteer, so that is not the cause of the error.
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
giturl = ('https://github.com/siddhart1o1');
await page.goto(giturl , {waitUntil: 'networkidle2'})
let data = await page.evaluate(()=>{
let stars = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(3) > span').innerText
let followers = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(1) > span').innerText
let following = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-3.mb-4.mb-md-0 > div > div.js-profile-editable-replace > div.d-flex.flex-column > div.js-profile-editable-area.d-flex.flex-column.d-md-block > div.flex-order-1.flex-md-order-none.mt-2.mt-md-0 > div > a:nth-child(2) > span').innerText
let repos = document.querySelector('#js-pjax-container > div.container-xl.px-3.px-md-4.px-lg-5 > div > div.flex-shrink-0.col-12.col-md-9.mb-4.mb-md-0 > div.UnderlineNav.user-profile-nav.d-block.d-md-none.position-sticky.top-0.pl-3.ml-n3.mr-n3.pr-3.color-bg-primary > nav > a:nth-child(2) > span').innerText
//this code is giving error
let li = document.querySelector('div.js-profile-timeline-year-list.color-bg-primary.js-sticky > ul').getElementsByTagName('li')
array = []
for (let i = 0; i <= li.length - 1; i++) {
array.push(li[i]);
}
return{
stars,
followers,
followinf,
repos,
array
}
})
console.log(data)
await browser.close();
})();
Solution
- You have a typo here:
return{
stars,
followers,
followinf, // Should be following
repos,
array
}
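- Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values that JSON can handle). Because getElementsByTagName() returns a collection of DOM elements, which are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object. You need to either return serializable values (for example, an array of texts or href attributes) or use something like page.$$(selector) together with the ElementHandle API. So try this: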
for (let i = 0; i <= li.length - 1; i++) {
array.push(li[i].innerText);
}
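The two points above can also be sketched as a pair of small helpers. This is only an illustration, not the original answer's code: getListItemTexts and runAndReport are hypothetical names, and page is assumed to be a Puppeteer Page such as the one returned by browser.newPage() in the question. The first helper uses page.$$eval so that only plain strings travel back from the browser; the second catches a rejected promise so that the underlying error (for example, a ReferenceError caused by the followinf typo) is printed instead of surfacing only as an unhandled rejection warning.

```javascript
// Hypothetical helper (not part of Puppeteer): collect the text of every <li>
// under `listSelector` and return it as an array of plain strings.
async function getListItemTexts(page, listSelector) {
  // page.$$eval runs the callback inside the browser with all matching
  // elements; only the serializable return value is sent back to Node.
  return page.$$eval(`${listSelector} li`, (items) =>
    items.map((li) => li.innerText.trim())
  );
}

// Hypothetical wrapper: run an async task and log a rejection instead of
// leaving it unhandled, so the underlying error message becomes visible.
async function runAndReport(task) {
  try {
    return await task();
  } catch (err) {
    console.error('Scrape failed:', err.message);
    return null;
  }
}
```

In the question's script, a call might look like runAndReport(() => getListItemTexts(page, 'div.js-profile-timeline-year-list > ul')), placed after page.goto() has finished. The selector is copied from the question and may no longer match GitHub's current markup.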