How to get only the text from HTML links with cheerio

Question

Hi, I have a web page that contains HTML like this:

<div class="css-content">
   <div class="css-2aj">
      <img src="" >
      <div data-bn-type="text" id="/48" class="">Latest News</div>
   </div>
   <div class="css-6f9">
      <div class="css-content">
         <a data-bn-type="link" href="/en/blog/news/523hshhshhshhs3331adc0" class="css-1ej">US could be on cusp of new Covid surge</a>
         <a data-bn-type="link" href="/en/blog/news/423hshhshhshhs3331adc0" class="css-1ej">Stop sharing your vaccine cards on social media</a>
         <a data-bn-type="link" href="/en/blog/news/2222hshhshhshhs3331adc0" class="css-1ej">Italians can be fined up to $60,000 for selling the world's 'most dangerous' cheese</a>
         <a data-bn-type="link" href="/en/blog/news/2223hshhshhshhs3331adc0" class="css-1ej">The Masked Singer' reveals the identity of The Phoenix</a>
      </div>
   </div>
</div>

What I want is just the text of each link, as a list.

This is what I tried:

var list = [];
$('div[class="css-6f9"]').find('div  > a').each(function (index, element) {
    list.push($(element).attr('href'));
});

console.log(list);

The result is an empty array.

I'm completely new to this and don't know how to get the text inside the <a></a> tags. Please help.

Tags: node.js, cheerio

Answer


Try this:

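Load the HTML with cheerio.load and use the function it returns as $. Then read each link's .text() instead of its href: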

const cheerio = require('cheerio');

const html = `<div class="css-content">
<div class="css-2aj">
   <img src="" >
   <div data-bn-type="text" id="/48" class="">Latest News</div>
</div>
<div class="css-6f9">
   <div class="css-content">
      <a data-bn-type="link" href="/en/blog/news/523hshhshhshhs3331adc0" class="css-1ej">US could be on cusp of new Covid surge</a>
      <a data-bn-type="link" href="/en/blog/news/423hshhshhshhs3331adc0" class="css-1ej">Stop sharing your vaccine cards on social media</a>
      <a data-bn-type="link" href="/en/blog/news/2222hshhshhshhs3331adc0" class="css-1ej">Italians can be fined up to $60,000 for selling the world's 'most dangerous' cheese</a>
      <a data-bn-type="link" href="/en/blog/news/2223hshhshhshhs3331adc0" class="css-1ej">The Masked Singer' reveals the identity of The Phoenix</a>
   </div>
</div>
</div>`;

// Parse the markup; $ is cheerio's query function for this document
const $ = cheerio.load(html);
const list = [];
// Select every <a> that is a direct child of a .css-content element
$('.css-content > a').each(function () {
  // `this` is the current <a>; .text() returns its text content, not its href
  list.push($(this).text().trim());
});
// Drop empty strings (e.g. from anchors that contain only whitespace)
console.log(list.filter((item) => Boolean(item)));
