Why don't .text() and .html() work with cheerio js and node-fetch?

Problem description

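I'm new to Node JS and am using the node-fetch and Cheerio packages. I'm trying to scrape data from different websites, so I have been testing by passing many different URLs and selectors. However, in the code below, no matter which selector or URL I pass in, .text() returns an empty string and .html() returns null.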

const cheerio= require('cheerio');
const fetch = require('node-fetch');

fetch('https://www.npmjs.com/package/node-fetch/')
    .then((res)=>{ 
        if(res.ok){       
            let $=cheerio.load(res);
            console.log(res);
            let siteData = $('#readme > p:nth-child(8)');
            console.log(siteData.text());
            console.log(siteData.html());
            return res.text();
        }else{
            throw new Error(res.statusText);
        }
    }) 
    .then(body => console.log(body))       
    .catch(error => console.log(error))

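I even wrote the output of res.text() to a file and compared it with the site's HTML source, and the two are almost identical. The value of res prints as follows: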

Response {
  size: 0,
  timeout: 0,
  prev: null,
  next: null,
  root: {
    type: 'root',
    name: 'root',
    namespace: 'http://www.w3.org/1999/xhtml',
    attribs: [Object: null prototype] {},
    'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {},
    children: [ [Circular] ],
    parent: null,
    prev: null,
    next: null
  },
  parent: null,
  [Symbol(Body internals)]: {
    body: Gunzip {
      _writeState: [Uint32Array],
      _readableState: [ReadableState],
      readable: true,
      _events: [Object: null prototype],
      _eventsCount: 6,
      _maxListeners: undefined,
      _writableState: [WritableState],
      writable: true,
      allowHalfOpen: true,
      _transformState: [Object],
      _hadError: false,
      bytesWritten: 0,
      _handle: [Zlib],
      _outBuffer: <Buffer 80 00 f4 9f 03 02 00 00 f0 80 f2 9f 03 02 00 00 20 46 00 00 00 00 00 00 d8 73 dd 9f 03 02 00 00 0f 00 00 00 7f ae f8 39 01 5d dd 9f 03 02 00 00 d0 68 ... 16334 more bytes>,
      _outOffset: 0,
      _chunkSize: 16384,
      _defaultFlushFlag: 2,
      _finishFlushFlag: 2,
      _defaultFullFlushFlag: 3,
      _info: undefined,
      _level: -1,
      _strategy: 0,
      [Symbol(kCapture)]: false
    },
    disturbed: false,
    error: null
  },
  [Symbol(Response internals)]: {
    url: 'https://www.npmjs.com/package/node-fetch',
    status: 200,
    statusText: 'OK',
    headers: Headers { [Symbol(map)]: [Object: null prototype] },
    counter: 1
  }
}
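The printed Response explains part of the problem. res is node-fetch's Response wrapper, and its body is still an unread, gzip-decoding stream (body: Gunzip, disturbed: false), so it is not an HTML string. The prev, next, parent and root properties on it are not Response fields; they appear to have been added by cheerio.load(), which seems to treat a non-string argument as an already-parsed DOM node and attach it under a new root (note children: [ [Circular] ]). The document cheerio builds therefore contains no HTML elements, and every selector matches nothing. The sketch below assumes Node.js 18+, where a WHATWG Response is available as a global (node-fetch's Response exposes the same bodyUsed and text() members); it only illustrates the difference between the Response object and the string that text() resolves to.

```javascript
// Assumes Node.js 18+, where Response is a global; it stands in here for the
// object that fetch() resolves to.
async function main() {
  const res = new Response('<p>Hello</p>', { status: 200 });

  // A Response is a handle on a body stream, not the markup itself.
  console.log(typeof res, res.bodyUsed); // object false

  // text() reads the whole stream and resolves to a string.
  const html = await res.text();
  console.log(typeof html, html); // string <p>Hello</p>

  // The body can be consumed only once; a second read rejects.
  console.log(res.bodyUsed); // true
  await res.text().catch(err => console.log(err.name)); // TypeError
}

main().catch(console.error);
```

Because the body can only be read once, the HTML string should be stored in a variable and reused, rather than calling res.text() again later in the chain.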

The siteData object also prints as follows:

initialize {
  options: {
    withDomLvl1: true,
    normalizeWhitespace: false,
    xml: false,
    decodeEntities: true
  },
  _root: initialize {
    '0': {
      type: 'root',
      name: 'root',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype] {},
      'x-attribsNamespace': [Object: null prototype] {},
      'x-attribsPrefix': [Object: null prototype] {},
      children: [Array],
      parent: null,
      prev: null,
      next: null
    },
    options: {
      withDomLvl1: true,
      normalizeWhitespace: false,
      xml: false,
      decodeEntities: true
    },
    length: 1,
    _root: [Circular]
  },
  length: 0,
  prevObject: initialize {
    '0': {
      type: 'root',
      name: 'root',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype] {},
      'x-attribsNamespace': [Object: null prototype] {},
      'x-attribsPrefix': [Object: null prototype] {},
      children: [Array],
      parent: null,
      prev: null,
      next: null
    },
    options: {
      withDomLvl1: true,
      normalizeWhitespace: false,
      xml: false,
      decodeEntities: true
    },
    length: 1,
    _root: [Circular]
  }
}

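Despite all this, siteData.text() is an empty string and siteData.html() returns null. What is wrong with this code? I have gone through many Stack Overflow pages and read the Cheerio documentation, but I still haven't found an answer.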

Thanks in advance.

Tags: javascript, node.js, cheerio, node-fetch

Solution

