parsing - How do I retrieve the text inside a label tag with lxml?

Problem description

I am using lxml to get the text inside tags, and I am doing it this way:

xpaths_for_questions_lxml = []
for tag in self.tree.iter():
    try:
        # Record the XPath of every element whose text looks like a question.
        if tag.text and utils.is_question(tag.text.strip()):
            xpaths_for_questions_lxml.append(self.tree.getpath(tag))
    except Exception:
        self.logger.debug(traceback.format_exc())
        # Re-raise the original exception so its type and traceback are kept.
        raise

The is_question function returns True if the sentence contains a question mark.
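
The question does not show the helper itself. A minimal sketch of what it might look like, assuming it only checks for a question mark as described (the body below is a guess, not the asker's actual utils.is_question):

```python
def is_question(text: str) -> bool:
    """Return True if the text contains a question mark (assumed behaviour)."""
    return "?" in text


print(is_question("What does advertised showtime mean?"))
print(is_question("Showtimes are listed below."))
```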

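But when the tag is a label, the tag.text attribute is empty: even though the label tag on the actual web page contains text, no text shows up.

Why doesn't the label tag show any text content? Or is something extra needed to get the text out of a label tag?

EDIT1: My question is this: I am iterating over all the children in the DOM tree, so why doesn't the text inside the label show up?

Tags: parsing, web-scraping, lxml, lxml.html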

If you want to get the questions, you can try

import requests
from lxml import html

r = requests.get('https://www.amctheatres.com/faqs/movie-info')
source = html.fromstring(r.text)
questions = source.xpath('//label[@itemprop="text"]/text()')

or

questions = [label.text_content() for label in source.xpath('//label[@itemprop="text"]')]

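Note that label.text_content() should be used instead of label.text, because the label node contains multiple child text nodes, and label.text only returns the text that comes before the label's first child element.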
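
The difference can be reproduced offline. The sketch below uses a made-up snippet in which the label starts with an empty span; this markup is an assumption for illustration, not the actual markup of the AMC page:

```python
from lxml import html

# Hypothetical markup: the visible text follows a child element inside the label.
snippet = (
    '<div><label itemprop="text"><span class="icon"></span>'
    'What does advertised showtime mean?</label></div>'
)
root = html.fromstring(snippet)
label = root.xpath('//label[@itemprop="text"]')[0]

print(label.text)            # text before the first child element: None here
print(label[0].tail)         # text that follows the span, stored as its tail
print(label.text_content())  # all text inside the label, children included
print(label.xpath('text()')) # direct text nodes of the label, as a list
```

In lxml, text that comes after a child element is stored on that child as its tail, which is why label.text can be empty even though the label visibly contains text.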

print(questions)
#['Does the runtime shown for each movie include trailers?', 'Where can I find MPAA movie ratings information?', 'What does advertised showtime mean?', 'What movies are playing right now at AMC?', 'What movies are coming soon to AMC?', 'How can I find movie times at AMC?']
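
If the goal is still to collect an XPath for every element that holds a question, as in the original loop, one possible adaptation is to join each element's direct text nodes instead of reading tag.text. This is only a sketch: question_xpaths is a hypothetical name, the sample document is invented, and the question-mark check stands in for utils.is_question.

```python
from lxml import html


def question_xpaths(tree):
    """Return the XPath of each element whose own text contains a question mark."""
    paths = []
    for el in tree.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(el.tag, str):
            continue
        # text() selects every direct text node, not only the leading one.
        own_text = "".join(el.xpath("text()")).strip()
        if "?" in own_text:
            paths.append(tree.getpath(el))
    return paths


# Invented sample document, used only to exercise the function.
doc = html.fromstring(
    "<html><body><div>"
    "<label><span></span>How can I find movie times?</label>"
    "<p>No question here.</p>"
    "</div></body></html>"
)
tree = doc.getroottree()
print(question_xpaths(tree))
```

Using text() rather than text_content() keeps a parent element from matching just because one of its descendants contains a question.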
