Writing a spider in Scrapy: why doesn't 'yield item' work inside a nested for loop?

Problem description

I have a spider written with Scrapy, but the items yielded inside the for loop are never produced. See the code below.

def parse_paragraph(self, div_list, category_name, group_name):
    for div in div_list:
        duilian_text_list = div.xpath('./text()').extract()
        duilian_text_list = strip_list(duilian_text_list)
        if len(duilian_text_list) == 0:
            continue
        elif len(duilian_text_list) == 1:
            duilian_text = duilian_text_list[0]
            self.parse_duilian(duilian_text, category_name, group_name)
        elif len(duilian_text_list) == 2 and not is_single_line(duilian_text_list[0]):
            duilian_text = ''.join(duilian_text_list)
            self.parse_duilian(duilian_text, category_name, group_name)
        else:
            for duilian_text in duilian_text_list:
                duilian_item = DuilianItem()
                duilian_item['id'] = str(uuid.uuid4()).replace('-', '')
                duilian_item['category_id'] = getCategoryName(category_name)
                duilian_item['group_name'] = group_name
                duilian = parse_duilian(duilian_text)
                if duilian != '|':
                    duilian_item['name'] = duilian
                    duilian_item['desc'] = ''
                    duilian_item['author'] = ''
                    duilian_item['shuti'] = ''
                    duilian_item['word_count'] = len(duilian_item['name']) // 2
                    duilian_item['image_url'] = ''
                    print('-------I am here--------')
                    yield duilian_item

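When I call this function, nothing appears in the output window. It seems that the line yield duilian_item doesn't work and even prevents the other code from executing (the print line above it).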

When I comment out the last line, yield duilian_item, everything works and -------I am here-------- appears in the output window. What is wrong here?

Put simply, the code below prints nothing, but if I comment out yield 1, it prints the list. So does yield not work inside a for loop in Python?

def strange_yield():
    list = [1, 2, 3]
    for i in list:
        print(i)
        yield 1

strange_yield()

Tags: python, scrapy, yield

Solution


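When you use yield in a Python function, that function becomes a generator function. Calling it does not run its body; it only returns a generator object, which is why strange_yield() on its own prints nothing. The correct way to handle your function strange_yield is: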

my_yield = strange_yield()
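The following sketch makes this observable. It assumes a variant of strange_yield, here called traced_yield (a hypothetical name), that appends to a list instead of printing, so each step can be checked. The standard-library call inspect.isgeneratorfunction confirms that the presence of yield alone turns the function into a generator function.

```python
import inspect

trace = []


def traced_yield():
    """Same shape as strange_yield, but records each step in `trace`."""
    for i in [1, 2, 3]:
        trace.append(i)  # stands in for print(i)
        yield 1


gen = traced_yield()  # creates a generator object; the loop has NOT run yet
assert trace == []

first = next(gen)  # runs the body only up to the first yield
assert first == 1 and trace == [1]

rest = list(gen)  # resumes after that yield and runs to the end
assert rest == [1, 1] and trace == [1, 2, 3]

is_gen_func = inspect.isgeneratorfunction(traced_yield)  # True
```

Nothing in the body executes until a value is requested, and each request resumes execution exactly where the previous yield paused it.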

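my_yield is now a generator object returned by the generator function strange_yield. A generator can be iterated over, or you can fetch its next value with the built-in next() function: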

print(next(my_yield))

Or

for yield_value in my_yield:
  print(yield_value)
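Applied to the spider in the question: parse_paragraph contains yield, so it is a generator function too, and whatever calls it must iterate over the generator it returns. Inside a Scrapy callback, the usual way is yield from self.parse_paragraph(...). The caller is not shown in the question, so the sketch below assumes it looks roughly like parse_broken. The class, its method names, and the use of plain dicts in place of DuilianItem are hypothetical, so the sketch runs without Scrapy installed.

```python
import uuid


class DuilianSpiderSketch:
    """Hypothetical, Scrapy-free stand-in for the spider in the question."""

    def parse_paragraph(self, texts, group_name):
        # Contains `yield`, so this is a generator function: calling it only
        # creates a generator object and runs none of the loop below.
        for text in texts:
            yield {
                "id": uuid.uuid4().hex,  # same as str(uuid.uuid4()).replace('-', '')
                "group_name": group_name,
                "name": text,
            }

    def parse_broken(self, texts):
        # Assumed to mirror the question: the generator is created and
        # thrown away, so no item is ever produced.
        self.parse_paragraph(texts, "demo")
        return []

    def parse_fixed(self, texts):
        # `yield from` iterates the inner generator and re-yields each item,
        # so whoever consumes this callback receives every item.
        yield from self.parse_paragraph(texts, "demo")


spider = DuilianSpiderSketch()
broken = list(spider.parse_broken(["first line", "second line"]))
fixed = list(spider.parse_fixed(["first line", "second line"]))
```

If self.parse_duilian is also a generator function, the two calls to it in the elif branches of parse_paragraph are discarded in the same way and would need yield from as well.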
