Variables that should be defined give me NameErrors. How do I fix this?

Problem description

I'm trying to append values to data so that I can dump it to a JSON file, but I keep getting this error:

Traceback (most recent call last):
  File "C:\Users\techn\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\techn\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "C:\Users\techn\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\techn\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\techn\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\techn\scrapy\KYCSpider\KYCSpider\spiders\kycspider.py", line 92, in parse
    data['Government Members'].append({
NameError: name 'data' is not defined

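The same problem came up earlier with another variable: Python reported it as not defined even though it was defined outside the function. I don't know what I'm doing wrong here.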

class KYCSpider(scrapy.Spider):
    name = 'kycspider'
    start_urls = [
        'http://www.vlada.si/en/about_the_government/members_of_government/'
        ]
    allowed_domains = ['www.vlada.si']
    maxdepth = 1
    isNewDoc = False
    oldData = ''
    newFile = ''
    data = {}
    data['Government Members'] = []

    def spider_opened(self):
        print("OPENED SPIDER")
        global newFile, oldData, isNewDoc
        #If data.json exists, copy its data into a string and truncate it
        try:
            oldFile = open('data.json', 'r')
            oldData = oldFile.read()
            isNewDoc = False
        #If data.json file doesn't exist, tell spider that this is a new doc
        except FileNotFoundError:
            isNewDoc = True
        newFile = open('data.json', 'w')
        newFile.write("[")


    def parse(self, response):
        global data, isNewDoc
        #code that assigns values to from_name, from_designation, etc.

        data['Government Members'].append({
                    'name': from_name, 
                    'designation': from_designation,
                    'dob': dob,
                    'address': address,
                    'email': email,
                    'phone': phone,
                    'website': website,
                    'sourceURL': sourceURL,
                    'operation': operation
            })

I want data to accumulate whatever information is scraped, so that I can dump it to a JSON file once the crawl has finished.

Tags: python-3.x, scrapy

Solution


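data isn't global; it is defined in the class body, so it is a class attribute and is reached through self. In other words, use self.data everywhere in the method where you currently use data. Example: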

# remove the global statement
self.data['Government Members'].append(...)
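
To see why the global statement doesn't help, here is a minimal sketch that reproduces the error without Scrapy. The class name Collector and the method names broken and working are made up for illustration; only the data dictionary mirrors the question.

```python
class Collector:
    # Class attribute: stored in the class namespace, not in the module's globals.
    data = {'Government Members': []}

    def broken(self, member):
        # 'global' makes Python look for a module-level name 'data'.
        # No such name exists here, so the lookup raises NameError.
        global data
        data['Government Members'].append(member)

    def working(self, member):
        # Attribute lookup through self finds the class attribute.
        self.data['Government Members'].append(member)


c = Collector()

error_message = None
try:
    c.broken({'name': 'A'})
except NameError as exc:
    error_message = str(exc)

c.working({'name': 'A'})
print(error_message)
print(c.data)
```

The same reasoning applies to isNewDoc, oldData and newFile in the question: they are also defined in the class body, so they would be reached as self.isNewDoc and so on rather than through a global statement.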

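That said, you should initialize these variables in a constructor rather than just defining them in the open in the class body, e.g.: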

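def __init__(self, *args, **kwargs):
    # Let scrapy.Spider run its own initialization first
    super().__init__(*args, **kwargs)
    self.data = {'Government Members': []}

def parse(self, response):
    print(self.data)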
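
One reason to prefer the constructor, shown as a minimal sketch without Scrapy: a mutable object defined in the class body is shared by every instance, whereas one created in __init__ belongs to a single instance. The class names SharedState and OwnState are hypothetical.

```python
class SharedState:
    # One dict, created once when the class is defined, shared by all instances.
    data = {'Government Members': []}


class OwnState:
    def __init__(self):
        # A fresh dict is created for each instance.
        self.data = {'Government Members': []}


a, b = SharedState(), SharedState()
a.data['Government Members'].append('x')
shared_leak = b.data['Government Members']  # b sees the item appended through a

c, d = OwnState(), OwnState()
c.data['Government Members'].append('x')
own = d.data['Government Members']  # d's list is unaffected

print(shared_leak)
print(own)
```

With a single spider instance per crawl the difference may never show up, but per-instance state avoids surprises if the class is ever instantiated more than once.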
