ImportError: No module named counselor.settings when using Scrapy

Problem

My crawler project is structured as follows:

├── README.md
├── counselor
│   ├── filter_words.py
│   ├── items.py
│   ├── langconv.py
│   ├── main.py
│   ├── pipelines.py
│   ├── queue.py
│   ├── settings.py
│   ├── spiders
│   │   ├── __init__.py
│   │   └── wiki.py
│   └── zh_wiki.py
└── scrapy.cfg

My main.py is as follows:

from scrapy import cmdline
cmdline.execute('scrapy crawl wikipieda_spider'.split())

My counselor/spiders/wiki.py is as follows:

import scrapy
from counselor.queue import Queue  # assuming Queue comes from the project's queue.py

class WiKiSpider(scrapy.Spider):
    urlQueue = Queue()
    name = 'wikipieda_spider'
    allowed_domains = ['zh.wikipedia.org']
    start_urls = ['https://zh.wikipedia.org/wiki/Category:%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BC%96%E7%A8%8B']
    custom_settings = {
        'ITEM_PIPELINES': {'counselor.pipelines.WikiPipeline': 800}
    }
    ......

My counselor/settings.py:

BOT_NAME = 'counselor'

SPIDER_MODULES = ['counselor.spiders']
NEWSPIDER_MODULE = 'counselor.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'counselor.pipelines.WikiPipeline': 800,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True

In the project root I have scrapy.cfg:

[settings]
default = counselor.settings

[deploy]
#url = http://localhost:6800/
project = counselor

Now I go to my project root (the same directory as scrapy.cfg) and run:

python counselor/main.py 
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/OpenSSL/crypto.py:14: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import utils, x509
Traceback (most recent call last):
  File "counselor/main.py", line 2, in <module>
    cmdline.execute('scrapy crawl wikipieda_spider'.split())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
    module = import_module(module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named counselor.settings

My code never imports counselor.settings directly. Why does this error occur?

Tags: python, scrapy, web-crawler

Solution


Scrapy does import it: it reads the project name from your scrapy.cfg (`default = counselor.settings`) and imports that module. In your layout, the counselor directory has no __init__.py, so Python does not treat it as a package and the import fails. All you need to do is add an __init__.py to the counselor directory. It does not need any content; for convenience you can put a single `#` line in it.
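As a minimal sketch of why this fixes the error, the snippet below rebuilds the layout in a temporary scratch directory (the paths are hypothetical, not your real project) and shows that `counselor.settings` becomes importable once the package has an `__init__.py`:

```python
import importlib
import os
import sys
import tempfile

# Recreate the fixed layout in a scratch directory.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "counselor")
os.makedirs(pkg)

# The empty __init__.py is what turns counselor/ into an importable package.
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "settings.py"), "w") as f:
    f.write("BOT_NAME = 'counselor'\n")

# With the package's parent on sys.path, the import Scrapy performs now works.
sys.path.insert(0, root)
settings = importlib.import_module("counselor.settings")
print(settings.BOT_NAME)  # prints: counselor
```

In the real project, running `touch counselor/__init__.py` from the project root (next to scrapy.cfg) achieves the same thing.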

