python - ImportError: No module named counselor.settings when using Scrapy
Question
My crawler is structured as follows:
├── README.md
├── counselor
│ ├── filter_words.py
│ ├── items.py
│ ├── langconv.py
│ ├── main.py
│ ├── pipelines.py
│ ├── queue.py
│ ├── settings.py
│ ├── spiders
│ │ ├── __init__.py
│ │ └── wiki.py
│ └── zh_wiki.py
└── scrapy.cfg
My main.py is as follows:
from scrapy import cmdline
cmdline.execute('scrapy crawl wikipieda_spider'.split())
My counselor/spiders/wiki.py is as follows:
class WiKiSpider(scrapy.Spider):
    urlQueue = Queue()
    name = 'wikipieda_spider'
    allowed_domains = ['zh.wikipedia.org']
    start_urls = ['https://zh.wikipedia.org/wiki/Category:%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BC%96%E7%A8%8B']
    custom_settings = {
        'ITEM_PIPELINES': {'counselor.pipelines.WikiPipeline': 800}
    }
    ......
My counselor/settings.py:
BOT_NAME = 'counselor'
SPIDER_MODULES = ['counselor.spiders']
NEWSPIDER_MODULE = 'counselor.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'counselor.pipelines.WikiPipeline': 800,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
In the project root I have scrapy.cfg:
[settings]
default = counselor.settings
[deploy]
#url = http://localhost:6800/
project = counselor
Now I go to my project root (the same directory as scrapy.cfg) and run:
python counselor/main.py
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/OpenSSL/crypto.py:14: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import utils, x509
Traceback (most recent call last):
  File "counselor/main.py", line 2, in <module>
    cmdline.execute('scrapy crawl wikipieda_spider'.split())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
    module = import_module(module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named counselor.settings
My code does not import counselor.settings directly. Why does this error occur?
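Answer
Because Scrapy does import it: the [settings] entry in your scrapy.cfg names counselor.settings as the settings module, and Scrapy imports that module at startup (the import_module call in your traceback). Your counselor directory has no __init__.py, so Python 2.7 does not treat it as a package and the import fails. All you need to do is make counselor a package by adding counselor/__init__.py. The file doesn't need any content; you can add a single # line for good measure.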
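The check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Scrapy's own code: the helper names settings_module_from_cfg, missing_init_files, add_init_files and demo are hypothetical, and the demo rebuilds a stripped-down copy of the question's layout in a temporary directory rather than touching a real project. It is written against the Python 3 standard library (on Python 2.7 the configparser module is named ConfigParser).

```python
import configparser
import os
import tempfile


def settings_module_from_cfg(cfg_path):
    """Return the dotted module named by [settings] default in scrapy.cfg."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    return parser.get("settings", "default")


def missing_init_files(project_root, dotted_module):
    """List package directories on the module path that lack __init__.py."""
    missing = []
    current = project_root
    # Every component except the last is a package directory;
    # the last one is the module file itself (settings.py here).
    for package in dotted_module.split(".")[:-1]:
        current = os.path.join(current, package)
        if not os.path.isfile(os.path.join(current, "__init__.py")):
            missing.append(current)
    return missing


def add_init_files(directories):
    """Create an empty __init__.py in each directory (existing files are kept)."""
    for directory in directories:
        open(os.path.join(directory, "__init__.py"), "a").close()


def demo():
    """Recreate the question's layout in a temp dir, then repair it."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "counselor", "spiders"))
    with open(os.path.join(root, "scrapy.cfg"), "w") as handle:
        handle.write("[settings]\ndefault = counselor.settings\n")
    open(os.path.join(root, "counselor", "settings.py"), "w").close()

    module = settings_module_from_cfg(os.path.join(root, "scrapy.cfg"))
    before = [os.path.relpath(p, root) for p in missing_init_files(root, module)]
    add_init_files(missing_init_files(root, module))
    after = missing_init_files(root, module)
    return module, before, after


if __name__ == "__main__":
    print(demo())
```

In the demo, the counselor directory is reported as missing its __init__.py before the repair and nothing is reported afterwards. Note that Python 3.3 and later can import a directory without __init__.py as a namespace package, so the ImportError in the question is specific to older interpreters such as the Python 2.7 shown in the traceback.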