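python - Scrapy - pymongo not inserting items into the database
Question
I'm experimenting with Scrapy to learn it, using MongoDB as my database, and I've hit a dead end. The scraping itself works, since the items I fetch show up in the terminal log, but I can't get the data written to my database. The MONGO_URI is correct: I tried it in a Python shell and could create and store data there.
Here are my files
items.py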
import scrapy

class MaterialsItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
    ## url = scrapy.Field()
    pass
spider.py
import scrapy
from scrapy.selector import Selector
from ..items import MaterialsItem

class mySpider(scrapy.Spider):
    name = "<placeholder for post>"
    allowed_domains = ["..."]
    start_urls = [
        ...
    ]

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="content"]')
        for product in products:
            item = MaterialsItem()
            item['title'] = product.xpath("//a[@class='product-card__title product-card__title-v2']/text()").extract(),
            item['price'] = product.xpath("//div[@class='product-card__price-value ']/text()").extract()
            ## product['url'] =
            yield item
settings.py
MONGO_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}

#setup mongo DB
MONGO_URI = "my MongoDB Atlas address"
MONGO_DB = "materials"
pipelines.py
import pymongo

class MongoPipeline(object):
    collection_name = 'my-prices'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        ## pull in information from settings.py
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DB', ', <placeholder-spider name>')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        self.db[self.collection_name].insert(dict(item))
        logging.debug("Post added to MongoDB")
        return item
Any help would be great!
**EDIT
File structure
materials
    spiders
        my-spider
    items.py
    pipelines.py
    settings.py
Answer
In the MongoPipeline class, shouldn't the line:
    collection_name = 'my-prices'
be:
    self.collection_name = 'my-prices'
since you are calling:
    self.db[self.collection_name].insert(dict(item))
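A hedged aside on the suggestion above: in Python, an attribute defined in the class body is already readable as self.collection_name, so that change may not be enough on its own. Two other things in the posted files are worth checking. Scrapy enables item pipelines through the ITEM_PIPELINES setting, whereas the settings.py in the question uses MONGO_PIPELINES, which Scrapy does not read. In addition, pipelines.py calls logging.debug without importing logging, and Collection.insert was removed in PyMongo 4 in favour of insert_one. The sketch below shows both files under those assumptions; the "materials" fallback database name is borrowed from MONGO_DB in the question and is only illustrative.

```python
import logging

logger = logging.getLogger(__name__)

# --- settings.py -----------------------------------------------------------
# Scrapy only loads pipelines listed under ITEM_PIPELINES.
ITEM_PIPELINES = {
    "materials.pipelines.MongoPipeline": 300,
}

# --- pipelines.py ----------------------------------------------------------
class MongoPipeline:
    # Class attribute: instances can read it as self.collection_name.
    collection_name = "my-prices"

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Pull connection details from settings.py.
        # The "materials" fallback is an assumption for illustration.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DB", "materials"),
        )

    def open_spider(self, spider):
        # Imported here so this sketch loads even where PyMongo is absent.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # insert_one replaces the removed Collection.insert.
        self.db[self.collection_name].insert_one(dict(item))
        logger.debug("Post added to MongoDB")
        return item
```

With the pipeline registered this way, Scrapy's startup log should list it under the enabled item pipelines, which is a quick way to confirm it was picked up before looking at the database.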