How to store scraped data from Scrapy to FTP as a CSV?

Problem description

My Scrapy settings.py:

from datetime import datetime
file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'
FEED_URI = 'ftp://myusername:mypassword@ftp.mymail.com/uploads/%(save_name)s.csv'

When I run my spider, I get an error mentioning my project name... Can I create a pipeline?

\scrapy\extensions\feedexport.py:247: ScrapyDeprecationWarning: The `FEED_URI` and `FEED_FORMAT` settings have been deprecated in favor of the `FEEDS` setting. Please see the `FEEDS` setting docs for more details
 exporter = cls(crawler)
Traceback (most recent call last):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
   exec(code, run_globals)
 File "C:\Users\viren\AppData\Local\Programs\Python\Python38\Scripts\scrapy.exe\__main__.py", line 7, in <module>
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 145, in execute
   _run_print_help(parser, _run_command, cmd, args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
   func(*a, **kw)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
   cmd.run(args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
   crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 191, in crawl
   crawler = self.create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 224, in create_crawler
   return self._create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 229, in _create_crawler
   return Crawler(spidercls, self.settings)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 72, in __init__
   self.extensions = ExtensionManager.from_crawler(self)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
   return cls.from_settings(crawler.settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
   mw = create_instance(mwcls, settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
   instance = objcls.from_crawler(crawler, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 247, in from_crawler
   exporter = cls(crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 282, in __init__
   if not self._storage_supported(uri, feed_options):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 427, in _storage_supported
   self._get_storage(uri, feed_options)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 458, in _get_storage
   instance = build_instance(feedcls.from_crawler, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 455, in build_instance
   return build_storage(builder, uri, feed_options=feed_options, preargs=preargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 201, in from_crawler
   return build_storage(
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 192, in __init__
   self.port = int(u.port or '21')
 File "c:\users\viren\appdata\local\programs\python\python38\lib\urllib\parse.py", line 174, in port
    raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'Edh=)9sd'

I don't know how to store the CSV to FTP. Is the error coming up because my password is an int? Is there something I forgot to write?

Tags: scrapy, ftp, scrapy-pipeline

Solution


Can I create a pipeline?

Yes, you probably should create a pipeline. As shown in the Scrapy architecture diagram, the basic concept is this: requests are sent out, responses come back and are processed by the spider, and finally a pipeline does something with the items the spider returns. In your case, you could create a pipeline that saves the data to a CSV file and uploads it to an FTP server. See Scrapy's Item Pipeline documentation for more information.
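A minimal sketch of such a pipeline, using Python's standard-library `csv` and `ftplib` modules. The hostname, credentials, and upload directory below are placeholders taken from the question's URI, and the filename prefix mirrors the `save_name` logic from your settings.py; adapt them to your real server:

```python
import csv
import io
from datetime import datetime
from ftplib import FTP


class CsvFtpPipeline:
    """Collect items into an in-memory CSV, then upload it over FTP
    when the spider closes. Host and credentials are placeholders."""

    def open_spider(self, spider):
        self.buffer = io.StringIO()
        self.writer = None

    def process_item(self, item, spider):
        row = dict(item)
        if self.writer is None:
            # Build the header from the first item's fields.
            self.writer = csv.DictWriter(self.buffer, fieldnames=list(row))
            self.writer.writeheader()
        self.writer.writerow(row)
        return item

    def close_spider(self, spider):
        file_name = datetime.today().strftime('%Y-%m-%d_%H%M_') + 'Mobile_Nshopping.csv'
        data = io.BytesIO(self.buffer.getvalue().encode('utf-8'))
        ftp = FTP('ftp.mymail.com')            # placeholder host
        ftp.login('myusername', 'mypassword')  # placeholder credentials
        ftp.cwd('uploads')
        ftp.storbinary(f'STOR {file_name}', data)
        ftp.quit()
```

Enable it in settings.py with `ITEM_PIPELINES = {'myproject.pipelines.CsvFtpPipeline': 300}` (the module path depends on your project layout).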

I don't know how to store the CSV to FTP. Is the error coming up because my password is an int? Is there something I forgot to write?

I believe this is related to the deprecation warning below (which also appears at the top of the error output you provided): ScrapyDeprecationWarning: The FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting. Please see the FEEDS setting docs for more details.

Try replacing FEED_URI with FEEDS; see the Scrapy documentation on the FEEDS setting.
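A sketch of what the migrated settings.py could look like. Note also that the ValueError in your traceback ("Port could not be cast to integer value as 'Edh=)9sd'") suggests the real password contains characters like `=` or `)` that break URL parsing, so percent-encoding the credentials with `urllib.parse.quote` may be needed regardless of which setting you use. Host and credentials here are placeholders:

```python
# settings.py — sketch using the newer FEEDS setting.
from datetime import datetime
from urllib.parse import quote

file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'

# Percent-encode credentials so special characters (e.g. '=' or ')')
# cannot be misread as part of the host:port section of the URI.
user = quote('myusername', safe='')
password = quote('mypassword', safe='')

FEEDS = {
    f'ftp://{user}:{password}@ftp.mymail.com/uploads/{save_name}.csv': {
        'format': 'csv',
    },
}
```

With FEEDS, the format is given per-URI in the nested dict, so a separate FEED_FORMAT setting is no longer needed.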
