I can't make a "POST" request with scrapy.FormRequest

Problem description

import scrapy

# _BaseSpider, my_data and DATA_DIR are defined elsewhere in the project (not shown).


class HPUXSpider(_BaseSpider):
    name = 'hp_ux_spider'

    def start_requests(self):
        return [scrapy.FormRequest(
            url='https://platform.cloud.coveo.com/rest/search/v2?count=3',
            method='POST',
            formdata=my_data,
            callback=self.save_response,
            cb_kwargs=dict(path_dir=DATA_DIR, file_name='win-1.json')
        )]

In place of "my_data", I inserted the code I copied from the browser in view-source mode. This code is shown in the image.


2020-07-13 15:50:51 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 88, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/code/hp_ux/splash/spiders/hp_ux_spider.py", line 102, in start_requests
    cb_kwargs=dict(path_dir=DATA_DIR, file_name='win-1.json')
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 31, in __init__
    querystr = _urlencode(items, self.encoding)
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 72, in _urlencode
    for k, vs in seq
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 72, in <listcomp>
    for k, vs in seq
builtins.ValueError: not enough values to unpack (expected 2, got 1)

2020-07-13 15:50:51 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 88, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/code/hp_ux/splash/spiders/hp_ux_spider.py", line 102, in start_requests
    cb_kwargs=dict(path_dir=DATA_DIR, file_name='win-1.json')
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 31, in __init__
    querystr = _urlencode(items, self.encoding)
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 72, in _urlencode
    for k, vs in seq
  File "/usr/local/lib/python3.7/site-packages/scrapy/http/request/form.py", line 72, in <listcomp>
    for k, vs in seq
ValueError: not enough values to unpack (expected 2, got 1)

Data:

actionsHistory=%5B%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-13T12%3A49%3A51.480Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-13T10%3A44%3A35.303Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-13T07%3A49%3A10.078Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-13T06%3A58%3A59.532Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-13T06%3A57%3A24.599Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-12T21%3A47%3A41.323Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-12T16%3A38%3A19.741Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-12T06%3A04%3A36.049Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-12T05%3A59%3A39.814Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A31%3A55.963Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A29%3A55.997Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A23%3A29.999Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A21%3A09.859Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A19%3A03.748Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A17%3A23.735Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T19%3A14%3A51.152Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T18%3A54%3A03.418Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-11T12%3A28%3A39.484Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-10T13%3A08%3A42.876Z%5C%22%22%7D%2C%7B%22name%22%3A%22Query%22%2C%22time%22%3A%22%5C%222020-07-10T12%3A57%3A51.285Z%5C%22%22%7D%5D&referrer=https%3A%2F%2Fsupport.hpe.com%2Fhpesc%2Fpublic%2Fkm%2FSecurity-Bulletin-Library&visitorId=33b0ede7-3274-486f-a31c-23ed3001ad91&isGuestUser=false&aq=(%40kmdoctypedetails%3D%3Dcv66000018)%20((NOT%20%40kmdoctype%3Dcv60000001))%20(%40kmdocsecuritybulletin%3D%3D4000003)%20(%40kmdoclanguagecode%3D%3D(cv1871440%2Ccv1871463))&cq=(%40source%3D%3D%22cdp-km-document-pro-h4-v2%22)&searchHub=HPE-SecurityBulletins-Page&locale=ru&firstResult=0&numberOfResults=25&excerptLength=500&enableDidYouMean=true&sortCriteria=relevancy&queryFunctions=%5B%5D&rankingFunctions=%5B%5D&groupBy=%5B%7B%22field%22%3A%22%40kmdocsecuritybulletin%22%2C%22maximumNumberOfValues%22%3A20%2C%22sortCriteria%22%3A%22nosort%22%2C%22injectionDepth%22%3A1000%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22allowedValues%22%3A%5B%224000019%22%2C%224000018%22%2C%224000005%22%2C%224000004%22%2C%224000017%22%2C%224000003%22%2C%224000009%22%2C%224000006%22%2C%224000007%22%2C%224000008%22%2C%224000001%22%2C%224000002%22%2C%224000010%22%2C%224000011%22%2C%224000012%22%2C%224000013%22%2C%224000014%22%2C%224000015%22%2C%224000016%22%5D%2C%22advancedQueryOverride%22%3A%22(%40kmdoctypedetails%3D%3Dcv66000018)%20((NOT%20%40kmdoctype%3Dcv60000001))%20(%40kmdoclanguagecode%3D%3D(cv1871440%2Ccv1871463))%22%2C%22constantQueryOverride%22%3A%22(%40source%3D%3D%5C%22cdp-km-document-pro-h4-v2%5C%22)%22%7D%2C%7B%22field%22%3A%22%40kmdoclanguagecode%22%2C%22maximumNumberOfValues%22%3A6%2C%22sortCriteria%22%3A%22Score%22%2C%22injectionDepth%22%3A1000%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22allowedValues%22%3A%5B%22cv1871440%22%2C%22cv1871463%22%5D%2C%22advancedQueryOverride%22%3A%22(%40kmdoctypedetails%3D%3Dcv66000018)%20((NOT%20%40kmdoctype%3Dcv60000001))%20(%40kmdocsecuritybulletin%3D%3D4000003)%22%2C%22constantQueryOverride%22%3A%22(%40source%3D%3D%5C%22cdp-km-document-pro-h4-v2%5C%22)%22%7D%2C%7B%22field%22%3A%22%40kmdoctopissue%22%2C%22maximumNumberOfValues%22%3A6%2C%22sortCriteria%22%3A%22Score%22%2C%22injectionDepth%22%3A1000%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22allowedValues%22%3A%5B%5D%2C%22advancedQueryOverride%22%3A%22(%40kmdoctypedetails%3D%3Dcv66000018)%20((NOT%20%40kmdoctype%3Dcv60000001))%20(%40kmdocsecuritybulletin%3D%3D4000003)%20(%40kmdoclanguagecode%3D%3D(cv1871440%2Ccv1871463))%22%2C%22constantQueryOverride%22%3A%22(%40source%3D%3D%5C%22cdp-km-document-pro-h4-v2%5C%22)%20%40kmdoctopissueexpirationdate%3Etoday%22%7D%2C%7B%22field%22%3A%22%40kmdocdisclosurelevel%22%2C%22maximumNumberOfValues%22%3A6%2C%22sortCriteria%22%3A%22Score%22%2C%22injectionDepth%22%3A1000%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22allowedValues%22%3A%5B%5D%7D%2C%7B%22field%22%3A%22%40hpescuniversaldate%22%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22maximumNumberOfValues%22%3A1%2C%22sortCriteria%22%3A%22nosort%22%2C%22generateAutomaticRanges%22%3Atrue%2C%22advancedQueryOverride%22%3A%22(%40kmdoctypedetails%3D%3Dcv66000018)%20((NOT%20%40kmdoctype%3Dcv60000001))%20(%40kmdocsecuritybulletin%3D%3D4000003)%20(%40kmdoclanguagecode%3D%3D(cv1871440%2Ccv1871463))%20%40uri%22%2C%22constantQueryOverride%22%3A%22(%40source%3D%3D%5C%22cdp-km-document-pro-h4-v2%5C%22)%20%40hpescuniversaldate%3E1970%2F01%2F01%4000%3A00%3A00%22%7D%2C%7B%22field%22%3A%22%40hpescuniversaldate%22%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22maximumNumberOfValues%22%3A1%2C%22sortCriteria%22%3A%22nosort%22%2C%22generateAutomaticRanges%22%3Atrue%2C%22constantQueryOverride%22%3A%22(%40source%3D%3D%5C%22cdp-km-document-pro-h4-v2%5C%22)%20%40hpescuniversaldate%3E1970%2F01%2F01%4000%3A00%3A00%20%40hpescuniversaldate%3E1970%2F01%2F01%4000%3A00%3A00%22%7D%2C%7B%22field%22%3A%22%40hpescuniversaldate%22%2C%22maximumNumberOfValues%22%3A5%2C%22sortCriteria%22%3A%22nosort%22%2C%22injectionDepth%22%3A1000%2C%22completeFacetWithStandardValues%22%3Atrue%2C%22rangeValues%22%3A%5B%7B%22start%22%3A%221900-01-31T18%3A20%3A09.000Z%22%2C%22end%22%3A%222020-07-13T17%3A00%3A00.000Z%22%2C%22label%22%3A%22All%20dates%22%2C%22endInclusive%22%3Afalse%7D%2C%7B%22start%22%3A%222020-07-05T17%3A00%3A00.000Z%22%2C%22end%22%3A%222020-07-13T17%3A00%3A00.000Z%22%2C%22label%22%3A%22Last%207%20days%22%2C%22endInclusive%22%3Afalse%7D%2C%7B%22start%22%3A%222020-06-12T17%3A00%3A00.000Z%22%2C%22end%22%3A%222020-07-13T17%3A00%3A00.000Z%22%2C%22label%22%3A%22Last%2030%20days%22%2C%22endInclusive%22%3Afalse%7D%2C%7B%22start%22%3A%222020-05-13T17%3A00%3A00.000Z%22%2C%22end%22%3A%222020-07-13T17%3A00%3A00.000Z%22%2C%22label%22%3A%22Last%2060%20days%22%2C%22endInclusive%22%3Afalse%7D%2C%7B%22start%22%3A%222020-04-13T17%3A00%3A00.000Z%22%2C%22end%22%3A%222020-07-12T17%3A00%3A00.000Z%22%2C%22label%22%3A%22Last%2090%20days%22%2C%22endInclusive%22%3Afalse%7D%5D%7D%5D&facetOptions=%7B%7D&categoryFacets=%5B%5D&retrieveFirstSentences=true&timezone=Asia%2FTomsk&enableQuerySyntax=false&enableDuplicateFiltering=false&enableCollaborativeRating=false&debug=false&context=%7B%22tracking_id%22%3A%22HPESCXwxYkRD5BgcAAFnGlJ0AAAAY%22%2C%22active_features%22%3A%22DCS%2CDHFWS%2CSA2%2CpatchCoveoSearchToggle%2Csa2_product_focus_target_levels_toggle%2CtoggleCsr%2CtoggleSecBulletin%22%2C%22user_tracking_id%22%3A%22XwRimRD5AcgAAFl2OMkAAAAW%22%7D&allowQueriesWithoutKeywords=true

Log:

2020-07-13 17:30:03 [scrapy.core.engine] INFO: Spider opened
2020-07-13 17:30:03 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-07-13 17:30:03 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-07-13 17:30:03 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
2020-07-13 17:30:04 [scrapy.core.engine] DEBUG: Crawled (401) <POST https://platform.cloud.coveo.com/rest/search/v2?count=3> (referer: None)
2020-07-13 17:30:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://platform.cloud.coveo.com/rest/search/v2?count=3>: HTTP status code is not handled or not allowed
2020-07-13 17:30:04 [scrapy.core.engine] INFO: Closing spider (finished)
2020-07-13 17:30:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

Tags: python-3.x, scrapy, request

Solution


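The formdata parameter of FormRequest expects a dict (or an iterable of (key, value) tuples) holding the POST parameters. You passed a URL-encoded string, so Scrapy iterated over it character by character, could not unpack each character into a key and a value, and raised the ValueError shown in the traceback. That is why Scrapy could not build your request.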
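A minimal sketch of why a string fails here, using only the standard library. The helper `pair_up` is hypothetical: it imitates the `for k, vs in seq` unpacking visible in the traceback and is not Scrapy's actual code.

```python
def pair_up(formdata):
    """Imitate the (key, value) unpacking from the traceback (hypothetical helper)."""
    # Accept either a dict or an iterable of (key, value) pairs.
    items = formdata.items() if isinstance(formdata, dict) else formdata
    return [(k, v) for k, v in items]


# A dict yields (key, value) pairs, so the unpacking works.
print(pair_up({"locale": "ru", "firstResult": "0"}))

# A raw URL-encoded string yields single characters, so the unpacking fails.
try:
    pair_up("locale=ru&firstResult=0")
except ValueError as exc:
    print("ValueError:", exc)
```

The second call fails with the same kind of ValueError as in the question, because each item of the string is one character rather than a pair.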

I parsed the data you posted with a URL decoder, and with the following dict Scrapy was able to make the request:

my_data = {
    # Raw string: keeps the \" sequences of the decoded JSON as literal backslash + quote.
    'actionsHistory': r'[{"name":"Query","time":"\"2020-07-13T12:49:51.480Z\""},{"name":"Query","time":"\"2020-07-13T10:44:35.303Z\""},{"name":"Query","time":"\"2020-07-13T07:49:10.078Z\""},{"name":"Query","time":"\"2020-07-13T06:58:59.532Z\""},{"name":"Query","time":"\"2020-07-13T06:57:24.599Z\""},{"name":"Query","time":"\"2020-07-12T21:47:41.323Z\""},{"name":"Query","time":"\"2020-07-12T16:38:19.741Z\""},{"name":"Query","time":"\"2020-07-12T06:04:36.049Z\""},{"name":"Query","time":"\"2020-07-12T05:59:39.814Z\""},{"name":"Query","time":"\"2020-07-11T19:31:55.963Z\""},{"name":"Query","time":"\"2020-07-11T19:29:55.997Z\""},{"name":"Query","time":"\"2020-07-11T19:23:29.999Z\""},{"name":"Query","time":"\"2020-07-11T19:21:09.859Z\""},{"name":"Query","time":"\"2020-07-11T19:19:03.748Z\""},{"name":"Query","time":"\"2020-07-11T19:17:23.735Z\""},{"name":"Query","time":"\"2020-07-11T19:14:51.152Z\""},{"name":"Query","time":"\"2020-07-11T18:54:03.418Z\""},{"name":"Query","time":"\"2020-07-11T12:28:39.484Z\""},{"name":"Query","time":"\"2020-07-10T13:08:42.876Z\""},{"name":"Query","time":"\"2020-07-10T12:57:51.285Z\""}]',
    'referrer': 'https://support.hpe.com/hpesc/public/km/Security-Bulletin-Library',
    'visitorId': '33b0ede7-3274-486f-a31c-23ed3001ad91',
    'isGuestUser': 'false',
    'aq': '(@kmdoctypedetails==cv66000018) ((NOT @kmdoctype=cv60000001)) (@kmdocsecuritybulletin==4000003) (@kmdoclanguagecode==(cv1871440,cv1871463))',
    'cq': '(@source=="cdp-km-document-pro-h4-v2")',
    'searchHub': 'HPE-SecurityBulletins-Page',
    'locale': 'ru',
    'firstResult': '0',
    'numberOfResults': '25',
    'excerptLength': '500',
    'enableDidYouMean': 'true',
    'sortCriteria': 'relevancy',
    'queryFunctions': '[]',
    'rankingFunctions': '[]',
    # Raw string on a single line: keeps the \" sequences intact and avoids a line break inside the literal.
    'groupBy': r'[{"field":"@kmdocsecuritybulletin","maximumNumberOfValues":20,"sortCriteria":"nosort","injectionDepth":1000,"completeFacetWithStandardValues":true,"allowedValues":["4000019","4000018","4000005","4000004","4000017","4000003","4000009","4000006","4000007","4000008","4000001","4000002","4000010","4000011","4000012","4000013","4000014","4000015","4000016"],"advancedQueryOverride":"(@kmdoctypedetails==cv66000018) ((NOT @kmdoctype=cv60000001)) (@kmdoclanguagecode==(cv1871440,cv1871463))","constantQueryOverride":"(@source==\"cdp-km-document-pro-h4-v2\")"},{"field":"@kmdoclanguagecode","maximumNumberOfValues":6,"sortCriteria":"Score","injectionDepth":1000,"completeFacetWithStandardValues":true,"allowedValues":["cv1871440","cv1871463"],"advancedQueryOverride":"(@kmdoctypedetails==cv66000018) ((NOT @kmdoctype=cv60000001)) (@kmdocsecuritybulletin==4000003)","constantQueryOverride":"(@source==\"cdp-km-document-pro-h4-v2\")"},{"field":"@kmdoctopissue","maximumNumberOfValues":6,"sortCriteria":"Score","injectionDepth":1000,"completeFacetWithStandardValues":true,"allowedValues":[],"advancedQueryOverride":"(@kmdoctypedetails==cv66000018) ((NOT @kmdoctype=cv60000001)) (@kmdocsecuritybulletin==4000003) (@kmdoclanguagecode==(cv1871440,cv1871463))","constantQueryOverride":"(@source==\"cdp-km-document-pro-h4-v2\") @kmdoctopissueexpirationdate>today"},{"field":"@kmdocdisclosurelevel","maximumNumberOfValues":6,"sortCriteria":"Score","injectionDepth":1000,"completeFacetWithStandardValues":true,"allowedValues":[]},{"field":"@hpescuniversaldate","completeFacetWithStandardValues":true,"maximumNumberOfValues":1,"sortCriteria":"nosort","generateAutomaticRanges":true,"advancedQueryOverride":"(@kmdoctypedetails==cv66000018) ((NOT @kmdoctype=cv60000001)) (@kmdocsecuritybulletin==4000003) (@kmdoclanguagecode==(cv1871440,cv1871463)) @uri","constantQueryOverride":"(@source==\"cdp-km-document-pro-h4-v2\") @hpescuniversaldate>1970/01/01@00:00:00"},{"field":"@hpescuniversaldate","completeFacetWithStandardValues":true,"maximumNumberOfValues":1,"sortCriteria":"nosort","generateAutomaticRanges":true,"constantQueryOverride":"(@source==\"cdp-km-document-pro-h4-v2\") @hpescuniversaldate>1970/01/01@00:00:00 @hpescuniversaldate>1970/01/01@00:00:00"},{"field":"@hpescuniversaldate","maximumNumberOfValues":5,"sortCriteria":"nosort","injectionDepth":1000,"completeFacetWithStandardValues":true,"rangeValues":[{"start":"1900-01-31T18:20:09.000Z","end":"2020-07-13T17:00:00.000Z","label":"All dates","endInclusive":false},{"start":"2020-07-05T17:00:00.000Z","end":"2020-07-13T17:00:00.000Z","label":"Last 7 days","endInclusive":false},{"start":"2020-06-12T17:00:00.000Z","end":"2020-07-13T17:00:00.000Z","label":"Last 30 days","endInclusive":false},{"start":"2020-05-13T17:00:00.000Z","end":"2020-07-13T17:00:00.000Z","label":"Last 60 days","endInclusive":false},{"start":"2020-04-13T17:00:00.000Z","end":"2020-07-12T17:00:00.000Z","label":"Last 90 days","endInclusive":false}]}]',
    'facetOptions': '{}',
    'categoryFacets': '[]',
    'retrieveFirstSentences': 'true',
    'timezone': 'Asia/Tomsk',
    'enableQuerySyntax': 'false',
    'enableDuplicateFiltering': 'false',
    'enableCollaborativeRating': 'false',
    'debug': 'false',
    'context': '{"tracking_id":"HPESCXwxYkRD5BgcAAFnGlJ0AAAAY","active_features":"DCS,DHFWS,SA2,patchCoveoSearchToggle,sa2_product_focus_target_levels_toggle,toggleCsr,toggleSecBulletin","user_tracking_id":"XwRimRD5AcgAAFl2OMkAAAAW"}',
    'allowQueriesWithoutKeywords': 'true',
}
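The decoding step can also be done in Python instead of an online URL decoder. The sketch below uses `urllib.parse.parse_qsl` from the standard library; `raw_body` is a shortened stand-in for the body copied from the browser, not the full payload.

```python
from urllib.parse import parse_qsl

# Shortened stand-in for the URL-encoded body copied from the browser.
raw_body = (
    "referrer=https%3A%2F%2Fsupport.hpe.com%2Fhpesc%2Fpublic%2Fkm%2FSecurity-Bulletin-Library"
    "&locale=ru&firstResult=0&queryFunctions=%5B%5D"
)

# parse_qsl splits on '&' and '=' and percent-decodes keys and values.
# keep_blank_values=True keeps parameters whose value is empty.
my_data = dict(parse_qsl(raw_body, keep_blank_values=True))
print(my_data)
```

The resulting dict of strings has the shape that formdata accepts. Note that `dict()` keeps only the last value when a key appears more than once in the body.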

However, the response was:

[scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <POST https://platform.cloud.coveo.com/rest/search/v2?count=3>

To continue scraping, you need to disable the ROBOTSTXT_OBEY setting.
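A sketch of that change, assuming a standard Scrapy project layout where the settings live in settings.py:

```python
# settings.py of the Scrapy project
# Stop Scrapy from consulting robots.txt before sending requests.
ROBOTSTXT_OBEY = False

# Alternative, limited to one spider: set it as a class attribute on the spider.
#     custom_settings = {"ROBOTSTXT_OBEY": False}
```

This only removes the robots.txt check. The log in the question shows a 401 response, so the endpoint may additionally expect credentials that this setting does not supply.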

