Authorization issue on strava.com with Scrapy redirects. The log says Strava redirects me from /login to /dashboard to /login

Problem description

I really need your help: I have already tried everything! The goal is to log in to https://www.strava.com/login using Scrapy.

This is my code:

import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class StravaSpider(scrapy.Spider):
    name = 'strava'
    start_urls = ('https://www.strava.com/dashboard',)
    #handle_httpstatus_list = [301, 302]

    def parse(self, response):
        # Read the CSRF token from the element named "csrf-token"
        token = response.xpath('//*[@name="csrf-token"]/@content').get()
        # Fill in and submit the login form found in the response
        return FormRequest.from_response(response,
                                         formdata={
                                             'authenticity_token': token,
                                             'plan': "",
                                             # placeholder credentials
                                             'email': 'login',
                                             'password': 'password'},
                                         #dont_filter=True,
                                         #meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
                                         callback=self.scrape_page)

    def scrape_page(self, response):
        # Only reached if the login request ends in a response Scrapy hands to the callback
        print('okkkk', '\n\n\n\n')
        open_in_browser(response)

The problem is with the redirects:

2020-08-16 20:18:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.strava.com/dashboard> from <POST https://www.strava.com/session>
2020-08-16 20:18:49 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36
2020-08-16 20:18:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.strava.com/login> from <GET https://www.strava.com/dashboard>
2020-08-16 20:18:49 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36
2020-08-16 20:18:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.strava.com/dashboard> from <GET https://www.strava.com/login>
2020-08-16 20:18:50 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://www.strava.com/dashboard> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2020-08-16 20:18:50 [scrapy.core.engine] INFO: Closing spider (finished)
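
To make the loop in this log easier to read, here is a minimal stdlib-only sketch that extracts the redirect hops from the three `Redirecting` lines and reports the first target that shows up twice. The helper names `redirect_hops` and `first_repeated_target` are made up for illustration and are not part of Scrapy; the log text is copied from the lines above with timestamps trimmed.

```python
import re

# The three redirect lines from the log above (timestamps trimmed).
LOG = """\
[scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.strava.com/dashboard> from <POST https://www.strava.com/session>
[scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.strava.com/login> from <GET https://www.strava.com/dashboard>
[scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.strava.com/dashboard> from <GET https://www.strava.com/login>
"""

# Matches: Redirecting (STATUS) to <METHOD URL> from <METHOD URL>
HOP = re.compile(r"Redirecting \((\d{3})\) to <(\w+) ([^>]+)> from <(\w+) ([^>]+)>")


def redirect_hops(log_text):
    """Return one (status, source, target) tuple per redirect line."""
    hops = []
    for match in HOP.finditer(log_text):
        status, to_method, to_url, from_method, from_url = match.groups()
        hops.append((int(status), f"{from_method} {from_url}", f"{to_method} {to_url}"))
    return hops


def first_repeated_target(hops):
    """Return the first redirect target that occurs a second time, or None."""
    seen = set()
    for _status, _source, target in hops:
        if target in seen:
            return target
        seen.add(target)
    return None


hops = redirect_hops(LOG)
for status, source, target in hops:
    print(f"{status}: {source} -> {target}")
print("repeated target:", first_repeated_target(hops))
```

The repeated target is `GET https://www.strava.com/dashboard`, which is the request the dupefilter line reports as filtered. After that request is dropped, nothing is left in the queue, so the spider closes without ever calling `scrape_page`.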

The form data of the POST request, as shown under Request Headers, is utf8=%E2%9C%93&authenticity_token=W28zQ9XWLK7oktgDzUj0kCozODXk2bJQLAqPihwEJ8gwj1VDKtA7c5AWwTw0OUovnyAZkcXiNdF2Zt4AsNOIUQ%3D%3D&plan=&email=&password=

If I uncomment dont_filter=True and make it handle_httpstatus_list = [302], it returns a very interesting page:

<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"></head><body>You are being <a href="https://www.strava.com/dashboard">redirected</a>.</body></html>

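If I make a mistake in the password or login, it really does reach the callback and returns a page that says 'The username or password did not match. Please try again.' This means my authorization works, but Scrapy does not end up on the right page.

I turned off the duplicate filter, added handle_httpstatus_list, added scrapy-redirect in the settings... nothing worked. Please no bs4 or selenium: I have already built this program with them, and now I just need Scrapy and this authorization... it is making me cry.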
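
For reference, the settings-level attempts listed above might look roughly like the fragment below. This is a sketch only: the post does not show the actual `settings.py`, and it is unclear what "scrapy-redirect" refers to, so that part is left out. The setting names are standard Scrapy settings; the two cookie settings and the fixed `USER_AGENT` are diagnostic additions that are not in the post, and the user-agent string is a placeholder.

```python
# settings.py (sketch; the values actually used are not shown in the post)

# Disable duplicate filtering project-wide, the settings-level
# counterpart of passing dont_filter=True on a single request.
DUPEFILTER_CLASS = "scrapy.dupefilters.BaseDupeFilter"

# Log every filtered duplicate instead of only the first one
# (the setting named in the dupefilter log line).
DUPEFILTER_DEBUG = True

# Diagnostic, not in the post: log Cookie / Set-Cookie headers so it is
# visible whether the session cookie set by POST /session is sent along
# with the following GET /dashboard.
COOKIES_ENABLED = True
COOKIES_DEBUG = True

# Assumption, not a confirmed fix: the log shows scrapy_user_agents assigning
# a different User-Agent to each hop. Whether Strava's session handling cares
# is not established in the post, but pinning one value rules it out.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}
# Placeholder browser string; any single fixed value serves the purpose.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
)
```

With `COOKIES_DEBUG` on, the next run should show which cookies accompany each hop, which helps separate a lost session cookie from a redirect-handling problem.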

Tags: python, redirect, scrapy, web-crawler, strava

Solution

