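python - Selenium cookie handling
Problem description
I may be going about this entirely the wrong way; if so, please point me toward a better approach. I am trying to fetch all announcements from my university's web page and have a Discord bot print them (that part I can do successfully).
However, I am confused about how to handle cookies after logging in.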
# scrapeCSUP.py
import pickle
import time
import pprint
import requests
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as bs
opts = Options()
opts.headless = True
assert opts.headless
main_url = "https://cs.up.ac.za/courses/COS132"
browser = Chrome(options=opts)
page = requests.get(main_url)
soup = bs(page.content, 'html.parser')
def save_cookies(driver, location):
    pickle.dump(browser.get_cookies(), open(location, 'wb'))
def load_cookies(driver, location, url = None):
    cookies = pickle.load(open(location, 'rb'))
    browser.delete_all_cookies()
    url = "https://cs.up.ac.za/courses/COS132" if url is None else url
    browser.get(main_url)
    for cookie in cookies:
        browser.add_cookie(cookie)
def user_login():
    browser.get("https://cs.up.ac.za/login?next=%2Fcourses%2FCOS132")
    browser.find_element_by_xpath('/html/body/div/div/div[3]/div/form/table/tbody/tr[1]/td/input').send_keys(
        'username')
    browser.find_element_by_xpath('/html/body/div/div/div[3]/div/form/table/tbody/tr[2]/td/input').send_keys(
        'password')
    checkbox = browser.find_element_by_xpath('/html/body/div/div/div[3]/div/form/table/tbody/tr[4]/td[2]/input')
    if not checkbox.is_selected():
        checkbox.click()
    browser.find_element_by_xpath('/html/body/div/div/div[3]/div/form/table/tbody/tr[5]/td[2]/input[3]').click()
    save_cookies(browser, 'cookies.txt')
    print("Logged in successfully")
    time.sleep(5)
    pprint.pprint(browser.get_cookies())
    browser.quit()
def login_w_cookies():
    load_cookies(browser, 'cookies.txt', main_url)
    browser.get(main_url)
    time.sleep(5)
    pprint.pprint(browser.get_cookies())
def announcement_printer():
    whole_content = soup.find(class_='siteContainer')
    announcements = whole_content.find_all('div', class_='left')
    for announcement in announcements:
        print(announcement, end='\n' * 2)
browser.get(main_url)
#user_login()
login_w_cookies()
announcement_printer()
print("========================================\n")
I run user_login() to save the cookies and then execute login_w_cookies(), but the cookies are not loaded correctly and I get the following error:
Traceback (most recent call last):
  File "E:/DiscBot/scrapeCSUP.py", line 75, in <module>
    login_w_cookies()
  File "E:/DiscBot/scrapeCSUP.py", line 57, in login_w_cookies
    load_cookies(browser, 'cookies.txt', main_url)
  File "E:/DiscBot/scrapeCSUP.py", line 33, in load_cookies
    browser.add_cookie(cookie)
  File "E:\DiscBot\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 894, in add_cookie
    self.execute(Command.ADD_COOKIE, {'cookie': cookie_dict})
  File "E:\DiscBot\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "E:\DiscBot\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid 'expiry'
  (Session info: headless chrome=81.0.4044.113)
Solution
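Add the following lines to the load_cookies function:
...
for cookie in cookies:
    # add_cookie failed with "invalid 'expiry'" when this value was a float, so cast it to int
    if isinstance(cookie.get("expiry"), float):
        cookie["expiry"] = int(cookie["expiry"])
    browser.add_cookie(cookie)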
I hope this helps!
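For reference, the conversion in the loop above could be factored into a small helper so it can be checked without starting a browser. This is only a sketch under a few assumptions: normalize_cookie and read_cookies are hypothetical names that do not appear in the question, and load_cookies here uses its driver and url parameters rather than the global browser and main_url.

```python
import pickle


def normalize_cookie(cookie):
    """Return a copy of a cookie dict with a float 'expiry' truncated to int."""
    fixed = dict(cookie)
    if isinstance(fixed.get("expiry"), float):
        fixed["expiry"] = int(fixed["expiry"])
    return fixed


def read_cookies(location):
    """Load pickled cookies from disk and normalize each one (hypothetical helper)."""
    with open(location, "rb") as fh:  # the with-block closes the file afterwards
        return [normalize_cookie(cookie) for cookie in pickle.load(fh)]


def load_cookies(driver, location, url):
    """Open url first, then add the saved cookies to the given driver."""
    driver.delete_all_cookies()
    # Selenium only accepts cookies for the domain of the page currently loaded.
    driver.get(url)
    for cookie in read_cookies(location):
        driver.add_cookie(cookie)
```

With this in place, login_w_cookies() could keep calling load_cookies(browser, 'cookies.txt', main_url) as before.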