python - Shortened link does not work with BeautifulSoup in Python
Problem description
This code gets the information from the site without any problems:
import requests
from bs4 import BeautifulSoup
url = 'https://www.vogue.com/article/mamma-mia-2-here-we-go-again-review?mbid=social_twitter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
title = soup.find("meta", {"name": "twitter:title"})
title2 = soup.find("meta", property="og:title")
title3 = soup.find("meta", property="og:description")
print("TITLE: "+str(title['content']))
print("TITLE2: "+str(title2['content']))
print("TITLE3: "+str(title3['content']))
However, when I replace url with this shortened link, it fails with:
print("TITLE: "+str(title['content']))
TypeError: 'NoneType' object has no attribute '__getitem__'
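The traceback comes from indexing the result of soup.find(), which returns None when no matching tag exists on the page. The sketch below reproduces that with a hypothetical stand-in for a shortener's redirect page (the HTML string is an assumption, not captured from a real shortener) and wraps the lookup in a small hypothetical helper, meta_content, that returns None instead of raising. It uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical redirect page: its only job is to send the browser elsewhere,
# so it carries no twitter:title / og:* tags.
REDIRECT_PAGE = """
<html><head>
<meta http-equiv="refresh" content="0; url=https://www.vogue.com/article/mamma-mia-2-here-we-go-again-review">
</head><body></body></html>
"""


def meta_content(soup, **attrs):
    """Return the content attribute of the first matching <meta>, or None."""
    tag = soup.find("meta", attrs=attrs)
    return tag.get("content") if tag is not None else None


soup = BeautifulSoup(REDIRECT_PAGE, "html.parser")
print(soup.find("meta", property="og:title"))            # None: tag is not on this page
print(meta_content(soup, property="og:title"))            # None instead of a TypeError
print(meta_content(soup, **{"http-equiv": "refresh"}))    # the refresh instruction
```

The missing og:title is the symptom; the real fix is to reach the article page before looking for the tags.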
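Solution
The URL shortener redirects with a meta refresh tag (<meta http-equiv="refresh" ...>) rather than an HTTP redirect. requests follows HTTP redirects automatically, but it does not act on meta refresh tags, so you end up parsing the shortener's redirect page, which has no twitter:title or og:* tags. soup.find() therefore returns None, and indexing None raises the TypeError. Follow the meta refresh yourself until you reach the real page. This code should help: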
from bs4 import BeautifulSoup
import requests
import re
shortened_url = '<YOUR SHORTENED URL>'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get(shortened_url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
while True:
    # is there a meta refresh tag on the page?
    meta_refresh = soup.select_one('meta[http-equiv=refresh]')
    if meta_refresh:
        # content looks like "0; url=https://..."; take everything after "url="
        refresh_url = re.search(r'url=(.*)', meta_refresh['content'], flags=re.I).group(1)
        response = requests.get(refresh_url, headers=headers)
        soup = BeautifulSoup(response.text, "lxml")
    else:
        break
title = soup.find("meta", {"name": "twitter:title"})
title2 = soup.find("meta", property="og:title")
title3 = soup.find("meta", property="og:description")
print("TITLE: "+str(title['content']))
print("TITLE2: "+str(title2['content']))
print("TITLE3: "+str(title3['content']))
Prints:
TITLE: Mamma Mia! Here We Go Again Is the Only Good Thing About This Summer - Vogue
TITLE2: Mamma Mia! Here We Go Again Is the Only Good Thing About This Summer
TITLE3: Is it possible to change your country of origin to a movie sequel?
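The while True loop above can spin forever if a page refreshes to itself, and the regex keeps any quotes or relative paths in the refresh value. A more defensive variant is sketched below under a few assumptions: the helper names parse_meta_refresh and follow_meta_refresh are hypothetical, the hop limit of 5 is arbitrary, and the page loader is passed in as fetch(url) returning (final_url, html) so the logic can be exercised without network access. With requests, fetch could wrap requests.get(url, headers=headers) and return (response.url, response.text); response.url is the address after any HTTP redirects, which is the right base for resolving relative targets.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Matches the target in a refresh value such as "0; url=https://example.com/"
# or "5;URL='/next'"; quotes around the URL are optional.
REFRESH_URL = re.compile(r"""url\s*=\s*['"]?([^'"]+)""", re.IGNORECASE)


def parse_meta_refresh(content, base_url):
    """Return the absolute target of a meta refresh content value, or None."""
    match = REFRESH_URL.search(content)
    if match is None:
        return None
    # Relative targets are resolved against the page that contained the tag.
    return urljoin(base_url, match.group(1).strip())


def follow_meta_refresh(url, fetch, max_hops=5):
    """Follow meta refresh hops starting at url; return (final_url, soup).

    Stops when a page has no refresh target or refreshes to a URL that
    was already visited; raises if more than max_hops hops are needed.
    """
    seen = set()
    for _ in range(max_hops + 1):
        final_url, html = fetch(url)
        seen.add(final_url)
        soup = BeautifulSoup(html, "html.parser")
        # http-equiv values are matched case-insensitively ("Refresh" also works).
        tag = soup.find("meta", attrs={"http-equiv": re.compile(r"^refresh$", re.I)})
        target = parse_meta_refresh(tag.get("content", ""), final_url) if tag else None
        if target is None or target in seen:
            return final_url, soup
        url = target
    raise RuntimeError("too many meta refresh hops")
```

Once follow_meta_refresh returns, the twitter:title and og:* lookups from the answer can run against the returned soup.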