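python-3.x - Following an IG scraping tutorial and stuck on XPath/other issues
Problem description
I have been following this tutorial: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c
When I try to write a simpler version of the function "insta_details", one that gets the likes and comments of an Instagram photo post, I can't tell what is wrong with the code. I think I am using XPath incorrectly (this is my first time), but the error that is raised is "NoSuchElementException".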
from selenium.webdriver import Chrome

def insta_details(urls):
    browser = Chrome()
    post_details = []
    for link in urls:
        browser.get(link)
        likes = browser.find_element_by_partial_link_text('likes').text
        age = browser.find_element_by_css_selector('a time').text
        xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
        comment = browser.find_element_by_xpath(xpath_comment).text
        insta_link = link.replace('https://www.instagram.com/p', '')
        post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
    return post_details

urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)
Error message:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}
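Copying and pasting the code from the tutorial has not worked for me either. Am I calling the function incorrectly, or is something else wrong in the code?
Solution
Looking at the tutorial, your code appears to be incomplete.
Here, try this: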
import re
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

# Note: find_element_by_* is the Selenium 3 API. Selenium 4.3+ removed these
# methods in favour of find_element(By.XPATH, ...) and similar.


def find_mentions_or_hashtags(comment, pattern):
    # Returns a list for several matches, a string for one, "" for none.
    mentions = re.findall(pattern, comment)
    if len(mentions) > 1:
        return mentions
    elif len(mentions) == 1:
        return mentions[0]
    else:
        return ""


def insta_link_details(url):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    browser = Chrome(options=chrome_options)
    browser.get(url)
    try:
        # This captures the standard like count.
        likes = browser.find_element_by_xpath(
            """/html/body/div[1]/section/main/div/div/article/
                div[3]/section[2]/div/div/button/span""").text.split()[0]
        post_type = 'photo'
    except NoSuchElementException:
        # Fall back to the element that holds the count for videos.
        likes = browser.find_element_by_xpath(
            """/html/body/div[1]/section/main/div/div/article/
                div[3]/section[2]/div/span/span""").text.split()[0]
        post_type = 'video'
    age = browser.find_element_by_css_selector('a time').text
    comment = browser.find_element_by_xpath(
        """/html/body/div[1]/section/main/div/div[1]/article/
            div[3]/div[1]/ul/div/li/div/div/div[2]/span""").text
    hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
    mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
    post_details = {'link': url, 'type': post_type, 'likes/views': likes,
                    'age': age, 'comment': comment, 'hashtags': hashtags,
                    'mentions': mentions}
    time.sleep(10)
    browser.quit()  # Close the headless browser so processes do not pile up.
    return post_details


for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
    print(insta_link_details(url))
Output:
{'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\nThe Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements. Even at our busiest, there’s always a way to take care of our health. \n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
{'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on! Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty Find 5% rewards for purchases at:\n\n Shaver Shop\n Fresh Fragrances\n Happy Hair Brush\n & many more online at our website bio !\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}
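Note that in the output above, 'hashtags' is a plain string for the first post and a list for the second, and 'mentions' is an empty string when nothing matches. This is because find_mentions_or_hashtags returns a different type depending on the number of matches. The '[A-Za-z]+' patterns also stop at digits and underscores. If a consistent return type is wanted, one possible variant is sketched below. The name find_tags and the \w+ pattern are illustrative assumptions and are not taken from the tutorial.

```python
import re


def find_tags(comment, prefix):
    # Hypothetical helper (not part of the tutorial): always returns a list,
    # so callers never have to check whether they got a str or a list.
    # \w+ also accepts digits and underscores, unlike '[A-Za-z]+'.
    return re.findall(re.escape(prefix) + r"\w+", comment)


caption = "5% rebate with #sosure via @thebodyshopaust and @thebeautychef"
print(find_tags(caption, "#"))         # ['#sosure']
print(find_tags(caption, "@"))         # ['@thebodyshopaust', '@thebeautychef']
print(find_tags("no tags here", "#"))  # []
```

Swapping this in would change the shape of the 'hashtags' and 'mentions' fields, so any code that consumes post_details would need to expect lists.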