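python-3.x - Following an Instagram scraping tutorial and stuck on XPath/other issues

Problem description

I have been following this tutorial: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c

I am trying to write a simpler version of the function "insta_details" that gets the likes and comments of an Instagram photo post, but I can't work out what is wrong with my code. I suspect I am using XPath incorrectly (this is my first time), but the error raised is a "NoSuchElementException".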

from selenium.webdriver import Chrome


def insta_details(urls):
    browser = Chrome()
    post_details = []
    for link in urls:
        browser.get(link)
        likes = browser.find_element_by_partial_link_text('likes').text
        age = browser.find_element_by_css_selector('a time').text
        xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
        comment = browser.find_element_by_xpath(xpath_comment).text
        insta_link = link.replace('https://www.instagram.com/p', '')
        post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
    return post_details


urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)

Error message:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}

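Copying and pasting the code from the tutorial has not worked for me either. Am I calling the function incorrectly, or is something else wrong in the code?

Tags: python-3.x, selenium, xpath, web-scraping, instagram

Solution

Looking at the tutorial, your code appears to be incomplete.

Here, try this: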

import time
import re
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import Chrome


def find_mentions_or_hashtags(comment, pattern):
    mentions = re.findall(pattern, comment)
    if len(mentions) > 1:
        return mentions
    elif len(mentions) == 1:
        return mentions[0]
    else:
        return ""


def insta_link_details(url):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    browser = Chrome(options=chrome_options)
    browser.get(url)
    try:
        # This captures the standard like count.
        likes = browser.find_element_by_xpath(
            """/html/body/div[1]/section/main/div/div/article/
                div[3]/section[2]/div/div/button/span""").text.split()[0]
        post_type = 'photo'
    except Exception:
        # Fallback: for videos the count sits at a different XPath.
        likes = browser.find_element_by_xpath(
            """/html/body/div[1]/section/main/div/div/article/
                div[3]/section[2]/div/span/span""").text.split()[0]
        post_type = 'video'
    age = browser.find_element_by_css_selector('a time').text
    comment = browser.find_element_by_xpath(
        """/html/body/div[1]/section/main/div/div[1]/article/
        div[3]/div[1]/ul/div/li/div/div/div[2]/span""").text

    hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
    mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
    post_details = {'link': url, 'type': post_type, 'likes/views': likes,
                    'age': age, 'comment': comment, 'hashtags': hashtags,
                    'mentions': mentions}
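    browser.quit()  # close the headless Chrome instance opened for this URL
    time.sleep(10)  # pause before the next request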
    return post_details


for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
    print(insta_link_details(url))

Output:

{'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\nThe Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements.  Even at our busiest, there’s always a way to take care of our health. \n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
{'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on!  Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty  Find 5% rewards for purchases at:\n\n Shaver Shop\n Fresh Fragrances\n Happy Hair Brush\n & many more online at our website bio !\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}
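
Note that find_mentions_or_hashtags returns three different shapes in the output above: a list for several matches, a bare string for one match, and an empty string for none. If a uniform return type is easier to work with, a minimal sketch could look like the following. The name extract_tags and the wider \w+ pattern are assumptions made for illustration; neither comes from the tutorial or from the answer.

```python
import re


def extract_tags(comment, prefix):
    """Return every token in `comment` that starts with `prefix` ('#' or '@').

    Hypothetical helper: it always returns a list, so callers never have to
    check whether they received a string or a list.
    """
    # \w covers letters, digits and underscores; the answer's pattern
    # '[A-Za-z]+' would cut '#deal2020' down to '#deal'.
    pattern = re.escape(prefix) + r"\w+"
    return re.findall(pattern, comment)


caption = "5% rebate with #sosure. Thanks @thebodyshopaust and @thebeautychef"
print(extract_tags(caption, "#"))         # ['#sosure']
print(extract_tags(caption, "@"))         # ['@thebodyshopaust', '@thebeautychef']
print(extract_tags("no tags here", "#"))  # []
```

An empty list is still falsy, so a check such as "if hashtags:" behaves the same as it does with the empty string.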
