
Error raised when putting scraped text into a list

Problem description

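I am currently building a YouTube web scraper to collect comments.

I want to scrape the comments and put them into a data frame. My code can print the text, but I cannot get the text into a data frame. When I check the type of the output, it is <class 'str'>. I can get the text with this code: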

    try:
        # Extract the elements storing the usernames and comments.
        username_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
        comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')       
    except exceptions.NoSuchElementException:
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)
    for com_text in comment_elems:
        print(com_text.text)

If I check the text with this code at the end of the function:

    for com_text in comment_elems:
        print(type(com_text.text))

then the result is <class 'str'>. Even so, I cannot put it into a data frame.

When I try to put this <class 'str'> object into the data frame, I get the error: TypeError: 'WebElement' object does not support item assignment

This is the code I use when I try to put the text into the data frame:

    for username, comment in zip(username_elems, comment_elems):
        comment_section['comment'] = comment.text
        data.append(comment_section)

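I am hoping there is a way to convert the <class 'str'> object into a regular string type, or some other step I can take to extract the text from the object.

Here is my full code: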

def gitscrape(url):
    # Note: replace argument with absolute path to the driver executable.
    driver = webdriver.Chrome('chromedriver/windows/chromedriver.exe')

    # Navigates to the URL, maximizes the current window, and
    # then suspends execution for (at least) 5 seconds (this gives time for the page to load).
    driver.get(url)
    driver.maximize_window()
    time.sleep(5)
    
    #empty subjects
    comment_section =[]
    comment_data = []
    
    try:
        # Extract the elements storing the video title and
        # comment section.
        title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
        comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
    except exceptions.NoSuchElementException:
        # Note: Youtube may have changed their HTML layouts for videos, so raise an error for sanity sake in case the
        # elements provided cannot be found anymore.
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)

    # Scroll into view the comment section, then allow some time
    # for everything to be loaded as necessary.
    driver.execute_script("arguments[0].scrollIntoView();", comment_section)
    time.sleep(7)

    # Scroll all the way down to the bottom in order to get all the
    # elements loaded (since Youtube dynamically loads them).
    last_height = driver.execute_script("return document.documentElement.scrollHeight")

    while True:
        # Scroll down 'til "next load".
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

        # Wait to load everything thus far.
        time.sleep(2)

        # Calculate new scroll height and compare with last scroll height.
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # One last scroll just in case.
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

    try:
        # Extract the elements storing the usernames and comments.
        username_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
        comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')       
    except exceptions.NoSuchElementException:
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)
        
#     for com_text in comment_elems:
#         print(type(com_text.text)
#         data.append(comment_section)

    for username, comment in zip(username_elems, comment_elems):
        comment_section['comment'] = comment.text
        data.append(comment_section)
        
    video1_comments = pd.DataFrame(data)

Tags: python, selenium

Solution


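Your error occurs on the line comment_section['comment'] = comment.text. In your text you write that you get the error when trying to put a string into a data frame, but neither comment_section nor comment is a data frame. In your title you write that adding a string to a list raises the error, but comment_section is not a list either (if it were, the syntax would make no sense, because a list cannot be indexed with the string 'comment'). Code is very sensitive to what you are actually doing, so whether you have a data frame or a list makes a big difference.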

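So what type is comment_section actually? If you scroll up in your code, the last assignment to it is comment_section = driver.find_element_by_xpath('//*[@id="comments"]'), so comment_section is neither a data frame nor a list, but a WebElement! Now the error you get makes sense as well: it says TypeError: 'WebElement' object does not support item assignment, and indeed you are trying to assign comment.text to the 'comment' item of the WebElement comment_section, which a WebElement does not support.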
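
The rebinding can be reproduced without a browser. The sketch below is only an illustration: FakeElement is a hypothetical stand-in for a Selenium WebElement (it is not part of Selenium) and shares just the relevant property, namely that it defines no item assignment.

```python
class FakeElement:
    """Hypothetical stand-in for a WebElement: has .text, but no __setitem__."""
    text = "some comment text"


comment_section = []             # the name starts out bound to an empty list
comment_section = FakeElement()  # later the same name is rebound to an element

try:
    # Same shape as the failing line in the question.
    comment_section['comment'] = "some comment text"
except TypeError as exc:
    message = str(exc)
    print(message)
```

The string being assigned never causes the problem; the failure comes from the object on the left-hand side of the assignment.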

You can fix this by not overwriting comment_section and using a different name for the element instead.
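
A minimal sketch of one way to collect the text, assuming each element exposes a .text attribute as in the question. The helper build_comment_rows and the SimpleNamespace stand-ins are hypothetical names used only for illustration. Note also that the question's code defines comment_data but appends to data, so the sketch uses a single list name throughout.

```python
from types import SimpleNamespace


def build_comment_rows(username_elems, comment_elems):
    """Return one dict per comment, built from objects that expose .text."""
    rows = []
    for username, comment in zip(username_elems, comment_elems):
        # Create a fresh dict for every comment instead of assigning
        # into a shared object such as comment_section.
        rows.append({"username": username.text, "comment": comment.text})
    return rows


# Stand-ins for the lists returned by the find_elements calls (assumption).
username_elems = [SimpleNamespace(text="alice"), SimpleNamespace(text="bob")]
comment_elems = [SimpleNamespace(text="Nice video"), SimpleNamespace(text="Thanks!")]

rows = build_comment_rows(username_elems, comment_elems)
print(rows)
```

In the real scraper, the element found with the '//*[@id="comments"]' XPath would be stored under a separate name (for example comment_section_elem, a hypothetical name), and passing the resulting list of dicts to pd.DataFrame(rows) should give one row per comment with username and comment columns.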

