CSS selector for Goodreads user ratings

Question

URL: https://www.goodreads.com/book/show/27841061-nevernight
Goal: extract the individual user ratings.

When I inspect a user rating, I see this:

<span class="staticStars notranslate" title="did not like it">

If I can extract the title attribute, I can map it to a numeric rating.

# Named vector: look up a rating by its title text
rate_map <- c('did not like it' = 1,
              'it was ok' = 2,
              'liked it' = 3,
              'really liked it' = 4,
              'it was amazing' = 5)

url = 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
gr_list %>%  html_node('.staticStars .notranslate') %>%  
  html_attr('title')

The result I get from this code is NA.

Can anyone tell me what I am doing wrong? Thanks.

Tags: r, web-scraping, css-selectors, rvest

Answer


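The CSS selector .staticStars .notranslate (with a space) is a descendant selector: it looks for a node with class notranslate nested inside a node with class staticStars. That is, it would match something like this: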

<span class="staticStars"><span class="notranslate">foo</span></span>
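The descendant behaviour can be checked offline on a small inline document. This is only a sketch: the HTML string and its title values are made up for illustration and are not taken from the Goodreads page.

```r
library(rvest)

# Hypothetical inline HTML: one nested pair, and one span carrying both classes
html <- read_html('
  <div>
    <span class="staticStars"><span class="notranslate" title="nested">foo</span></span>
    <span class="staticStars notranslate" title="did not like it">bar</span>
  </div>')

# Descendant selector (note the space): only the inner span of the nested pair matches.
# The span that carries both classes has no .staticStars ancestor, so it is skipped.
nested <- html %>%
  html_nodes('.staticStars .notranslate') %>%
  html_attr('title')

nested
# [1] "nested"
```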

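If you want to match a node that has both classes, make sure there is no space between the two class selectors. Also note that html_node returns only the first match, so use html_nodes to get every rating. You can do: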

library(rvest)

url <- 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
# No space between the classes: match nodes that carry both of them
gr_list %>% html_nodes('.staticStars.notranslate') %>%
  html_attr('title')

#  [1] NA                NA                "did not like it"
#  [4] "did not like it" "it was amazing"  "it was amazing" 
#  [7] "it was amazing"  "it was amazing"  "it was amazing" 
# [10] "did not like it" "it was amazing"  "really liked it"
# [13] "did not like it" "it was amazing"  "it was amazing" 
# [16] "it was amazing"  "did not like it" "it was amazing" 
# [19] "it was amazing"  "it was amazing"  "it was amazing" 
# [22] "it was amazing"  "it was amazing"  "it was amazing" 
# [25] "it was amazing"  "it was amazing"  "it was amazing" 
# [28] "it was amazing"  "it was amazing"  "liked it" 
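The NA entries come from matched nodes that have no title attribute, since html_attr returns NA in that case. To finish the mapping the question asks about, one possible approach is to drop those entries and look up each title in a named vector. The sketch below uses a short hard-coded titles vector (a few values copied from the output above) instead of a live request, so that it runs without network access; the names titles and ratings are illustrative only.

```r
# Mapping from the question, written as an R named vector
rate_map <- c('did not like it' = 1,
              'it was ok' = 2,
              'liked it' = 3,
              'really liked it' = 4,
              'it was amazing' = 5)

# A few titles as returned by html_attr('title'), including a missing one
titles <- c(NA, "did not like it", "it was amazing", "really liked it", "liked it")

# Drop missing titles, then look up each remaining title by name
ratings <- unname(rate_map[titles[!is.na(titles)]])

ratings
# [1] 1 5 4 3
```

Indexing a named vector by a character vector keeps the order of the input, so ratings lines up with the non-missing titles.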
