r - CSS selector for Goodreads user ratings
Problem description
URL: https://www.goodreads.com/book/show/27841061-nevernight
Goal: extract each individual user's rating.
When I inspect a user rating, I see this:
<span class="staticStars notranslate" title="did not like it">
If I can extract the title attribute, I can map it to a numeric rating.
# Named vector mapping each title string to its star rating
rate_map <- c('did not like it' = 1,
              'it was ok' = 2,
              'liked it' = 3,
              'really liked it' = 4,
              'it was amazing' = 5)
library(rvest)
url <- 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
gr_list %>% html_node('.staticStars .notranslate') %>%
  html_attr('title')
The result I get from this code is NA.
Can anyone tell me what I am doing wrong? Thanks.
Solution
The CSS selector .staticStars .notranslate means you are looking for a node with the class notranslate nested inside a node with the class staticStars. That is, it would match something like this:
<span class="staticStars"><span class="notranslate">foo</span></span>
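To see the descendant selector's behavior without hitting the network, here is a minimal sketch that parses two inline HTML strings. The variable names (nested, flat, nested_text, flat_title) are made up for illustration, and the flat snippet simply copies the span from the question.

```r
library(rvest)

# Hypothetical inline snippets: one nested, one with both classes on a single span
nested <- read_html('<span class="staticStars"><span class="notranslate">foo</span></span>')
flat <- read_html('<span class="staticStars notranslate" title="did not like it"></span>')

# The descendant selector (with a space) finds the inner span in the nested markup
nested_text <- nested %>% html_node('.staticStars .notranslate') %>% html_text()

# It finds nothing in the flat markup, so html_attr() returns NA
flat_title <- flat %>% html_node('.staticStars .notranslate') %>% html_attr('title')

print(nested_text)
print(flat_title)
```

The second result is the same NA the question reports, because the Goodreads span carries both classes on one element rather than nesting them.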
If you want to match a node that has both classes, make sure there is no space between the class selectors. You can do
library(rvest)
url <- 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
# No space between the class selectors: match nodes that carry both classes
gr_list %>% html_nodes('.staticStars.notranslate') %>%
  html_attr('title')
# [1] NA NA "did not like it"
# [4] "did not like it" "it was amazing" "it was amazing"
# [7] "it was amazing" "it was amazing" "it was amazing"
# [10] "did not like it" "it was amazing" "really liked it"
# [13] "did not like it" "it was amazing" "it was amazing"
# [16] "it was amazing" "did not like it" "it was amazing"
# [19] "it was amazing" "it was amazing" "it was amazing"
# [22] "it was amazing" "it was amazing" "it was amazing"
# [25] "it was amazing" "it was amazing" "it was amazing"
# [28] "it was amazing" "it was amazing" "liked it"
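To finish the task from the question, the extracted titles can be converted to numeric ratings. This is only a sketch: the titles vector below holds a few sample values standing in for the scraped result (including an NA like the ones in the output above), and rate_map is the question's mapping written as an R named vector.

```r
# Sample titles standing in for the html_attr('title') result (not live data)
titles <- c(NA, "did not like it", "it was amazing", "really liked it", "liked it")

# The question's mapping, expressed as a named numeric vector
rate_map <- c('did not like it' = 1,
              'it was ok' = 2,
              'liked it' = 3,
              'really liked it' = 4,
              'it was amazing' = 5)

# Drop nodes without a title, then look each remaining title up by name
titles <- titles[!is.na(titles)]
ratings <- unname(rate_map[titles])
print(ratings)
```

Dropping the NA entries first is a design choice. Indexing rate_map with a missing title would otherwise yield NA ratings, which may or may not be what you want downstream.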