r - Scraping IMDB in R: a better way to handle missing information?
Problem description
I am following this tutorial to get information from IMDB: https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/
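However, some of the data is missing on IMDB. The tutorial suggests inspecting the page visually and then writing a loop like the following, with the positions of the missing entries hard-coded: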
for (i in c(39, 73, 80, 89)) {
  a <- metascore_data[1:(i - 1)]
  b <- metascore_data[i:length(metascore_data)]
  metascore_data <- append(a, list("NA"))
  metascore_data <- append(metascore_data, b)
}
Is there a better way to handle this programmatically?
Solution
The following worked for me:
library(rvest)
# Search results page to scrape
URL <- 'https://www.imdb.com/search/title/?title_type=feature&online_availability=US/IMDbTV&start=1251&ref_=adv_nxt'
webpage <- read_html(URL)
# Extract the genre text of every film and strip surrounding whitespace
genres <- webpage %>%
  html_nodes('span.genre') %>%
  html_text() %>%
  trimws()
This returns 50 values:
genres
# [1] "Comedy, Romance" "Action, Crime, Drama"
# [3] "Action, Horror, Sci-Fi" "Action, Adventure, Thriller"
# [5] "Adventure, Comedy, Family" "Comedy"
# [7] "Action, Adventure, Thriller" "Comedy, Drama, Romance"
# [9] "Comedy" "Comedy"
#[11] "Action, Adventure, Drama" "Action, Thriller"
#[13] "Action, Crime, Thriller" "Mystery, Thriller"
#[15] "Crime, Drama, Thriller" "Drama, Horror"
#[17] "Animation, Drama, War" "Drama, Thriller"
#[19] "Action, Crime, Drama" "Drama, Sci-Fi"
#[21] "Adventure, Comedy, Family" "Crime, Drama"
#[23] "Action, Adventure, Thriller" "Action, Adventure, Sci-Fi"
#[25] "Thriller" "Comedy, Crime"
#[27] "Comedy, Romance" "Action, Biography, Drama"
#[29] "Adventure, Comedy" "Crime, Drama, Thriller"
#[31] "Drama, Sci-Fi, Thriller" "Comedy, Romance"
#[33] "Action, Drama, Thriller" "Action, Adventure, Sci-Fi"
#[35] "Action, Crime, Drama" "Action, Adventure, Drama"
#[37] "Action, Thriller" "Action, Drama, War"
#[39] "Drama, Sci-Fi, Thriller" "Animation, Adventure, Family"
#[41] "Drama, Romance" "Action, Drama, Fantasy"
#[43] "Action, Adventure, Fantasy" "Comedy, Crime, Drama"
#[45] "Action, Crime, Drama" "Action, Adventure, Sci-Fi"
#[47] "Drama, Romance" "Animation, Family, Fantasy"
#[49] "Action, Adventure, Fantasy" "Mystery, Thriller"
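The genre query above returns one value per film only because every film on that page lists a genre. For a field that can be absent, such as the metascore, one common approach is to select each film's container first and then call html_node() (singular) inside it: applied to a set of nodes, it returns exactly one result per node, and html_text() turns a missing result into NA. The sketch below illustrates this on a small inline HTML fragment so that it runs offline; the markup and the class names `lister-item-content` and `metascore` are assumptions made for illustration and may not match the live IMDB page.

```r
library(rvest)

# Hypothetical, simplified markup standing in for a results page.
# The second film deliberately has no metascore element.
page <- read_html('
<div class="lister-item-content">
  <h3><a>Film A</a></h3>
  <span class="metascore">81</span>
</div>
<div class="lister-item-content">
  <h3><a>Film B</a></h3>
</div>
<div class="lister-item-content">
  <h3><a>Film C</a></h3>
  <span class="metascore"> 65 </span>
</div>')

# One node per film (the container selector is an assumption)
items <- html_nodes(page, "div.lister-item-content")

# html_node() yields one result per container, so both vectors stay aligned
titles <- items %>%
  html_node("h3 a") %>%
  html_text()

metascore <- items %>%
  html_node("span.metascore") %>%
  html_text() %>%      # missing node becomes NA
  trimws() %>%
  as.numeric()         # metascore is 81, NA, 65

movies <- data.frame(title = titles, metascore = metascore,
                     stringsAsFactors = FALSE)
```

Because the NA values fall into place automatically, no positions such as 39, 73, 80 or 89 need to be found by eye, and the same code keeps working when the set of films with missing scores changes.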