r - Downloading multiple URLs via phantomjs in R. How do I loop over them?

Problem description

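I am trying to download several URLs from R using phantomjs, because the websites contain JavaScript. I can download a single web page with the code in the last code box below, but I need it to work for multiple sites. I found that code in this tutorial: http://flovv.github.io/Scrape-JS-Sites/. I have a character vector of URLs, urls[i], and a matching vector of destinations to save them to, destinations[i]. Previously I tried: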

for(i in seq_along(urls)) {
 download.file(urls[i], destinations[i], mode="wb")
}

However, this does not work because the websites rely on JavaScript, which download.file does not execute.

I tried to follow the answer in this post, but got confused: Scraping multiple URLs by looping in PhantomJS

writeLines("var url = 'url link';
var page = new WebPage();
var fs = require('fs');

page.open(url, function (status) {
        just_wait();
});

function just_wait() {
    setTimeout(function() {
               fs.write('1.html', page.content, 'w');
            phantom.exit();
    }, 2500);
}
", con = "scrape.js")

js_scrape <- function(url = "url link", 
                      js_path = "scrape.js", 
                      phantompath = "phantomjs"){
  lines <- readLines(js_path)
  lines[1] <- paste0("var url ='", url ,"';")
  writeLines(lines, js_path)

  command = paste(phantompath, js_path, sep = " ")
  system(command)

}

js_scrape()

Please help!

Tags: r, loops, url, web-scraping, phantomjs

Solution
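
One possible approach, offered as a minimal sketch rather than a tested answer: instead of rewriting the first line of scrape.js for every URL, the script can read the URL and the output file name from its command-line arguments (PhantomJS exposes them through `require('system').args`), and R can then call it once per element of `urls`. The names `scrape_args.js`, `js_scrape_one`, and `js_scrape_all` are hypothetical and not part of the question or the tutorial. The sketch assumes `phantomjs` is on the PATH and that `urls` and `destinations` are character vectors of equal length.

```r
# Write the PhantomJS script once. It takes the URL and the output file
# from its command-line arguments instead of hard-coding them.
# (File name "scrape_args.js" is an assumption for this sketch.)
writeLines("var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
var url = system.args[1];
var out = system.args[2];

page.open(url, function (status) {
    // Wait 2.5 s so the page scripts can run, as in the original script.
    setTimeout(function () {
        fs.write(out, page.content, 'w');
        phantom.exit();
    }, 2500);
});
", con = "scrape_args.js")

# Hypothetical helper: render one URL with PhantomJS and save the
# resulting HTML to `destination`.
js_scrape_one <- function(url, destination,
                          js_path = "scrape_args.js",
                          phantompath = "phantomjs") {
  # shQuote() protects URLs and paths containing spaces, "&", "?" etc.
  system2(phantompath,
          args = c(js_path, shQuote(url), shQuote(destination)))
}

# Hypothetical wrapper: loop over parallel vectors of URLs and
# destinations, calling `scrape_one` once per pair.
js_scrape_all <- function(urls, destinations, scrape_one = js_scrape_one) {
  stopifnot(length(urls) == length(destinations))
  for (i in seq_along(urls)) {
    scrape_one(urls[i], destinations[i])
  }
  invisible(destinations)
}
```

With the vectors from the question, the call would be `js_scrape_all(urls, destinations)`. Each PhantomJS run is a separate process that exits after writing its file, so the pages are fetched one after another. The 2500 ms wait is carried over from the original script and may need adjusting for slower sites.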
