java - Call Ajax using HtmlUnit
Problem description
I want to crawl a web page that has a download button. When I press it, the current page shows the download progress in its title and then shows a download link that can be clicked. I think this is done via Ajax, because I can see some requests in the developer console under Network -> XHR.
This is my code to crawl the site:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/process?url=https://www.fembed.com/v/6mv22g3qfsdfsd");
// final ScriptResult scriptResult = page.executeJavaScript("beacon.js");
webClient.waitForBackgroundJavaScript(10000);
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
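But this code returns the page I get right after the button click, and the Ajax content is not loaded. I know which Ajax requests the site makes. Is there any way to call those Ajax requests manually?
Solution
You can build the Ajax call manually with HtmlUnit. If the Google Chrome console is not enough to identify the request, you can use a tool such as Fiddler. Once you have identified the HTTP call, you can rebuild it with HtmlUnit as follows: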
URL url = new URL(
"http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);
requestSettings.setAdditionalHeader("Accept", "*/*");
requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
Page page = webClient.getPage(requestSettings);
System.out.println(page.getWebResponse().getContentAsString());
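When the rebuilt request has many query parameters, as in the URL above, it can be easier to assemble the query string from a map than to maintain one long literal. The following is only a sketch using the standard library; `AjaxUrlBuilder` and `buildUrl` are hypothetical names, not part of HtmlUnit, and the parameter values are copied from the example URL for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of HtmlUnit): assembles an Ajax endpoint URL
// from a base address and a map of query parameters.
class AjaxUrlBuilder {

    // URL-encodes each key and value and joins the pairs with '&'.
    // Returns the base URL unchanged when there are no parameters.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return params.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("callback", "getPlpResponse");
        params.put("category", "55krw");
        params.put("searchTerm", "");
        params.put("offset", "60");
        System.out.println(buildUrl(
                "http://tws.target.com/searchservice/item/search_results/v1/by_keyword", params));
    }
}
```

The resulting string can be passed to `new URL(...)` and then to `WebRequest` exactly as in the snippet above. Note that `URLEncoder.encode(String, Charset)` requires Java 10 or later.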