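java - How can I start parsing an HTML page as soon as the response arrives, without waiting for JavaScript (Java HtmlUnit)?
Problem description
I am trying to fetch a page with Java HtmlUnit, but it is slow. After some investigation I understood that this happens because HtmlUnit waits for the JavaScript to be loaded and applied. I don't need that, because I already have the HTML once the server's response has arrived.
What do I want?
I want to be able to start parsing as soon as the response is received (without loading external resources such as JS, CSS, or iframes; just the plain HTML string). Is that possible?
My code sample: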
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // get the page (this site is used just as an example)
    final HtmlPage page = webClient.getPage("https://godaddy.com/");
    // the call above takes a long time to return
    final DomNode articleNode = page.querySelector("body");
    final String articleText = articleNode.getTextContent();
} catch (Exception e) {
    e.printStackTrace();
}
And my HtmlUnit log output:
2018-12-08 16:01:56.049 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:72403] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.061 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:88489] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.063 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:89596] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:89920] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:90001] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:90070] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.072 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:98871] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.072 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:98902] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.073 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:99516] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.073 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:99547] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.147 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:188511] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.147 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:188567] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.180 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:221027] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.222 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:1512] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.225 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:3780] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.240 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:20465] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.241 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/ wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:20654] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, < ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, < URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.441 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/cms/sales/ css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:115341] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, < ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, < UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.441 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/cms/sales/ css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:116052] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, < ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, < UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.445 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/cms/sales/ css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:118964] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, < ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, < UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.445 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/cms/sales/ css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:119073] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, < ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, < UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.560 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS error: 'https://ua.godaddy.com/assets/cms/sales/ css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:226900] Error in media list. (Invalid token "screen". Was expecting one of: <S>, "(".)
2018-12-08 16:01:56.560 WARN 19846 --- [ scheduling-1] c.g.htmlunit.DefaultCssErrorHandler : CSS warning: 'https://ua.godaddy.com/assets/cms/ sales/css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:226900] Ignoring the whole rule.
2018-12-08 16:01:56.771 WARN 19846 --- [ scheduling-1] c.g.htmlunit.html.HtmlScript : Script is not JavaScript (type: application/ ld+json, language: ). Skipping execution.
2018-12-08 16:02:01.040 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'text/ javascript'.
2018-12-08 16:02:01.410 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'text/ javascript'.
2018-12-08 16:02:01.857 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'application/ x-javascript'.
2018-12-08 16:02:02.787 INFO 19846 --- [ scheduling-1] c.g.h.javascript.JavaScriptEngine : Caught script exception
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "sameSizeGroup" of undefined (https://img1.wsimg.com/cms/sales/js/ sales-cms-4cebf668dc11f19307efd483ab9e770a.min.js#1)
2018-12-08 16:02:03.794 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.796 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.872 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.874 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.928 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.929 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.292 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.294 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.341 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.367 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.388 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.410 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.718 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.720 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.729 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.029 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'PREFIX_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.029 WARN 19846 --- [ scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet : Unhandled CSS condition type 'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.562 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'text/ javascript'.
2018-12-08 16:02:11.017 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'text/ javascript'.
2018-12-08 16:02:11.599 WARN 19846 --- [ scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl : Obsolete content type encountered: 'text/ javascript'.
P.S. A dirty workaround: I found this solution, https://stackoverflow.com/a/14227559/4207348, but I think it is dirty. My code sample for this workaround (it sped up my parsing roughly 20 times):
// this site is used just as an example
final String URL = "https://godaddy.com/";
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    webClient.setWebConnection(new WebConnectionWrapper(webClient) {
        @Override
        public WebResponse getResponse(final WebRequest request) throws IOException {
            if (request.getUrl().toString().contains(URL)) {
                // the page itself: perform the real request
                return super.getResponse(request);
            } else {
                // any other resource: answer with an empty response
                return new StringWebResponse("", request.getUrl());
            }
        }
    });
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // get the page
    final HtmlPage page = webClient.getPage(URL);
    // with the wrapper in place, the call above returns much faster
    final DomNode articleNode = page.querySelector("body");
    final String articleText = articleNode.getTextContent();
} catch (Exception e) {
    e.printStackTrace();
}
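The `contains(URL)` test in the wrapper above is a plain substring match. One possible refinement is to compare hosts instead. The following is a minimal sketch using only JDK classes, not HtmlUnit itself; `ResourceFilter` and `isSameHost` are hypothetical names, and whether a same-host rule suits a given site (for example one that redirects to another subdomain) is an assumption that has to be checked per site:

```java
import java.net.URI;

// Hypothetical helper (not part of HtmlUnit): decides whether a request
// targets the same host as the page being scraped.
final class ResourceFilter {
    private ResourceFilter() {}

    // Returns true when both URLs parse and their hosts match, ignoring case.
    static boolean isSameHost(String pageUrl, String requestUrl) {
        try {
            String pageHost = URI.create(pageUrl).getHost();
            String requestHost = URI.create(requestUrl).getHost();
            return pageHost != null && pageHost.equalsIgnoreCase(requestHost);
        } catch (IllegalArgumentException e) {
            // Malformed URL: treat it as external so that it gets skipped.
            return false;
        }
    }
}
```

Inside `getResponse`, the condition could then read `ResourceFilter.isSameHost(URL, request.getUrl().toString())`.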
Solution
You can disable JavaScript and CSS like this:
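WebClient webClient = new WebClient();
// set through getOptions(), the same way the question's code sets its options
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(false);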
Then you can get the raw HTML like this:
// reuse the webClient configured above so that JavaScript and CSS stay disabled
HtmlPage page = webClient.getPage("http://www.yourpage.com");
String originalHtml = page.getWebResponse().getContentAsString();
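If only the raw markup is needed and no HtmlUnit DOM, the same string can in principle be fetched without HtmlUnit at all. The following is a sketch of that alternative using the JDK's `java.net.http.HttpClient` (Java 11 or later). It is not the approach from the answer above; `RawHtmlFetcher` and `fetch` are hypothetical names, and the timeout values are arbitrary assumptions:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not part of HtmlUnit): downloads only the HTML
// document itself. No scripts, stylesheets, or frames are requested.
final class RawHtmlFetcher {
    private RawHtmlFetcher() {}

    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10)) // assumed value
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30)) // assumed value
                .GET()
                .build();
        // The body is decoded using the charset from the Content-Type header.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The returned string corresponds to what `getContentAsString()` yields in the HtmlUnit version, but cookies, browser-like headers, and the `HtmlPage` API are not available this way, so this only fits when a plain string is enough.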