How do I run Nutch with the Selenium plugin?

Problem description

I am trying to run Nutch with the Selenium plugin, but as a beginner I cannot work out how to run Nutch or crawl a website.

I have made the XML changes that the setup requires:

<property>
    <name>plugin.includes</name>
    <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
    <description>Regular expression naming plugin directory names to
    include.  Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin. By
    default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins. In order to use HTTPS please enable 
    protocol-httpclient, but be aware of possible intermittent problems with the 
    underlying commons-httpclient library.
    </description>
</property>
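
The description above says that any plugin whose directory name does not match the expression is excluded. A quick way to see which plugins the value selects is to test a few names against it. The snippet below is only a sketch: it assumes the expression has to match the whole directory name, and `is_included` is a hypothetical helper written for this illustration, not part of Nutch.

```python
import re

# Value of plugin.includes copied from the configuration above.
PLUGIN_INCLUDES = (
    "protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|urlnormalizer-(pass|regex|basic)|scoring-opic"
)


def is_included(plugin_dir: str, pattern: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the whole directory name matches the pattern.

    Assumption: the expression is matched against the entire name,
    not searched for as a substring.
    """
    return re.fullmatch(pattern, plugin_dir) is not None


if __name__ == "__main__":
    # Plugin directory names to check against the expression.
    for name in ["protocol-selenium", "protocol-http", "parse-html", "indexer-solr"]:
        status = "included" if is_included(name) else "excluded"
        print(f"{name}: {status}")
```

Under that assumption, `protocol-selenium` and `parse-html` are selected, while `protocol-http` and `indexer-solr` are not, because neither name appears in the value.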

How can I run Nutch with Selenium to test web pages that use JavaScript?

Tags: selenium, nutch

Solution

