web-crawler - How do I seed URLs from a text file in StormCrawler?

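I have a large number of URLs (about 40,000) that I need to crawl with StormCrawler. Is there a way to pass these URLs as a text file rather than as a list in crawler.flux? Something like this: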

spouts:
  - id: "spout"
    className: "com.digitalpebble.stormcrawler.spout.MemorySpout"
    parallelism: 1
    constructorArgs:
      - "URLs.txt"

Tags: web-crawler stormcrawler

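For Solr and Elasticsearch, there are injectors that read URLs from a file and add them to the status index as DISCOVERED entries. This does require using Solr or Elasticsearch to hold the status index. The injector is launched as a topology, for example: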

storm ... com.digitalpebble.stormcrawler.elasticsearch.ESSeedInjector .../seeds '*' -conf ...
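To relate this back to the spout definition in the question, the file-reading step can be sketched in Flux. This is only a minimal sketch, assuming the `FileSpout` class from StormCrawler core and a three-argument constructor (directory, file name filter, whether to emit tuples with DISCOVERED status). The directory ".", the file name "URLs.txt", and the spout id are placeholders, and the class name and arguments should be checked against the StormCrawler version in use. The seed file is assumed to contain one URL per line.

```yaml
spouts:
  - id: "filespout"   # placeholder id
    # Assumption: FileSpout is available in the StormCrawler core module
    className: "com.digitalpebble.stormcrawler.spout.FileSpout"
    parallelism: 1
    constructorArgs:
      - "."           # directory containing the seed file (placeholder)
      - "URLs.txt"    # file name filter (placeholder)
      - true          # emit with DISCOVERED status so a status updater can store the URLs
```

A spout like this only reads the file and emits the URLs. It still needs to be connected to a status updater bolt backed by Solr or Elasticsearch for the URLs to end up in the status index.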